This adds an init_dummy_netdev() function that gets a network device
structure (allocation and lifetime entirely under caller's control) and
initialize the minimum amount of fields so it can be used to schedule
NAPI polls without registering a full blown interface. This is to be
used by drivers that need to tie several hardware interfaces to a single
NAPI poll scheduler due to HW limitations.
It also updates the ibm_newemac driver to use that, this fixing the
oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add()
Symbol is exported GPL only a I don't think we want binary drivers doing
that sort of acrobatics (if we want them at all).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an skb with page frags is merged into an existing one, we
cannibalise its reference count. This is OK when the skb is
reused because we set nr_frags to zero in that case. However,
for the case where the skb is freed through kfree_skb, we didn't
clear nr_frags which causes the page to be freed prematurely.
This is fixed by moving the skb resetting into skb_gro_receive.
Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As GRO cannot be applied to packets with frag_list we need to
make sure that we reject such packets if they are fed to us,
e.g., through a tunnel device.
Also there is no point in applying GRO on GSO packets so they
too should be rejected. This allows GRO to be used in virtio-net
which may produce GSO packets directly but may still benefit
from GRO if the other end of it doesn't support GSO.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent dmaengine rework removed the capability to remove dma device
driver modules while net_dma is active. Rather than notify
dmaengine-clients that channels are trying to be removed, we now rely on
clients to notify dmaengine when they no longer have a need for
channels. Teach net_dma to release channels by taking dmaengine
references at netdevice open and dropping references at netdevice close.
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
ioat: fix self test for multi-channel case
dmaengine: bump initcall level to arch_initcall
dmaengine: advertise all channels on a device to dma_filter_fn
dmaengine: use idr for registering dma device numbers
dmaengine: add a release for dma class devices and dependent infrastructure
ioat: do not perform removal actions at shutdown
iop-adma: enable module removal
iop-adma: kill debug BUG_ON
iop-adma: let devm do its job, don't duplicate free
dmaengine: kill enum dma_state_client
dmaengine: remove 'bigref' infrastructure
dmaengine: kill struct dma_client and supporting infrastructure
dmaengine: replace dma_async_client_register with dmaengine_get
atmel-mci: convert to dma_request_channel and down-level dma_slave
dmatest: convert to dma_request_channel
dmaengine: introduce dma_request_channel and private channels
net_dma: convert to dma_find_channel
dmaengine: provide a common 'issue_pending_all' implementation
dmaengine: centralize channel allocation, introduce dma_find_channel
dmaengine: up-level reference counting to the module level
...
Previously GRO's only entry point from the outside is through
napi_gro_receive and napi_gro_frags. These interfaces are for
device drivers.
This patch rearranges things to provide a new set of interfaces
for VLANs. These interfaces are for internal use only. The
VLAN code itself can then provide a set of entry points for
device drivers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users have been converted to either the general-purpose allocator,
dma_find_channel, or dma_request_channel.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that clients no longer need to be notified of channel arrival
dma_async_client_register can simply increment the dmaengine_ref_count.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
async_tx and net_dma each have open-coded versions of issue_pending_all,
so provide a common routine in dmaengine.
The implementation needs to walk the global device list, so implement
rcu to allow dma_issue_pending_all to run lockless. Clients protect
themselves from channel removal events by holding a dmaengine reference.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.
It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb. This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.
The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.
The merging itself is rather simple. We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry. Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.
If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow GRO packets without frag_list at all, we need to
store the MSS in the packet itself. The obvious place is gso_size.
The only thing to watch out for is if the packet ends up not being
GRO then we need to clear gso_size before pushing the packet into
the stack.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
During network namespace teardown we either move or delete
all of the network devices associated with a network namespace.
In the case of veth devices deleting one will also delete it's
pair device. If both devices are in the same network namespace
then for_each_netdev_safe is insufficient as next may point
to the second veth device we have deleted.
To avoid problems I do what we do in __rtnl_kill_links and
restart the scan of the device list, after we have deleted
a device.
Currently dev_change_netnamespace does not appear to suffer from
this problem, but wireless devices are also paired and likely
should be moved between network namespaces together. So I have
errored on the side of caution and restart the scan of the network
devices in that case as well.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
The initial skb may have been freed after napi_gro_complete in
napi_gro_receive if it was merged into an existing packet. Thus
we cannot check same_flow (which indicates whether it was merged)
after calling napi_gro_complete.
This patch fixes this by saving the same_flow status before the
call to napi_gro_complete.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent GRO patches introduced the NAPI removal of devices in
free_netdev. For drivers that can change the number of queues during
driver operation, the NAPI infrastructure doesn't allow the freeing and
re-addition of NAPI entities without reloading the driver.
This change reinitializes the dev_list in each NAPI struct on delete,
instead of just deleting it (and assigning the list pointers to POISON).
Drivers that wish to remove/re-add NAPI will need to re-initialize the
netdev napi_list after removing all NAPI instances, before re-adding NAPI
devices again.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A command like this: "brctl addif br1 eth1" issued as a user gave me
an oops when bridge module wasn't loaded. It's caused by using a dev
pointer before checking for NULL.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A separate xmit lock class supports GPRS over a Phonet pipe over a TUN
device (type ARPHRD_NONE).
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
This is pretty similar to LRO except that this is protocol-independent.
Instead of holding packets in an lro_mgr structure, they're now held in
napi_struct.
For drivers that intend to use this, they can set the NETIF_F_GRO bit and
call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
The latter will call napi_receive_skb automatically. When napi_gro_receive
is used, the driver must either call napi_complete/napi_rx_complete, or
call napi_gro_flush in softirq context if the driver uses the primitives
__napi_complete/__napi_rx_complete.
Protocols will set the gro_receive and gro_complete function pointers in
order to participate in this scheme.
In addition to the packet, gro_receive will get a list of currently held
packets. Each packet in the list has a same_flow field which is non-zero
if it is a potential match for the new packet. For each packet that may
match, they also have a flush field which is non-zero if the held packet
must not be merged with the new packet.
Once gro_receive has determined that the new skb matches a held packet,
the held packet may be processed immediately if the new skb cannot be
merged with it. In this case gro_receive should return the pointer to
the existing skb in gro_list. Otherwise the new skb should be merged into
the existing packet and NULL should be returned, unless the new skb makes
it impossible for any further merges to be made (e.g., FIN packet) where
the merged skb should be returned.
Whenever the skb is merged into an existing entry, the gro_receive
function should set NAPI_GRO_CB(skb)->same_flow. Note that if an skb
merely matches an existing entry but can't be merged with it, then
this shouldn't be set.
If gro_receive finds it pointless to hold the new skb for future merging,
it should set NAPI_GRO_CB(skb)->flush.
Held packets will be flushed by napi_gro_flush which is called by
napi_complete and napi_rx_complete.
Currently held packets are stored in a singly liked list just like LRO.
The list is limited to a maximum of 8 entries. In future, this may be
expanded to use a hash table to allow more flows to be held for merging.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows GSO to handle frag_list in a limited way for the
purposes of allowing packets merged by GRO to be refragmented on
output.
Most hardware won't (and aren't expected to) support handling GRO
frag_list packets directly. Therefore we will perform GSO in
software for those cases.
However, for drivers that can support it (such as virtual NICs) we
may not have to segment the packets at all.
Whether the added overhead of GRO/GSO is worthwhile for bridges
and routers when weighed against the benefit of potentially
increasing the MTU within the host is still an open question.
However, for the case of host nodes this is undoubtedly a win.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last shoot of this series.
After I removing all directly reference of netdev->priv, I am killing
"priv" of "struct net_device" and fixing relative comments/docs.
Anyone will not be allowed to reference netdev->priv directly.
If you want to reference the memory of private data, use netdev_priv()
instead.
If the private data is not allocted when alloc_netdev(), use
netdev->ml_priv to point that memory after you creating that private
data.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order for the network device ops get_stats call to be immutable, the handling
of the default internal network device stats block has to be changed. Add a new
helper function which replaces the old use of internal_get_stats.
Note: change return code to make it clear that the caller should not
go changing the returned statistics.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the network device internal API to move adminstrative
operations out of the network device structure and into a separate structure.
This patch involves some hackery to maintain compatablity between the
new and old model, so all 300+ drivers don't have to be changed at once.
For drivers that aren't converted yet, the netdevice_ops virt function list
still resides in the net_device structure. For old protocols, the new
net_device_ops are copied out to the old net_device pointers.
After the transistion is completed the nag message can be changed to
an WARN_ON, and the compatiablity code can be made configurable.
Some function pointers aren't moved:
* destructor can't be in net_device_ops because
it may need to be referenced after the module is unloaded.
* neighbor setup is manipulated in a couple of places that need special
consideration
* hard_start_xmit is in the fast path for transmit.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>