GRO assumes that there is a one-to-one relationship between NAPI
structure and network device. Some devices like sky2 share multiple
devices on a single interrupt so only have one NAPI handler. Rather than
split GRO from NAPI, just have GRO assume if device changes that
it is a different flow.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I fixed the GRO crash in the legacy receive path I used
napi_complete to replace __napi_complete. Unfortunately they're
not the same when NETPOLL is enabled, which may result in us
not calling __napi_complete at all.
What's more, we really do need to keep the __napi_complete call
within the IRQ-off section since in theory an IRQ can occur in
between and fill up the backlog to the maximum, causing us to
lock up.
Since we can't seem to find a fix that works properly right now,
this patch reverts all the GRO support from the netif_rx path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no gain using prefetch() in dev_hard_start_xmit(), since
we already had to read ops->ndo_select_queue pointer in dev_pick_tx(),
and both pointers are probably located in the same cache line.
This prefetch call slows down fast path because of a stall in address
computation.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some minor changes to queue hashing:
1. Use const on accessor functions
2. Export skb_tx_hash for use in drivers (see ixgbe)
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sk_buff pointers should be freed with kfree_skb.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the legacy netif_rx path, I incorrectly tried to optimise
the napi_complete call by using __napi_complete before we reenable
IRQs. This simply doesn't work since we need to flush the held
GRO packets first.
This patch fixes it by doing the obvious thing of reenabling
IRQs first and then calling napi_complete.
Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place. As it
stands we may pass freed skbs to netpoll_receive_skb.
This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll. This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry. I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.
This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out). This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.
Add a netdev_resync_ops() and call it from the vlan code.
Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.
With help from Patrick McHardy.
Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.
Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb. So remove net_alive allowing
packet reception run a little faster.
Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netpoll entry checks are required to ensure that we don't
receive normal packets when invoked via netpoll. Unfortunately
it only ever worked for the netif_receive_skb/netif_rx entry
points. The VLAN (and subsequently GRO) entry point didn't
have the check and therefore can trigger all sorts of weird
problems.
This patch adds the netpoll check to all entry points.
I'm still uneasy with receiving at all under netpoll (which
apparently is only used by the out-of-tree kdump code). The
reason is it is perfectly legal to receive all data including
headers into highmem if netpoll is off, but if you try to do
that with netpoll on and someone gets a printk in an IRQ handler
you're going to get a nice BUG_ON.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.
Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb. So remove net_alive allowing
packet reception run a little faster.
Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation of the TX software time stamping fallback is
faulty because it accesses the skb after ndo_start_xmit() returns
successfully. This patch removes the fallback, which fixes kernel panics
seen during stress tests. Hardware time stamping is not affected by this
removal.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize skb_tx_hash() by eliminating a comparison that executes for
every packet. skb_tx_hashrnd initialization is moved to a later part of
the startup sequence, namely after the "random" driver is initialized.
Rebooted the system three times and verified that the code generates
different random numbers each time.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The additional per-packet information (16 bytes for time stamps, 1
byte for flags) is stored for all packets in the skb_shared_info
struct. This implementation detail is hidden from users of that
information via skb_* accessor functions. A separate struct resp.
union is used for the additional information so that it can be
stored/copied easily outside of skb_shared_info.
Compared to previous implementations (reusing the tstamp field
depending on the context, optional additional structures) this
is the simplest solution. It does not extend sk_buff itself.
TX time stamping is implemented in software if the device driver
doesn't support hardware time stamping.
The new semantic for hardware/software time stamping around
ndo_start_xmit() is based on two assumptions about existing
network device drivers which don't support hardware time
stamping and know nothing about it:
- they leave the new skb_shared_tx unmodified
- the keep the connection to the originating socket in skb->sk
alive, i.e., don't call skb_orphan()
Given that skb_shared_tx is new, the first assumption is safe.
The second is only true for some drivers. As a result, software
TX time stamping currently works with the bnx2 driver, but not
with the unmodified igb driver (the two drivers this patch series
was tested with).
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch optimises the Ethernet header comparison to use 2-byte
and 4-byte xors instead of memcmp. In order to facilitate this,
the actual comparison is now carried out by the callers of the
shared dev_gro_receive function.
This has a significant impact when receiving 1500B packets through
10GbE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch prepares for the move of the same_flow checks out of
dev_gro_receive. As such we need to remember the number of held
packets since doing a loop just to count them every time is silly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>