The netpoll system currently has a rx to tx path via:
netpoll_rx
__netpoll_rx
arp_reply
netpoll_send_skb
dev->hard_start_tx
This rx->tx loop places network drivers at risk of inadvertently causing a
deadlock or BUG halt by recursively trying to acquire a spinlock that is
used in both their rx and tx paths (this problem was origionally reported
to me in the 3c59x driver, which shares a spinlock between the
boomerang_interrupt and boomerang_start_xmit routines).
This patch breaks this loop, by queueing arp frames, so that they can be
responded to after all receive operations have been completed. Tested by
myself and the reported with successful results.
Specifically it was tested with netdump. Heres the BZ with details:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=194055
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When transmitting a skb in netpoll_send_skb(), only retry a limited number
of times if the device queue is stopped.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_find_text takes a "to" argument which is supposed to limit how
far into the skb it will search for the given text. At present,
it seems to ignore that argument on the first skb, and instead
return a match even if the text occurs beyond the limit.
Patch below fixes this, after adjusting for the "from" starting
point. This consequently fixes the netfilter string match's "--to"
handling, which currently is broken.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 2 problems in dev_hard_start_xmit():
1. nskb->next needs to link back to skb->next if hard_start_xmit()
returns non-zero.
2. Since the total number of GSO fragments may exceed MAX_SKB_FRAGS + 1,
it needs to stop transmitting if the netif_queue is stopped.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1. We can use list_replace_init() instead.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device. For now it only supports in TCPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.
The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.
They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.
I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_deactivate function has bit-rotted since the introduction of
lockless drivers. In particular, the spin_unlock_wait call at the end
has no effect on the xmit routine of lockless drivers.
With a little bit of work, we can make it much more useful by providing
the guarantee that when it returns, no more calls to the xmit routine
of the underlying driver will be made.
The idea is simple. There are two entry points in to the xmit routine.
The first comes from dev_queue_xmit. That one is easily stopped by
using synchronize_rcu. This works because we set the qdisc to noop_qdisc
before the synchronize_rcu call. That in turn causes all subsequent
packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call
also ensures all outstanding calls leave their critical section.
The other entry point is from qdisc_run. Since we now have a bit that
indicates whether it's running, all we have to do is to wait until the
bit is off.
I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is
useless because netif_wake_queue can cause it to be set again. It is
also harmless because we've disarmed qdisc_run.
I've also removed the spin_unlock_wait on xmit_lock because its only
purpose of making sure that all outstanding xmit_lock holders have
exited is also given by dev_watchdog_down.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
First of all it is unnecessary to allocate a new skb in skb_pad since
the existing one is not shared. More importantly, our hard_start_xmit
interface does not allow a new skb to be allocated since that breaks
requeueing.
This patch uses pskb_expand_head to expand the existing skb and linearize
it if needed. Actually, someone should sift through every instance of
skb_pad on a non-linear skb as they do not fit the reasons why this was
originally created.
Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
TCP, etc.). As it is skb_pad will simply write over a cloned skb. Because
of the position of the write it is unlikely to cause problems but still
it's best if we don't do it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ethtool_get_ufo was referring to ETHTOOL_GTSO instead of
ETHTOOL_GUFO.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
identically so we test for them in quite a few places. For the sake
of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We
also test the disjunct of NETIF_F_IP_CSUM and the other two in various
places, for that purpose I've added NETIF_F_ALL_CSUM.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's better to warn and fail rather than rarely triggering BUG on paths
that incorrectly call skb_trim/__skb_trim on a non-linear skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The linearisation operation doesn't need to be super-optimised. So we can
replace __skb_linearize with __pskb_pull_tail which does the same thing but
is more general.
Also, most users of skb_linearize end up testing whether the skb is linear
or not so it helps to make skb_linearize do just that.
Some callers of skb_linearize also use it to copy cloned data, so it's
useful to have a new function skb_linearize_cow to copy the data if it's
either non-linear or cloned.
Last but not least, I've removed the gfp argument since nobody uses it
anymore. If it's ever needed we can easily add it back.
Misc bugs fixed by this patch:
* via-velocity error handling (also, no SG => no frags)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various drivers use xmit_lock internally to synchronise with their
transmission routines. They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.
With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set. This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.
While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire. So
delaying or dropping the message is to be avoided as much as possible.
So the only alternative is to always set xmit_lock_owner. The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.
I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly. I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.
This is pretty much a straight text substitution except for a small
bug fix in winbond. It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission. This is
unsafe as an IRQ can potentially wake up the queue. So it is safer to
use netif_tx_disable.
The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a secmark field to the skbuff structure, to allow security subsystems to
place security markings on network packets. This is similar to the nfmark
field, except is intended for implementing security policy, rather than than
networking policy.
This patch was already acked in principle by Dave Miller.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds an async_wait_queue and some additional fields to tcp_sock, and a
dma_cookie_t to sk_buff.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provides for pinning user space pages in memory, copying to iovecs,
and copying from sk_buffs including fragmented and chained sk_buffs.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed that dev_alloc_name() comment was incorrect, and more spellung
errors.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>