When we are doing ucopy, we try to defer the ACK generation to
cleanup_rbuf(). This works most of the time very well, but if the
ucopy prequeue is large, this ACKing behavior kills performance.
With TSO, it is possible to fill the prequeue so large that by the
time the ACK is sent and gets back to the sender, most of the window
has emptied of data and performance suffers significantly.
This behavior does help in some cases, so we should think about
re-enabling this trick in the future, using some kind of limit in
order to avoid the bug case.
Signed-off-by: David S. Miller <davem@davemloft.net>
In netlink_broadcast() we're sending shared skb's to netlink listeners
when possible (saves some copying). This is OK, since we hold the only
other reference to the skb.
However, this implies that we must drop our reference on the skb, before
allowing a receiving socket to disappear. Otherwise, the socket buffer
accounting is disrupted.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug causes:
assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122)
What's happening is that:
1) The skb is sent to socket 1.
2) Someone does a recvmsg on socket 1 and drops the ref on the skb.
Note that the rmalloc is not returned at this point since the
skb is still referenced.
3) The same skb is now sent to socket 2.
This version of the fix resurrects the skb_orphan call that was moved
out, last time we had 'shared-skb troubles'. It is practically a no-op
in the common case, but still prevents the possible race with recvmsg.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to verify that the payload contains enough data so that
attach_one_algo can copy alg_key_len bits from the payload.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable alg_key_len is in bits and not bytes. The function
attach_one_algo is currently using it as if it were in bytes.
This causes it to read memory which may not be there.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove extra __ip_vs_conn_put for incoming ICMP in direct routing
mode. Mark de Vries reports that IPVS connections are not leaked anymore.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
currently it opencodes it, but that's in the way of chaning the
lookup_hash interface.
I'd prefer to disallow modular af_unix over exporting lookup_create,
but I'll leave that to you.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having frag_list members which holds wmem of an sk leads to nightmares
with partially cloned frag skb's. The reason is that once you unleash
a skb with a frag_list that has individual sk ownerships into the stack
you can never undo those ownerships safely as they may have been cloned
by things like netfilter. Since we have to undo them in order to make
skb_linearize happy this approach leads to a dead-end.
So let's go the other way and make this an invariant:
For any skb on a frag_list, skb->sk must be NULL.
That is, the socket ownership always belongs to the head skb.
It turns out that the implementation is actually pretty simple.
The above invariant is actually violated in the following patch
for a short duration inside ip_fragment. This is OK because the
offending frag_list member is either destroyed at the end of the
slow path without being sent anywhere, or it is detached from
the frag_list before being sent.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like skb_cow_data() does not set
proper owner for newly created skb.
If we have several fragments for skb and some of them
are shared(?) or cloned (like in async IPsec) there
might be a situation when we require recreating skb and
thus using skb_copy() for it.
Newly created skb has neither a destructor nor a socket
assotiated with it, which must be copied from the old skb.
As far as I can see, current code sets destructor and socket
for the first one skb only and uses truesize of the first skb
only to increment sk_wmem_alloc value.
If above "analysis" is correct then attached patch fixes that.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ross moved. Remove the bad email address so people will find the correct
one in ./CREDITS.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
this matches the API used by other link layer like ethernet or token
ring.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This causes sk->sk_prot to change, which makes the socket
release free the sock into the wrong SLAB cache. Fix this
by introducing sk_prot_creator so that we always remember
where the sock came from.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* net/irda/irda_device.c::irda_setup_dma() made conditional on
ISA_DMA_API (it uses helpers in question and irda is usable on
platforms that don't have them at all - think of USB IRDA, for
example).
* irda drivers that depend on ISA DMA marked as dependent on
ISA_DMA_API
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Long standing bug.
Policy to repeat an action never worked.
Signed-off-by: J Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
I found a bug that stopped IPsec/IPv6 from working. About
a month ago IPv6 started using rt6i_idev->dev on the cached socket dst
entries. If the cached socket dst entry is IPsec, then rt6i_idev will
be NULL.
Since we want to look at the rt6i_idev of the original route in this
case, the easiest fix is to store rt6i_idev in the IPsec dst entry just
as we do for a number of other IPv6 route attributes. Unfortunately
this means that we need some new code to handle the references to
rt6i_idev. That's why this patch is bigger than it would otherwise be.
I've also done the same thing for IPv4 since it is conceivable that
once these idev attributes start getting used for accounting, we
probably need to dereference them for IPv4 IPsec entries too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix qlen underrun when doing duplication with netem. If netem is used
as leaf discipline, then the parent needs to be tweaked when packets
are duplicated.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netem currently dumps packets into the queue when timer expires. This
patch makes work by self-clocking (more like TBF). It fixes a bug
when 0 delay is requested (only doing loss or duplication).
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to bugs in netem (fixed by later patches), it is possible to get qdisc
qlen to go negative. If this happens the CPU ends up spinning forever
in qdisc_run(). So add a BUG_ON() to trap it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>