Commit Graph

72 Commits

Author SHA1 Message Date
John Heffner 52bf376c63 [TCP]: Fix up sysctl_tcp_mem initialization.
Fix up tcp_mem initial settings to take into account the size of the
hash entries (different on SMP and non-SMP systems).

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-15 21:18:51 -08:00
John Heffner 9e950efa20 [TCP]: Don't use highmem in tcp hash size calculation.
This patch removes consideration of high memory when determining TCP
hash table sizes.  Taking into account high memory results in tcp_mem
values that are too large.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-07 15:10:11 -08:00
Alexey Kuznetsov 1ef9696c90 [TCP]: Send ACKs each 2nd received segment.
It does not affect either mss-sized connections (obviously) or
connections controlled by Nagle (because there is only one small
segment in flight).

The idea is to record the fact that a small segment arrives on a
connection, where one small segment has already been received and
still not-ACKed. In this case ACK is forced after tcp_recvmsg() drains
receive buffer.

In other words, it is a "soft" each-2nd-segment ACK, which is enough
to preserve ACK clock even when ABC is enabled.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:05 -07:00
Alexey Dobriyan e5d679f339 [NET]: Use SLAB_PANIC
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:19 -07:00
Brian Haley ab32ea5d8a [NET/IPV4/IPV6]: Change some sysctl variables to __read_mostly
Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly.

Couldn't actually measure any performance increase while testing (.3%
I consider noise), but seems like the right thing to do.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:55:03 -07:00
Patrick McHardy 84fa7933a3 [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).

Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:53 -07:00
Alexey Dobriyan 29bbd72d6e [NET]: Fix more per-cpu typos
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 15:02:31 -07:00
David S. Miller 52499afe40 [TCP]: Process linger2 timeout consistently.
Based upon guidance from Alexey Kuznetsov.

When linger2 is active, we check to see if the fin_wait2
timeout is longer than the timewait.  If it is, we schedule
the keepalive timer for the difference between the timewait
timeout and the fin_wait2 timeout.

When this orphan socket is seen by tcp_keepalive_timer()
it will try to transform this fin_wait2 socket into a
fin_wait2 mini-socket, again if linger2 is active.

Not all paths were setting this initial keepalive timer correctly.
The tcp input path was doing it correctly, but tcp_close() wasn't,
potentially making the socket linger longer than it really needs to.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:24 -07:00
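The timer choice described in 52499afe40 can be modelled in a few lines of user-space C. This is only a sketch of the arithmetic, not the kernel code: `TIMEWAIT_LEN_SEC` and `fin_wait2_keepalive_delay()` are hypothetical names, and the 60-second time-wait length is an assumption made for illustration.

```c
/* Assumed time-wait length in seconds (illustrative, not taken from the commit). */
#define TIMEWAIT_LEN_SEC 60

/*
 * Hypothetical helper: given the fin_wait2 timeout in seconds, return how long
 * the keepalive timer should run before the orphan socket is turned into a
 * fin_wait2 mini-socket.  A result of 0 means no extra delay is needed.
 */
static int fin_wait2_keepalive_delay(int fin_wait2_timeout_sec)
{
    if (fin_wait2_timeout_sec > TIMEWAIT_LEN_SEC)
        return fin_wait2_timeout_sec - TIMEWAIT_LEN_SEC;
    return 0;
}
```

The point of the fix is that every path that orphans the socket, including the close path, should apply this same computation.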
Herbert Xu bbcf467dab [NET]: Verify gso_type too in gso_segment
We don't want nasty Xen guests to pass a TCPv6 packet in with gso_type set
to TCPv4 or even UDP (or a packet that's both TCP and UDP).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-03 19:38:35 -07:00
Linus Torvalds e37a72de84 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [IPV6]: Added GSO support for TCPv6
  [NET]: Generalise TSO-specific bits from skb_setup_caps
  [IPV6]: Added GSO support for TCPv6
  [IPV6]: Remove redundant length check on input
  [NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks
  [TG3]: Update version and reldate
  [TG3]: Add TSO workaround using GSO
  [TG3]: Turn on hw fix for ASF problems
  [TG3]: Add rx BD workaround
  [TG3]: Add tg3_netif_stop() in vlan functions
  [TCP]: Reset gso_segs if packet is dodgy
2006-06-30 15:40:17 -07:00
Herbert Xu bcd7611117 [NET]: Generalise TSO-specific bits from skb_setup_caps
This patch generalises the TSO-specific bits from sk_setup_caps by adding
the sk_gso_type member to struct sock.  This makes sk_setup_caps generic
so that it can be used by TCPv6 or UFO.

The only catch is that whoever uses this must provide a GSO implementation
for their protocol which I think is a fair deal :) For now UFO continues to
live without a GSO implementation which is OK since it doesn't use the sock
caps field at the moment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30 14:12:08 -07:00
Herbert Xu adcfc7d0b4 [IPV6]: Added GSO support for TCPv6
This patch adds GSO support for IPv6 and TCPv6.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30 14:12:06 -07:00
Herbert Xu 3820c3f3e4 [TCP]: Reset gso_segs if packet is dodgy
I wasn't paranoid enough in verifying GSO information.  A bogus gso_segs
could upset drivers as much as a bogus header would.  Let's reset it in
the per-protocol gso_segment functions.

I didn't verify gso_size because that can be verified by the source of
the dodgy packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30 14:11:47 -07:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Herbert Xu 576a30eb64 [NET]: Added GSO header verification
When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY.  Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST.  If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag.  The same method
can be used to implement TSO ECN support.  We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them.  The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:53 -07:00
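The routing decision described in 576a30eb64 might look roughly like the following user-space sketch. The bit values, the `xmit_path` enum, and `choose_xmit_path()` are assumptions made for illustration; only the flag names and the idea of matching gso_type bits against device feature bits come from the commit messages. Rejecting a packet that fails the header check is also an assumption about what "not allowed" means here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments; the real kernel values may differ. */
#define GSO_FEATURE_SHIFT  16
#define SKB_GSO_TCPV4      (1u << 0)
#define SKB_GSO_DODGY      (1u << 2)
#define NETIF_F_TSO        (SKB_GSO_TCPV4 << GSO_FEATURE_SHIFT)
#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << GSO_FEATURE_SHIFT)

enum xmit_path {
    XMIT_DIRECT,         /* device has every feature bit the packet needs */
    XMIT_VIA_GSO_ENGINE, /* software GSO path, header verified first */
    XMIT_REJECT          /* header check failed (assumed outcome) */
};

/* Hypothetical helper: pick a transmit path for a GSO packet. */
static enum xmit_path choose_xmit_path(uint32_t dev_features,
                                       uint32_t gso_type,
                                       bool header_ok)
{
    uint32_t required = gso_type << GSO_FEATURE_SHIFT;

    if ((dev_features & required) == required)
        return XMIT_DIRECT;
    return header_ok ? XMIT_VIA_GSO_ENGINE : XMIT_REJECT;
}
```

A packet marked `SKB_GSO_TCPV4 | SKB_GSO_DODGY` goes straight to a device only if that device advertises both `NETIF_F_TSO` and `NETIF_F_GSO_ROBUST`; otherwise it takes the verified software path.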
Herbert Xu 0718bcc09b [NET]: Fix CHECKSUM_HW GSO problems.
Fix checksum problems in the GSO code path for CHECKSUM_HW packets.

The ipv4 TCP pseudo header checksum has to be adjusted for GSO
segmented packets.

The adjustment is needed because the length field in the pseudo-header
changes.  However, because we have the inequality oldlen > newlen, we
know that delta = (u16)~oldlen + newlen is still a 16-bit quantity.
This also means that htonl(delta) + th->check still fits in 32 bits.
Therefore we don't have to use csum_add on this operation.

This is based on a patch by Michael Chan <mchan@broadcom.com>.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:55:46 -07:00
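The length adjustment described in 0718bcc09b is an incremental one's-complement checksum update. The sketch below illustrates the arithmetic in user-space C with host-order 16-bit words; it is not the kernel code, and all function names and the sample words are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement checksum over 16-bit words (host order, for illustration). */
static uint16_t checksum16(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum);
}

/* delta = (u16)~oldlen + newlen; it fits in 16 bits because oldlen > newlen. */
static uint32_t length_delta(uint16_t oldlen, uint16_t newlen)
{
    assert(oldlen > newlen);
    uint32_t delta = (uint32_t)(uint16_t)~oldlen + newlen;
    assert(delta <= 0xffffu);
    return delta;
}

/* Update a checksum after the length word changed from oldlen to newlen. */
static uint16_t adjust_for_length(uint16_t check, uint16_t oldlen, uint16_t newlen)
{
    uint32_t sum = (uint32_t)(uint16_t)~check + length_delta(oldlen, newlen);
    return (uint16_t)~fold16(sum);
}

/* Worked example: the incremental update agrees with a full recompute. */
static int length_adjust_matches_recompute(uint16_t oldlen, uint16_t newlen)
{
    /* Made-up pseudo-header-like words; index 5 holds the length. */
    uint16_t words[7] = { 0xc0a8, 0x0001, 0xc0a8, 0x0002, 0x0006, 0, 0x1234 };

    words[5] = oldlen;
    uint16_t check = checksum16(words, 7);
    words[5] = newlen;
    return adjust_for_length(check, oldlen, newlen) == checksum16(words, 7);
}
```

Since `(u16)~oldlen` equals `0xffff - oldlen`, adding a smaller `newlen` can never carry past 16 bits, which is the property the commit message relies on to skip csum_add.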
Herbert Xu f4c50d990d [NET]: Add software TSOv4
This patch adds the GSO implementation for IPv4 TCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:33 -07:00
Herbert Xu 7967168cef [NET]: Merge TSO/UFO fields in sk_buff
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to AND the gso_type field with the netdev's features
field.

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:29 -07:00
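The ECN example from 7967168cef can be written out as a small user-space sketch. The bit positions and the helper name `dev_can_segment()` are assumptions for illustration; the 16-bit shift and the flag names come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define GSO_FEATURE_SHIFT 16   /* gso_type bits are feature bits shifted by 16 */

/* Illustrative bit assignments; the real kernel values may differ. */
#define SKB_GSO_TCPV4     (1u << 0)
#define SKB_GSO_TCPV4_ECN (1u << 3)
#define NETIF_F_TSO       (SKB_GSO_TCPV4 << GSO_FEATURE_SHIFT)
#define NETIF_F_TSO_ECN   (SKB_GSO_TCPV4_ECN << GSO_FEATURE_SHIFT)

/*
 * Hypothetical helper: a device can segment an skb in hardware only if it
 * advertises every feature bit named in the skb's gso_type.
 */
static bool dev_can_segment(uint32_t dev_features, uint32_t gso_type)
{
    uint32_t required = gso_type << GSO_FEATURE_SHIFT;

    return (dev_features & required) == required;
}
```

With these definitions a device that advertises only `NETIF_F_TSO` handles ordinary TSO packets but not those carrying `SKB_GSO_TCPV4_ECN`, so only the CWR packets fall back to software.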
Herbert Xu 8648b3053b [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM
The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
identically so we test for them in quite a few places.  For the sake
of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two.  We
also test the disjunct of NETIF_F_IP_CSUM and the other two in various
places, for that purpose I've added NETIF_F_ALL_CSUM.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:06:05 -07:00
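The two macros described in 8648b3053b amount to simple bit masks. The sketch below follows the commit message; the numeric bit values and the two helper functions are assumptions added for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real kernel values may differ. */
#define NETIF_F_IP_CSUM (1u << 1)
#define NETIF_F_NO_CSUM (1u << 2)
#define NETIF_F_HW_CSUM (1u << 3)

/* The two flags the stack treats identically. */
#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
/* Any of the three checksum capabilities. */
#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)

/* Hypothetical helpers showing how the masks shorten feature tests. */
static bool dev_has_generic_csum(uint32_t features)
{
    return (features & NETIF_F_GEN_CSUM) != 0;
}

static bool dev_has_any_csum(uint32_t features)
{
    return (features & NETIF_F_ALL_CSUM) != 0;
}
```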
Chris Leech 1a2449a87b [I/OAT]: TCP recv offload to I/OAT
Locks down user pages and sets up for DMA in tcp_recvmsg, then calls
dma_async_try_early_copy in tcp_v4_do_rcv

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:56 -07:00
Chris Leech 624d116473 [I/OAT]: Make sk_eat_skb I/OAT aware.
Add an extra argument to sk_eat_skb, and make it move early copied
packets to the async_wait_queue instead of freeing them.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:52 -07:00
Chris Leech 0e4b4992b8 [I/OAT]: Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static
Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:25:50 -07:00
Herbert Xu 75c2d9077c [TCP]: Fix sock_orphan dead lock
Calling sock_orphan inside bh_lock_sock in tcp_close can lead to
deadlocks.  For example, the inet_diag code holds sk_callback_lock without
disabling BH.  If an inbound packet arrives during that admittedly tiny
window, it will cause a deadlock on bh_lock_sock.  Another possible
path would be through sock_wfree if the network device driver frees the
tx skb in process context with BH enabled.

We can fix this by moving sock_orphan out of bh_lock_sock.

The tricky bit is to work out when we need to destroy the socket
ourselves and when it has already been destroyed by someone else.

By moving sock_orphan before the release_sock we can solve this
problem.  This is because as long as we own the socket lock its
state cannot change.

So we simply record the socket state before the release_sock
and then check the state again after we regain the socket lock.
If the socket state has transitioned to TCP_CLOSE in the time being,
we know that the socket has been destroyed.  Otherwise the socket is
still ours to keep.

Note that I've also moved the increment on the orphan count forward.
This may look like a problem as we're increasing it even if the socket
is just about to be destroyed where it'll be decreased again.  However,
this simply enlarges a window that already exists.  This also changes
the orphan count test by one.

Considering what the orphan count is meant to do this is no big deal.

This problem was discovered by Ingo Molnar using his lock validator.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-05-03 23:31:35 -07:00
Linus Torvalds b55813a2e5 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NETFILTER] x_table.c: sem2mutex
  [IPV4]: Aggregate route entries with different TOS values
  [TCP]: Mark tcp_*mem[] __read_mostly.
  [TCP]: Set default max buffers from memory pool size
  [SCTP]: Fix up sctp_rcv return value
  [NET]: Take RTNL when unregistering notifier
  [WIRELESS]: Fix config dependencies.
  [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms.
  [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated.
  [MODULES]: Don't allow statically declared exports
  [BRIDGE]: Unaligned accesses in the ethernet bridge
2006-03-25 08:39:20 -08:00
Davide Libenzi f348d70a32 [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications
Implement half-closed device notification by adding a new POLLRDHUP
(and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Because
changing the existing POLLHUP handling, which does not correctly report
half-closed devices, was considered risky, this implementation leaves the
current POLLHUP reporting unchanged and simply adds a new bit that is set in
the few places where it makes sense.  The same thing was discussed and
conceptually agreed quite some time ago:

http://lkml.org/lkml/2003/7/12/116

Since this new event bit is added to the existing Linux poll infrastructure,
even the existing poll/select system calls will be able to use it.  As for
the existing POLLHUP handling, the patch leaves it as is.  The
pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
archs and sets the bit in the six relevant files.  The other attached diff
is the simple change required to sys/epoll.h to add the EPOLLRDHUP
definition.

There is "a stupid program" to test POLLRDHUP delivery here:

 http://www.xmailserver.org/pollrdhup-test.c

It tests poll(2), but since delivery is the same, epoll(2) will work equally well.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
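A user-space program could use the POLLRDHUP bit from f348d70a32 roughly as follows. This is a minimal sketch, not the test program linked in the commit message: the helper names are hypothetical, the demo uses a local AF_UNIX socket pair for convenience, and the fallback value for POLLRDHUP is an assumption that matches common Linux architectures only.

```c
#define _GNU_SOURCE            /* exposes POLLRDHUP in <poll.h> on glibc */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0x2000       /* assumed value, used only if the header hides it */
#endif

/* Returns 1 if the peer has shut down its writing side, 0 if not, -1 on error. */
static int peer_half_closed(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLRDHUP };
    int n = poll(&pfd, 1, 0);  /* timeout 0: report current state only */

    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLRDHUP)) ? 1 : 0;
}

/* Demo: half-close one end of a socket pair, then poll the other end. */
static int demo_half_close(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int before = peer_half_closed(sv[0]);
    shutdown(sv[1], SHUT_WR);  /* peer stops writing; reading stays open */
    int after = peer_half_closed(sv[0]);

    close(sv[0]);
    close(sv[1]);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

Requesting POLLRDHUP alongside POLLIN lets a caller tell "peer finished writing" apart from ordinary readable data without changing how POLLHUP is reported.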