kernel

mirror of https://github.com/ukui/kernel.git synced 2026-03-09 10:07:04 -07:00

Author	SHA1	Message	Date
Herbert Xu	046d20b739	[TCP]: Ratelimit debugging warning. Better safe than sorry. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-13 14:42:24 -07:00
David S. Miller	c8923c6b85	[NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering. Original patch by Harald Welte, with feedback from Herbert Xu and testing by S�bastien Bernard. EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus are numbered linearly. That is not necessarily true. This patch fixes that up by calculating the largest possible cpu number, and allocating enough per-cpu structure space given that. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-13 14:41:23 -07:00
Herbert Xu	9ff5c59ce2	[TCP]: Add code to help track down "BUG at net/ipv4/tcp_output.c:438!" This is the second report of this bug. Unfortunately the first reporter hasn't been able to reproduce it since to provide more debugging info. So let's apply this patch for 2.6.14 to 1) Make this non-fatal. 2) Provide the info we need to track it down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-12 15:59:39 -07:00
Arnaldo Carvalho de Melo	eeb2b85606	[TWSK]: Grab the module refcount for timewait sockets This is required to avoid unloading a module that has active timewait sockets, such as DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:25:23 -07:00
Pablo Neira Ayuso	061cb4a0ec	[NETFILTER] ctnetlink: add support to change protocol info This patch add support to change the state of the private protocol information via conntrack_netlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:23:46 -07:00
Pablo Neira Ayuso	3392315375	[NETFILTER] ctnetlink: allow userspace to change TCP state This patch adds the ability of changing the state a TCP connection. I know that this must be used with care but it's required to provide a complete conntrack creation via conntrack_netlink. So I'll document this aspect on the upcoming docs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:23:28 -07:00
Harald Welte	a051a8f730	[NETFILTER]: Use only 32bit counters for CONNTRACK_ACCT Initially we used 64bit counters for conntrack-based accounting, since we had no event mechanism to tell userspace that our counters are about to overflow. With nfnetlink_conntrack, we now have such a event mechanism and thus can save 16bytes per connection. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:21:10 -07:00
Herbert Xu	d4875b049b	[IPSEC] Fix block size/MTU bugs in ESP This patch fixes the following bugs in ESP: * Fix transport mode MTU overestimate. This means that the inner MTU is smaller than it needs be. Worse yet, given an input MTU which is a multiple of 4 it will always produce an estimate which is not a multiple of 4. For example, given a standard ESP/3DES/MD5 transform and an MTU of 1500, the resulting MTU for transport mode is 1462 when it should be 1464. The reason for this is because IP header lengths are always a multiple of 4 for IPv4 and 8 for IPv6. * Ensure that the block size is at least 4. This is required by RFC2406 and corresponds to what the esp_output function does. At the moment this only affects crypto_null as its block size is 1. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:11:34 -07:00
Herbert Xu	a02a64223e	[IPSEC]: Use ALIGN macro in ESP This patch uses the macro ALIGN in all the applicable spots for ESP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:11:08 -07:00
Pablo Neira Ayuso	e1c73b78e3	[NETFILTER] ctnetlink: add one nesting level for TCP state To keep consistency, the TCP private protocol information is nested attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to access the TCP state information looks like here below: CTA_PROTOINFO CTA_PROTOINFO_TCP CTA_PROTOINFO_TCP_STATE instead of: CTA_PROTOINFO CTA_PROTOINFO_TCP_STATE Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:55:49 -07:00
Pablo Neira Ayuso	a1bcc3f268	[NETFILTER] ctnetlink: ICMP ID is not mandatory The ID is only required by ICMP type 8 (echo), so it's not mandatory for all sort of ICMP connections. This patch makes mandatory only the type and the code for ICMP netlink messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:53:16 -07:00
Harald Welte	d000eaf772	[NETFILTER] conntrack_netlink: Fix endian issue with status from userspace When we send "status" from userspace, we forget to convert the endianness. This patch adds the reqired conversion. Thanks to Pablo Neira for discovering this. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:52:51 -07:00
Harald Welte	f40863cec8	[NETFILTER] ipt_ULOG: Mark ipt_ULOG as OBSOLETE Similar to nfnetlink_queue and ip_queue, we mark ipt_ULOG as obsolete. This should have been part of the original nfnetlink_log merge, but I somehow missed it. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:51:53 -07:00
Harald Welte	85d9b05d9b	[NETFILTER] PPTP helper: Add missing Kconfig dependency PPTP should not be selectable without conntrack enabled Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:47:42 -07:00
Al Viro	dd0fc66fb3	[PATCH] gfp flags annotations - part 1 - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-08 15:00:57 -07:00
Stephen Hemminger	42a39450f8	[TCP]: BIC coding bug in Linux 2.6.13 Missing parenthesis in causes BIC to be slow in increasing congestion window. Spotted by Injong Rhee. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-05 12:09:31 -07:00
Randy Dunlap	8eea00a44d	[IPVS]: fix sparse gfp nocast warnings From: Randy Dunlap <rdunlap@xenotime.net> Fix implicit nocast warnings in ip_vs code: net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-04 22:42:15 -07:00
Horst H. von Brand	a5181ab06d	[NETFILTER]: Fix Kconfig typo Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-04 15:58:56 -07:00
Robert Olsson	e6308be85a	[IPV4]: fib_trie root-node expansion The patch below introduces special thresholds to keep root node in the trie large. This gives a flatter tree at the cost of a modest memory increase. Overall it seems to be gain and this was also proposed by one the authors of the paper in recent a seminar. Main table after loading 123 k routes. Aver depth: 3.30 Max depth: 9 Root-node size 12 bits Total size: 4044 kB With the patch: Aver depth: 2.78 Max depth: 8 Root-node size 15 bits Total size: 4150 kB An increase of 8-10% was seen in forwading performance for an rDoS attack. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-04 13:01:58 -07:00
David S. Miller	7ce312467e	[IPV4]: Update icmp sysctl docs and disable broadcast ECHO/TIMESTAMP by default It's not a good idea to be smurf'able by default. The few people who need this can turn it on. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 16:07:30 -07:00
Herbert Xu	e5ed639913	[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:35:55 -07:00
Herbert Xu	444fc8fc3a	[IPV4]: Fix "Proxy ARP seems broken" Meelis Roos <mroos@linux.ee> wrote: > RK> My firewall setup relies on proxyarp working. However, with 2.6.14-rc3, > RK> it appears to be completely broken. The firewall is 212.18.232.186, > > Same here with some kernel between 14-rc2 and 14-rc3 - no reposnse to > ARP on a proxyarp gateway. Sorry, no exact revison and no more debugging > yet since it'a a production gateway. The breakage is caused by the change to use the CB area for flagging whether a packet has been queued due to proxy_delay. This area gets cleared every time arp_rcv gets called. Unfortunately packets delayed due to proxy_delay also go through arp_rcv when they are reprocessed. In fact, I can't think of a reason why delayed proxy packets should go through netfilter again at all. So the easiest solution is to bypass that and go straight to arp_process. This is essentially what would've happened before netfilter support was added to ARP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:18:10 -07:00
Eric Dumazet	81c3d5470e	[INET]: speedup inet (tcp/dccp) lookups Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a lot of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! / } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((((__u64 )&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((((__u32 )&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) \|\| ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) \|\| ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read\|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:13:38 -07:00
Herbert Xu	325ed82393	[NET]: Fix packet timestamping. I've found the problem in general. It affects any 64-bit architecture. The problem occurs when you change the system time. Suppose that when you boot your system clock is forward by a day. This gets recorded down in skb_tv_base. You then wind the clock back by a day. From that point onwards the offset will be negative which essentially overflows the 32-bit variables they're stored in. In fact, why don't we just store the real time stamp in those 32-bit variables? After all, we're not going to overflow for quite a while yet. When we do overflow, we'll need a better solution of course. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 13:57:23 -07:00
Alexey Kuznetsov	09e9ec8711	[TCP]: Don't over-clamp window in tcp_clamp_window() From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Handle better the case where the sender sends full sized frames initially, then moves to a mode where it trickles out small amounts of data at a time. This known problem is even mentioned in the comments above tcp_grow_window() in tcp_input.c, specifically: ... * The scheme does not work when sender sends good segments opening * window and then starts to feed us spagetti. But it should work * in common situations. Otherwise, we have to rely on queue collapsing. ... When the sender gives full sized frames, the "struct sk_buff" overhead from each packet is small. So we'll advertize a larger window. If the sender moves to a mode where small segments are sent, this ratio becomes tilted to the other extreme and we start overrunning the socket buffer space. tcp_clamp_window() tries to address this, but it's clamping of tp->window_clamp is a wee bit too aggressive for this particular case. Fix confirmed by Ion Badulescu. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-29 17:17:15 -07:00

1 2 3 4 5 ...

345 Commits