Commit Graph

1294 Commits

Author SHA1 Message Date
Rolf Manderscheid
a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Eric Dumazet
9d3e44425e [SOCK]: Adds a rcu_dereference() in sk_filter
It seems commit fda9ef5d67 introduced a RCU 
protection for sk_filter(), without a rcu_dereference()

Either we need a rcu_dereference(), either a comment should explain why we 
dont need it. I vote for the former.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:41:28 -08:00
Eric Dumazet
0f99be0d11 [XFRM]: xfrm_algo_clone() allocates too much memory
alg_key_len is the length in bits of the key, not in bytes.

Best way to fix this is to move alg_len() function from net/xfrm/xfrm_user.c 
to include/net/xfrm.h, and to use it in xfrm_algo_clone()

alg_len() is renamed to xfrm_alg_len() because of its global exposition.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:39:06 -08:00
Paul Moore
02f1c89d6e [NET]: Clone the sk_buff 'iif' field in __skb_clone()
Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets.  Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack.  This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.

Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:17 -08:00
Vlad Yasevich
f691724c4d [SCTP]: Fix the name of the authentication event.
The even should be called SCTP_AUTHENTICATION_INDICATION.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:02 -08:00
Stephen Hemminger
ecef969e5b [VETH]: move veth.h to include/linux
Move veth.h from net/ to linux/ since it is a user api, and add it to
user header processing Kbuild.

[ Use header-y as suggested by Sam Ravnborg.  -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-26 19:36:35 -08:00
Patrick McHardy
fae718ddaf [NETFILTER]: nf_conntrack_ipv4: fix module parameter compatibility
Some users do "modprobe ip_conntrack hashsize=...". Since we have the
module aliases this loads nf_conntrack_ipv4 and nf_conntrack, the
hashsize parameter is unknown for nf_conntrack_ipv4 however and makes
it fail.

Allow to specify hashsize= for both nf_conntrack and nf_conntrack_ipv4.

Note: the nf_conntrack message in the ringbuffer will display an
incorrect hashsize since nf_conntrack is first pulled in as a
dependency and calculates the size itself, then it gets changed
through a call to nf_conntrack_set_hashsize().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-26 19:36:33 -08:00
Joe Perches
f4ab2f72e9 [NET] include/net/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-20 13:56:32 -08:00
Vlad Yasevich
8e71a11c9f [SCTP]: Fix the bind_addr info during migration.
During accept/migrate the code attempts to copy the addresses from
the parent endpoint to the new endpoint.   However, if the parent
was bound to a wildcard address, then we end up pointlessly copying
all of the current addresses on the system.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-07 01:07:49 -08:00
Denis V. Lunev
56c99d0415 [IPV4]: Remove prototype of ip_rt_advice
ip_rt_advice has been gone, so no need to keep prototype and debug message.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-07 01:07:38 -08:00
Vlad Yasevich
b7e0fe9f81 SCTP: Fix build issues with SCTP AUTH.
SCTP-AUTH requires selection of CRYPTO, HMAC and SHA1 since
SHA1 is a MUST requirement for AUTH.  We also support SHA256,
but that's optional, so fix the code to treat it as such.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-29 10:17:42 -05:00
Pavel Emelyanov
218ad12f42 [IPV4]: Fix memory leak in inet_hashtables.h when NUMA is on
The inet_ehash_locks_alloc() looks like this:

#ifdef CONFIG_NUMA
	if (size > PAGE_SIZE)
		x = vmalloc(...);
	else
#endif
		x = kmalloc(...);

Unlike it, the inet_ehash_locks_alloc() looks like this:

#ifdef CONFIG_NUMA
	if (size > PAGE_SIZE)
		vfree(x);
	else
#else
		kfree(x);
#endif

The error is obvious - if the NUMA is on and the size
is less than the PAGE_SIZE we leak the pointer (kfree is
inside the #else branch).

Compiler doesn't warn us because after the kfree(x) there's
a "x = NULL" assignment, so here's another (minor?) bug: we 
don't set x to NULL under certain circumstances.

Boring explanation, I know... Patch explains it better.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-11-26 20:23:31 +08:00
David S. Miller
53438e5d04 Merge branch 'fixes-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2007-11-20 17:24:29 -08:00
Guillaume Chazarain
92468c53cf ieee80211: Stop net_ratelimit/IEEE80211_DEBUG_DROP log pollution
if (net_ratelimit())
	IEEE80211_DEBUG_DROP(...)

can pollute the logs with messages like:

printk: 1 messages suppressed.
printk: 2 messages suppressed.
printk: 7 messages suppressed.

if debugging information is disabled. These messages are printed by
net_ratelimit(). Add a wrapper to net_ratelimit() that takes into account
the log level, so that net_ratelimit() is called only when we really want
to print something.

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-11-20 16:43:17 -05:00
Ilpo Järvinen
6e42141009 [TCP] MTUprobe: fix potential sk_send_head corruption
When the abstraction functions got added, conversion here was
made incorrectly. As a result, the skb may end up pointing
to skb which got included to the probe skb and then was freed.
For it to trigger, however, skb_transmit must fail sending as
well.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 23:24:09 -08:00
Simon Horman
9055fa1f3d [IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBERED
Switch the remaining IPVS sysctl entries over to to use CTL_UNNUMBERED,
I stronly doubt that anyone is using the sys_sysctl interface to
these variables.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 21:51:13 -08:00
Simon Horman
9e103fa6bd [IPVS]: Fix sysctl warnings about missing strategy in schedulers
sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy

Switch these entried over to use CTL_UNNUMBERED as clearly
the sys_syscal portion wasn't working.

This is along the same lines as Christian Borntraeger's patch that fixes
up entries with no stratergy in net/ipv4/ipvs/ip_vs_ctl.c

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 21:50:21 -08:00
Christian Borntraeger
611cd55b15 [IPVS]: Fix sysctl warnings about missing strategy
Running the latest git code I get the following messages during boot:
sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
[...]		  
sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy

I removed the binary sysctl handler for those messages and also removed
the definitions in ip_vs.h. The alternative would be to implement a 
proper strategy handler, but syscall sysctl is deprecated.

There are other sysctl definitions that are commented out or work with 
the default sysctl_data strategy. I did not touch these. 

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 21:49:25 -08:00
Herbert Xu
21df56c6e2 [TCP]: Fix TCP header misalignment
Indeed my previous change to alloc_pskb has made it possible
for the TCP header to be misaligned iff the MTU is not a multiple
of 4 (and less than a page).  So I suspect the optimised IPsec
MTU calculation is giving you just such an MTU :)

This patch fixes it by changing alloc_pskb to make sure that
the size is at least 32-bit aligned.  This does not cause the
problem fixed by the previous patch because max_header is always
32-bit aligned which means that in the SG/NOTSO case this will
be a no-op.

I thought about putting this in the callers but all the current
callers are from TCP.  If and when we get a non-TCP caller we
can always create a TCP wrapper for this function and move the
alignment over there.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-18 18:48:08 -08:00
Pavel Emelyanov
dab6ba3688 [INET]: Fix potential kfree on vmalloc-ed area of request_sock_queue
The request_sock_queue's listen_opt is either vmalloc-ed or
kmalloc-ed depending on the number of table entries. Thus it 
is expected to be handled properly on free, which is done in 
the reqsk_queue_destroy().

However the error path in inet_csk_listen_start() calls 
the lite version of reqsk_queue_destroy, called 
__reqsk_queue_destroy, which calls the kfree unconditionally. 

Fix this and move the __reqsk_queue_destroy into a .c file as 
it looks too big to be inline.

As David also noticed, this is an error recovery path only,
so no locking is required and the lopt is known to be not NULL.

reqsk_queue_yank_listen_sk is also now only used in
net/core/request_sock.c so we should move it there too.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-15 02:57:06 -08:00
Herbert Xu
fb93134dfc [TCP]: Fix size calculation in sk_stream_alloc_pskb
We round up the header size in sk_stream_alloc_pskb so that
TSO packets get zero tail room.  Unfortunately this rounding
up is not coordinated with the select_size() function used by
TCP to calculate the second parameter of sk_stream_alloc_pskb.

As a result, we may allocate more than a page of data in the
non-TSO case when exactly one page is desired.

In fact, rounding up the head room is detrimental in the non-TSO
case because it makes memory that would otherwise be available to
the payload head room.  TSO doesn't need this either, all it wants
is the guarantee that there is no tail room.

So this patch fixes this by adjusting the skb_reserve call so that
exactly the requested amount (which all callers have calculated in
a precise way) is made available as tail room.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-14 15:45:21 -08:00
Denis V. Lunev
022cbae611 [NET]: Move unneeded data to initdata section.
This patch reverts Eric's commit 2b008b0a8e

It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
This is safe after list operations cleanup.

Signed-of-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-13 03:23:50 -08:00
Pavel Emelyanov
d71209ded2 [INET]: Use list_head-s in inetpeer.c
The inetpeer.c tracks the LRU list of inet_perr-s, but makes
it by hands. Use the list_head-s for this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12 21:27:28 -08:00
Arnaldo Carvalho de Melo
c0d8248710 [INET]: Remove leftover prototypes from include/net/inet_common.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12 21:02:51 -08:00
David S. Miller
bce943278d Merge branch 'pending' of master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev 2007-11-12 18:16:13 -08:00