Commit Graph

137 Commits

Author SHA1 Message Date
Timo Teras 0010e46577 ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
Add struct net_device parameter to ip_rt_frag_needed() and update MTU to
cache entries where ifindex is specified. This is similar to what is
already done in ip_rt_redirect().

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:32:25 -07:00
Pavel Emelyanov 5e659e4cb0 [NET]: Fix heavy stack usage in seq_file output routines.
Plan C: we can follow the Al Viro's proposal about %n like in this patch.
The same applies to udp, fib (the /proc/net/route file), rt_cache and 
sctp debug. This is minus ~150-200 bytes for each.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-24 01:02:16 -07:00
YOSHIFUJI Hideaki a7d632b6b4 [IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.
And use %u to format port.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 04:09:00 -07:00
Stephen Hemminger c0b8c32b1c IPV4: use xor rather than multiple ands for route compare
The comparison in ip_route_input is a hot path, by recoding the C
"and" as bit operations, fewer conditional branches get generated
so the code should be faster. Maybe someday Gcc will be smart
enough to do this?

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 04:00:28 -07:00
Stephen Hemminger 2fa7527ba1 IPV4: route rekey timer can be deferrable
No urgency on the rehash interval timer, so mark it as deferrable.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 01:55:27 -07:00
Stephen Hemminger 1294fc4a48 IPV4: route use jhash3
Since route hash is a triple, use jhash_3words rather doing the mixing
directly. This should be as fast and give better distribution.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 01:54:01 -07:00
Stephen Hemminger 5969f71d57 IPV4: route inline changes
Don't mark functions that are large as inline, let compiler decide.
Also, use inline rather than __inline__.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 01:52:09 -07:00
YOSHIFUJI Hideaki 878628fbf2 [NET] NETNS: Omit namespace comparision without CONFIG_NET_NS.
Introduce an inline net_eq() to compare two namespaces.
Without CONFIG_NET_NS, since no namespace other than &init_net
exists, it is always 1.

We do not need to convert 1) inline vs inline and
2) inline vs &init_net comparisons.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:40:00 +09:00
YOSHIFUJI Hideaki 1218854afa [NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS.
Without CONFIG_NET_NS, no namespace other than &init_net exists,
no need to store net in seq_net_private.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:56 +09:00
YOSHIFUJI Hideaki 3b1e0a655f [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki c346dca108 [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:53 +09:00
Stephen Hemminger 817bc4db77 [IPV4] route: use read_mostly
The route table parameters are set based on system memory and sysctl
values that almost never change. Also the genid only changes every
10 minutes.

RTprint is defined by never used.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 17:43:59 -07:00
Denis V. Lunev ce25999078 [IPV4]: sk parameter is unused in ipv4_dst_blackhole.
Just remove it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 17:42:37 -07:00
Eric Dumazet ee6b967301 [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts
(Anonymous) unions can help us to avoid ugly casts.

A common cast it the (struct rtable *)skb->dst one.

Defining an union like  :
union {
     struct dst_entry *dst;
     struct rtable *rtable;
};
permits to use skb->rtable in place.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 18:30:47 -08:00
Denis V. Lunev 1937504dd1 [NETNS]: Enable all routing manipulation via netlink inside namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:52:04 -08:00
Denis V. Lunev 73b3871165 [NETNS]: Register /proc/net/rt_cache for each namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:51:18 -08:00
Denis V. Lunev a75e936f2f [NETNS]: Process /proc/net/rt_cache inside a namespace.
Show routing cache for a particular namespace only.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:50:55 -08:00
Denis V. Lunev 642d631811 [IPV4]: rt_cache_get_next should take rt_genid into account.
In the other case /proc/net/rt_cache will look inconsistent in respect to
genid.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:50:33 -08:00
Denis V. Lunev 317805b8f8 [NETNS]: Process ip_rt_redirect in the correct namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:50:06 -08:00
Wang Chen 770207208e [IPV4]: Use proc_create() to setup ->proc_fops first
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 14:14:25 -08:00
Patrick McHardy 4136cd523e [IPV4]: route: fix crash ip_route_input
ip_route_me_harder() may call ip_route_input() with skbs that don't
have skb->dev set for skbs rerouted in LOCAL_OUT and TCP resets
generated by the REJECT target, resulting in a crash when dereferencing
skb->dev->nd_net. Since ip_route_input() has an input device argument,
it seems correct to use that one anyway.

Bug introduced in b5921910a1 (Routing cache virtualization).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 17:58:20 -08:00
Eric Dumazet 29e75252da [IPV4] route cache: Introduce rt_genid for smooth cache invalidation
Current ip route cache implementation is not suited to large caches.

We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.

A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.

Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.

This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
2008-01-31 19:28:27 -08:00
Eric Dumazet e242297055 [NET]: should explicitely initialize atomic_t field in struct dst_ops
All but one struct dst_ops static initializations miss explicit
initialization of entries field.

As this field is atomic_t, we should use ATOMIC_INIT(0), and not
rely on atomic_t implementation.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:23 -08:00
Denis V. Lunev b5921910a1 [NETNS]: Routing cache virtualization.
Basically, this piece looks relatively easy. Namespace is already
available on the dst entry via device and the device is safe to
dereferrence. Compare it with one of a searcher and skip entry if
appropriate.

The only exception is ip_rt_frag_needed. So, add namespace parameter to it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:13 -08:00
Denis V. Lunev f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00