Commit Graph

204 Commits

Author SHA1 Message Date
David S. Miller
6a06e5e1bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/team/team.c
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/bat_iv_ogm.c
	net/ipv4/fib_frontend.c
	net/ipv4/route.c
	net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28 14:40:49 -04:00
Amerigo Wang
c038a767cd ipv6: add a new namespace for nf_conntrack_reasm
As pointed by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Nicolas Dichtel
b42664f898 netns: move net->ipv4.rt_genid to net->rt_genid
This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Pablo Neira Ayuso
ace1fe1231 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling
of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
2012-09-03 15:34:51 +02:00
Patrick McHardy
58a317f106 netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:17 +02:00
Patrick McHardy
c7232c9979 netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:14 +02:00
David S. Miller
e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
Rami Rosen
f63c45e0e6 packet: fix broken build.
This patch fixes a broken build due to a missing header:
...
  CC      net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
                 from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...

The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.

See commit 0fa7fa98db:
packet: Protect packet sk list with mutex (v2) patch,

Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:29:45 -07:00
Pavel Emelyanov
0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
Eric W. Biederman
e1fc3b14f9 sctp: Make sysctl tunables per net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:32:16 -07:00
Eric W. Biederman
ebb7e95d93 sctp: Add infrastructure for per net sysctls
Start with an empty sctp_net_table that will be populated as the various
tunable sysctls are made per net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
b01a24078f sctp: Make the mib per network namespace
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:36 -07:00
Eric W. Biederman
13d782f6b4 sctp: Make the proc files per network namespace.
- Convert all of the files under /proc/net/sctp to be per
  network namespace.

- Don't print anything for /proc/net/sctp/snmp except in
  the initial network namespaces as the snmp counters still
  have to be converted to be per network namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:29:53 -07:00
Eric W. Biederman
2ce9550350 sctp: Make the ctl_sock per network namespace
- Kill sctp_get_ctl_sock, it is useless now.
- Pass struct net where needed so net->sctp.ctl_sock is accessible.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:17:26 -07:00
Eric W. Biederman
4db67e8086 sctp: Make the address lists per network namespace
- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
  to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:12:17 -07:00
Eric W. Biederman
7064d16e16 userns: Use kgids for sysctl_ping_group_range
- Store sysctl_ping_group_range as a paire of kgid_t values
  instead of a pair of gid_t values.
- Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range
- For invalid cases reset to the default disabled state.

With the kgid_t conversion made part of the original value sanitation
from userspace understand how the code will react becomes clearer
and it becomes possible to set the sysctl ping group range from
something other than the initial user namespace.

Cc: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:10 -07:00
Eric Dumazet
0c7462a235 ipv4: remove rt_cache_rebuild_count
After IP route cache removal, rt_cache_rebuild_count is no longer
used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:22 -07:00
Eric Dumazet
5815d5e7aa tcp: use hash_32() in tcp_metrics
Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not
guaranteed to be a power of two.

Uses hash_32() instead of custom hash.

tcpmhash_entries should be an unsigned int.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Eric Dumazet
be9f4a44e7 ipv4: tcp: remove per net tcp_sock
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.

To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)

Additional features :

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()

3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)

[1] These packets are only generated when no socket was matched for
the incoming packet.

Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:35:30 -07:00
David S. Miller
51c5d0c4b1 tcp: Maintain dynamic metrics in local cache.
Maintain a local hash table of TCP dynamic metrics blobs.

Computed TCP metrics are no longer maintained in the route metrics.

The table uses RCU and an extremely simple hash so that it has low
latency and low overhead.  A simple hash is legitimate because we only
make metrics blobs for fully established connections.

Some tweaking of the default hash table sizes, metric timeouts, and
the hash chain length limit certainly could use some tweaking.  But
the basic design seems sound.

With help from Eric Dumazet and Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:39:57 -07:00
David S. Miller
f4530fa574 ipv4: Avoid overhead when no custom FIB rules are installed.
If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.

Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 22:13:13 -07:00
David S. Miller
67da255210 Merge branch 'master' of git://1984.lsi.us.es/net-next 2012-06-11 12:56:14 -07:00
Gao feng
c8a627ed06 inetpeer: add namespace support for inetpeer
now inetpeer doesn't support namespace,the information will
be leaking across namespace.

this patch move the global vars v4_peers and v6_peers to
netns_ipv4 and netns_ipv6 as a field peers.

add struct pernet_operations inetpeer_ops to initial pernet
inetpeer data.

and change family_to_base and inet_getpeer to support namespace.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 14:27:23 -07:00
Gao feng
7080ba0955 netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMPv6 protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng
4b626b9c5d netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00