Commit Graph

800 Commits

Author SHA1 Message Date
Eric Dumazet
d0bf4a9e92 net: cleanup and document skb fclone layout
Lets use a proper structure to clearly document and implement
skb fast clones.

Then, we might experiment more easily alternative layouts.

This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
to stop leaking of implementation details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 16:34:25 -04:00
David S. Miller
f5c7e1a47a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-09-25

1) Remove useless hash_resize_mutex in xfrm_hash_resize().
   This mutex is used only there, but xfrm_hash_resize()
   can't be called concurrently at all. From Ying Xue.

2) Extend policy hashing to prefixed policies based on
   prefix lenght thresholds. From Christophe Gouault.

3) Make the policy hash table thresholds configurable
   via netlink. From Christophe Gouault.

4) Remove the maximum authentication length for AH.
   This was needed to limit stack usage. We switched
   already to allocate space, so no need to keep the
   limit. From Herbert Xu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:19:15 -04:00
David S. Miller
1f6d80358d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/mips/net/bpf_jit.c
	drivers/net/can/flexcan.c

Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:09:27 -04:00
Herbert Xu
689f1c9de2 ipsec: Remove obsolete MAX_AH_AUTH_LEN
While tracking down the MAX_AH_AUTH_LEN crash in an old kernel
I thought that this limit was rather arbitrary and we should
just get rid of it.

In fact it seems that we've already done all the work needed
to remove it apart from actually removing it.  This limit was
there in order to limit stack usage.  Since we've already
switched over to allocating scratch space using kmalloc, there
is no longer any need to limit the authentication length.

This patch kills all references to it, including the BUG_ONs
that led me here.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-18 10:54:36 +02:00
Steffen Klassert
b8c203b2d2 xfrm: Generate queueing routes only from route lookup functions
Currently we genarate a queueing route if we have matching policies
but can not resolve the states and the sysctl xfrm_larval_drop is
disabled. Here we assume that dst_output() is called to kill the
queued packets. Unfortunately this assumption is not true in all
cases, so it is possible that these packets leave the system unwanted.

We fix this by generating queueing routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: a0073fe18e ("xfrm: Add a state resolution packet queue")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:49 +02:00
Steffen Klassert
f92ee61982 xfrm: Generate blackhole routes only from route lookup functions
Currently we genarate a blackhole route route whenever we have
matching policies but can not resolve the states. Here we assume
that dst_output() is called to kill the balckholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unwanted.

We fix this by generating blackhole routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131b1 ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:40 +02:00
Florian Westphal
46cfd725c3 net: use kfree_skb_list() helper in more places
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:10:45 -07:00
Christophe Gouault
880a6fab8f xfrm: configure policy hash table thresholds by netlink
Enable to specify local and remote prefix length thresholds for the
policy hash table via a netlink XFRM_MSG_NEWSPDINFO message.

prefix length thresholds are specified by XFRMA_SPD_IPV4_HTHRESH and
XFRMA_SPD_IPV6_HTHRESH optional attributes (struct xfrmu_spdhthresh).

example:

    struct xfrmu_spdhthresh thresh4 = {
        .lbits = 0;
        .rbits = 24;
    };
    struct xfrmu_spdhthresh thresh6 = {
        .lbits = 0;
        .rbits = 56;
    };
    struct nlmsghdr *hdr;
    struct nl_msg *msg;

    msg = nlmsg_alloc();
    hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRMA_SPD_IPV4_HTHRESH, sizeof(__u32), NLM_F_REQUEST);
    nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4);
    nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6);
    nla_send_auto(sk, msg);

The numbers are the policy selector minimum prefix lengths to put a
policy in the hash table.

- lbits is the local threshold (source address for out policies,
  destination address for in and fwd policies).

- rbits is the remote threshold (destination address for out
  policies, source address for in and fwd policies).

The default values are:

XFRMA_SPD_IPV4_HTHRESH: 32 32
XFRMA_SPD_IPV6_HTHRESH: 128 128

Dynamic re-building of the SPD is performed when the thresholds values
are changed.

The current thresholds can be read via a XFRM_MSG_GETSPDINFO request:
the kernel replies to XFRM_MSG_GETSPDINFO requests by an
XFRM_MSG_NEWSPDINFO message, with both attributes
XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-02 13:37:56 +02:00
Christophe Gouault
b58555f176 xfrm: hash prefixed policies based on preflen thresholds
The idea is an extension of the current policy hashing.

Today only non-prefixed policies are stored in a hash table. This
patch relaxes the constraints, and hashes policies whose prefix
lengths are greater or equal to a configurable threshold.

Each hash table (one per direction) maintains its own set of IPv4 and
IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32,
128, 128).

Example, if the output hash table is configured with values (16, 24,
56, 64):

ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed

ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ...    => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ...  => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ...    => unhashed

The high order bits of the addresses (up to the threshold) are used to
compute the hash key.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-02 13:29:44 +02:00
Ying Xue
0244790c8a xfrm: remove useless hash_resize_mutex locks
In xfrm_state.c, hash_resize_mutex is defined as a local variable
and only used in xfrm_hash_resize() which is declared as a work
handler of xfrm.state_hash_work. But when the xfrm.state_hash_work
work is put in the global workqueue(system_wq) with schedule_work(),
the work will be really inserted in the global workqueue if it was
not already queued, otherwise, it is still left in the same position
on the the global workqueue. This means the xfrm_hash_resize() work
handler is only executed once at any time no matter how many times
its work is scheduled, that is, xfrm_hash_resize() is not called
concurrently at all, so hash_resize_mutex is redundant for us.

Cc: Christophe Gouault <christophe.gouault@6wind.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-08-29 11:40:03 +02:00
Ken Helias
1d023284c3 list: fix order of arguments for hlist_add_after(_rcu)
All other add functions for lists have the new item as first argument
and the position where it is added as second argument.  This was changed
for no good reason in this function and makes using it unnecessary
confusing.

The name was changed to hlist_add_behind() to cause unconverted code to
generate a compile error instead of using the wrong parameter order.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[intel driver bits]
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:24 -07:00
Tobias Brunner
a0e5ef53aa xfrm: Fix installation of AH IPsec SAs
The SPI check introduced in ea9884b3ac
was intended for IPComp SAs but actually prevented AH SAs from getting
installed (depending on the SPI).

Fixes: ea9884b3ac ("xfrm: check user specified spi for IPComp")
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-06-30 07:42:12 +02:00
Steffen Klassert
b7eea4545e xfrm: Fix refcount imbalance in xfrm_lookup
xfrm_lookup must return a dst_entry with a refcount for the caller.
Git commit 1a1ccc96ab ("xfrm: Remove caching of xfrm_policy_sk_bundles")
removed this refcount for the socket policy case accidentally.
This patch restores it and sets DST_NOCACHE flag to make sure
that the dst_entry is freed when the refcount becomes null.

Fixes: 1a1ccc96ab ("xfrm: Remove caching of xfrm_policy_sk_bundles")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-06-26 07:52:42 +02:00
David S. Miller
c99f7abf0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/inetpeer.h
	net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 23:32:12 -07:00
Michal Kubecek
21ee543edc xfrm: fix race between netns cleanup and state expire notification
The xfrm_user module registers its pernet init/exit after xfrm
itself so that its net exit function xfrm_user_net_exit() is
executed before xfrm_net_exit() which calls xfrm_state_fini() to
cleanup the SA's (xfrm states). This opens a window between
zeroing net->xfrm.nlsk pointer and deleting all xfrm_state
instances which may access it (via the timer). If an xfrm state
expires in this window, xfrm_exp_state_notify() will pass null
pointer as socket to nlmsg_multicast().

As the notifications are called inside rcu_read_lock() block, it
is sufficient to retrieve the nlsk socket with rcu_dereference()
and check the it for null.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 16:07:44 -07:00
David S. Miller
65db611a5c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-05-22

This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?

I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.

1) Simplify the xfrm audit handling, from Tetsuo Handa.

2) Codingstyle cleanup for xfrm_output, from abian Frederick.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 16:00:00 -04:00
Fabian Frederick
fc68086ce8 net/xfrm/xfrm_output.c: move EXPORT_SYMBOL
Fix checkpatch warning:
"WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable"

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-05-13 12:44:28 +02:00
David S. Miller
5f013c9bc7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/altera/altera_sgdma.c
	net/netlink/af_netlink.c
	net/sched/cls_api.c
	net/sched/sch_api.c

The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces.  These were simple transformations from
netlink_capable to netlink_ns_capable.

The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-12 13:19:14 -04:00
WANG Cong
698365fa18 net: clean up snmp stats code
commit 8f0ea0fe3a (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

	commit 933393f58f
	Date:   Thu Dec 22 11:58:51 2011 -0600

	    percpu: Remove irqsafe_cpu_xxx variants

	    We simply say that regular this_cpu use must be safe regardless of
	    preemption and interrupt state.  That has no material change for x86
	    and s390 implementations of this_cpu operations.  However, arches that
	    do not provide their own implementation for this_cpu operations will
	    now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-07 16:06:05 -04:00
Eric W. Biederman
90f62cf30a net: Use netlink_ns_capable to verify the permisions of netlink messages
It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 13:44:54 -04:00
Tetsuo Handa
2e71029e2c xfrm: Remove useless xfrm_audit struct.
Commit f1370cc4 "xfrm: Remove useless secid field from xfrm_audit." changed
"struct xfrm_audit" to have either
{ audit_get_loginuid(current) / audit_get_sessionid(current) } or
{ INVALID_UID / -1 } pair.

This means that we can represent "struct xfrm_audit" as "bool".
This patch replaces "struct xfrm_audit" argument with "bool".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-04-23 08:21:04 +02:00
Tetsuo Handa
f1370cc4a0 xfrm: Remove useless secid field from xfrm_audit.
It seems to me that commit ab5f5e8b "[XFRM]: xfrm audit calls" is doing
something strange at xfrm_audit_helper_usrinfo().
If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls
audit_log_task_context() which basically does
secid != 0 && security_secid_to_secctx(secid) == 0 case
except that secid is obtained from current thread's context.

Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was
obtained from other thread's context? It might audit current thread's
context rather than other thread's context if security_secid_to_secctx()
in xfrm_audit_helper_usrinfo() failed for some reason.

Then, are all the caller of xfrm_audit_helper_usrinfo() passing either
secid obtained from current thread's context or secid == 0?
It seems to me that they are.

If I didn't miss something, we don't need to pass secid to
xfrm_audit_helper_usrinfo() because audit_log_task_context() will
obtain secid from current thread's context.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-04-22 10:47:53 +02:00
Eric Dumazet
aad88724c9 ipv4: add a sock pointer to dst->output() path.
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.

The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.

Fixes: 8f646c922d ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 13:47:15 -04:00
David S. Miller
04f58c8854 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/devicetree/bindings/net/micrel-ks8851.txt
	net/core/netpoll.c

The net/core/netpoll.c conflict is a bug fix in 'net' happening
to code which is completely removed in 'net-next'.

In micrel-ks8851.txt we simply have overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 20:29:20 -04:00
David S. Miller
995dca4ce9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
One patch to rename a newly introduced struct. The rest is
the rework of the IPsec virtual tunnel interface for ipv6 to
support inter address family tunneling and namespace crossing.

1) Rename the newly introduced struct xfrm_filter to avoid a
   conflict with iproute2. From Nicolas Dichtel.

2) Introduce xfrm_input_afinfo to access the address family
   dependent tunnel callback functions properly.

3) Add and use a IPsec protocol multiplexer for ipv6.

4) Remove dst_entry caching. vti can lookup multiple different
   dst entries, dependent of the configured xfrm states. Therefore
   it does not make to cache a dst_entry.

5) Remove caching of flow informations. vti6 does not use the the
   tunnel endpoint addresses to do route and xfrm lookups.

6) Update the vti6 to use its own receive hook.

7) Remove the now unused xfrm_tunnel_notifier. This was used from vti
   and is replaced by the IPsec protocol multiplexer hooks.

8) Support inter address family tunneling for vti6.

9) Check if the tunnel endpoints of the xfrm state and the vti interface
   are matching and return an error otherwise.

10) Enable namespace crossing for vti devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18 14:09:07 -04:00