commit 1a6509d991 ("[IPSEC]: Add support for combined mode algorithms")
introduced aead. The function attach_aead kmemdup()s the algorithm
name during xfrm_state_construct().
However this memory is never freed.
Implementation has since been slightly modified in
commit ee5c23176f ("xfrm: Clone states properly on migration")
without resolving this leak.
This patch adds a kfree() call for the aead algorithm name.
Fixes: 1a6509d991 ("[IPSEC]: Add support for combined mode algorithms")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Acked-by: Rami Rosen <roszenrami@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When we fail to attach the security context in xfrm_state_construct()
we'll return 0 as error value which, in turn, will wrongly claim success
to userland when, in fact, we won't be adding / updating the XFRM state.
This is a regression introduced by commit fd21150a0f ("[XFRM] netlink:
Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()").
Fix it by propagating the error returned by security_xfrm_state_alloc()
in this case.
Fixes: fd21150a0f ("[XFRM] netlink: Inline attach_encap_tmpl()...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Running LTP 'icmp-uni-basic.sh -6 -p ipcomp -m tunnel' test over
openvswitch + veth can trigger kernel panic:
BUG: unable to handle kernel NULL pointer dereference
at 00000000000000e0 IP: [<ffffffff8169d1d2>] xfrm_input+0x82/0x750
...
[<ffffffff816d472e>] xfrm6_rcv_spi+0x1e/0x20
[<ffffffffa082c3c2>] xfrm6_tunnel_rcv+0x42/0x50 [xfrm6_tunnel]
[<ffffffffa082727e>] tunnel6_rcv+0x3e/0x8c [tunnel6]
[<ffffffff8169f365>] ip6_input_finish+0xd5/0x430
[<ffffffff8169fc53>] ip6_input+0x33/0x90
[<ffffffff8169f1d5>] ip6_rcv_finish+0xa5/0xb0
...
It seems that tunnel.ip6 can have garbage values and also dereferenced
without a proper check, only tunnel.ip4 is being verified. Fix it by
adding one more if block for AF_INET6 and initialize tunnel.ip6 with NULL
inside xfrm6_rcv_spi() (which is similar to xfrm4_rcv_spi()).
Fixes: 049f8e2 ("xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Whenever thresholds are changed the hash tables are rebuilt. This is
done by enumerating all policies and hashing and inserting them into
the right table according to the thresholds and direction.
Because socket policies are also contained in net->xfrm.policy_all but
no hash tables are defined for their direction (dir + XFRM_POLICY_MAX)
this causes a NULL or invalid pointer dereference after returning from
policy_hash_bysel() if the rebuild is done while any socket policies
are installed.
Since the rebuild after changing thresholds is scheduled this crash
could even occur if the userland sets thresholds seemingly before
installing any socket policies.
Fixes: 53c2e285f9 ("xfrm: Do not hash socket policies")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
AFAICT this message is just printed whenever input validation fails.
This is a normal failure and we shouldn't be dumping the stack over it.
Looks like it was originally a printk that was maybe incorrectly
upgraded to a WARN:
commit 62db5cfd70
Author: stephen hemminger <shemminger@vyatta.com>
Date: Wed May 12 06:37:06 2010 +0000
xfrm: add severity to printk
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If we hit any of the error conditions inside xfrm_dump_sa(), then
xfrm_state_walk_init() never gets called. However, we still call
xfrm_state_walk_done() from xfrm_dump_sa_done(), which will crash
because the state walk was never initialized properly.
We can fix this by setting cb->args[0] only after we've processed the
first element and checking this before calling xfrm_state_walk_done().
Fixes: d3623099d3 ("ipsec: add support of limited SA dump")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In netdevice.h we removed the structure in net-next that is being
changes in 'net'. In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.
The mlx5 conflicts have to do with vxlan support dependencies.
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2016-05-04
1) The flowcache can hit an OOM condition if too
many entries are in the gc_list. Fix this by
counting the entries in the gc_list and refuse
new allocations if the value is too high.
2) The inner headers are invalid after a xfrm transformation,
so reset the skb encapsulation field to ensure nobody tries
access the inner headers. Otherwise tunnel devices stacked
on top of xfrm may build the outer headers based on wrong
informations.
3) Add pmtu handling to vti, we need it to report
pmtu informations for local generated packets.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A crash is observed when a decrypted packet is processed in receive
path. get_rps_cpus() tries to dereference the skb->dev fields but it
appears that the device is freed from the poison pattern.
[<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0
[<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc
[<ffffffc000af6094>] netif_rx+0x74/0x94
[<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0
[<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c
[<ffffffc000ba6eb8>] esp_input_done+0x20/0x30
[<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
[<ffffffc0000b7324>] worker_thread+0x2f8/0x418
[<ffffffc0000bb40c>] kthread+0xe0/0xec
-013|get_rps_cpu(
| dev = 0xFFFFFFC08B688000,
| skb = 0xFFFFFFC0C76AAC00 -> (
| dev = 0xFFFFFFC08B688000 -> (
| name =
"......................................................
| name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev =
0xAAAAAAAAAAA
Following are the sequence of events observed -
- Encrypted packet in receive path from netdevice is queued
- Encrypted packet queued for decryption (asynchronous)
- Netdevice brought down and freed
- Packet is decrypted and returned through callback in esp_input_done
- Packet is queued again for process in network stack using netif_rx
Since the device appears to have been freed, the dereference of
skb->dev in get_rps_cpus() leads to an unhandled page fault
exception.
Fix this by holding on to device reference when queueing packets
asynchronously and releasing the reference on call back return.
v2: Make the change generic to xfrm as mentioned by Steffen and
update the title to xfrm
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jerome Stanislaus <jeromes@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inner headers are invalid after a xfrm transformation.
So reset the skb encapsulation field to ensure nobody tries
to access the inner headers.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch removes the last reference to hash and ablkcipher from
IPsec and replaces them with ahash and skcipher respectively. For
skcipher there is currently no difference at all, while for ahash
the current code is actually buggy and would prevent asynchronous
algorithms from being discovered.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Skb_gso_segment() uses skb control block during segmentation.
This patch adds 32-bytes room for previous control block which
will be copied into all resulting segments.
This patch fixes kernel crash during fragmenting forwarded packets.
Fragmentation requires valid IP CB in skb for clearing ip options.
Also patch removes custom save/restore in ovs code, now it's redundant.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2015-12-22
Just one patch to fix dst_entries_init with multiple namespaces.
From Dan Streetman.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM can deal with SYNACK messages, sent while listener socket
is not locked. We add proper rcu protection to __xfrm_sk_clone_policy()
and xfrm_sk_policy_lookup()
This might serve as the first step to remove xfrm.xfrm_policy_lock
use in fast path.
Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will soon switch sk->sk_policy[] to RCU protection,
as SYNACK packets are sent while listener socket is not locked.
This patch simply adds RCU grace period before struct xfrm_policy
freeing, and the corresponding rcu_head in struct xfrm_policy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP SYNACK messages might now be attached to request sockets.
XFRM needs to get back to a listener socket.
Adds new helpers that might be used elsewhere :
sk_to_full_sk() and sk_const_to_full_sk()
Note: We also need to add RCU protection for xfrm lookups,
now TCP/DCCP have lockless listener processing. This will
be addressed in separate patches.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
templates; their dst_entries counters will never be used. Move the
xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
and dst_entries_destroy for each net namespace.
The ipv4 and ipv6 xfrms each create dst_ops template, and perform
dst_entries_init on the templates. The template values are copied to each
net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops
pcpuc_entries field is a percpu counter and cannot be used correctly by
simply copying it to another object.
The result of this is a very subtle bug; changes to the dst entries
counter from one net namespace may sometimes get applied to a different
net namespace dst entries counter. This is because of how the percpu
counter works; it has a main count field as well as a pointer to the
percpu variables. Each net namespace maintains its own main count
variable, but all point to one set of percpu variables. When any net
namespace happens to change one of the percpu variables to outside its
small batch range, its count is moved to the net namespace's main count
variable. So with multiple net namespaces operating concurrently, the
dst_ops entries counter can stray from the actual value that it should
be; if counts are consistently moved from one net namespace to another
(which my testing showed is likely), then one net namespace winds up
with a negative dst_ops count while another winds up with a continually
increasing count, eventually reaching its gc_thresh limit, which causes
all new traffic on the net namespace to fail with -ENOBUFS.
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2015-10-30
1) The flow cache is limited by the flow cache limit which
depends on the number of cpus and the xfrm garbage collector
threshold which is independent of the number of cpus. This
leads to the fact that on systems with more than 16 cpus
we hit the xfrm garbage collector limit and refuse new
allocations, so new flows are dropped. On systems with 16
or less cpus, we hit the flowcache limit. In this case, we
shrink the flow cache instead of refusing new flows.
We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.
2) Fix some unaligned accesses on sparc systems.
From Sowmini Varadhan.
3) Fix some header checks in _decode_session4. We may call
pskb_may_pull with a negative value converted to unsigened
int from pskb_may_pull. This can lead to incorrect policy
lookups. We fix this by a check of the data pointer position
before we call pskb_may_pull.
4) Reload skb header pointers after calling pskb_may_pull
in _decode_session4 as this may change the pointers into
the packet.
5) Add a missing statistic counter on inner mode errors.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h
The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.
The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
Increment the LINUX_MIB_XFRMINSTATEMODEERROR statistic counter
to notify about dropped packets if we fail to fetch a inner mode.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
On sparc, deleting established SAs (e.g., by restarting ipsec)
results in unaligned access messages via xfrm_del_sa ->
km_state_notify -> xfrm_send_state_notify().
Even though struct xfrm_usersa_info is aligned on 8-byte boundaries,
netlink attributes are fundamentally only 4 byte aligned, and this
cannot be changed for nla_data() that is passed up to userspace.
As a result, the put_unaligned() macro needs to be used to
set up potentially unaligned fields such as the xfrm_stats in
copy_to_user_state()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The network namespace is already passed into dst_output pass it into
dst->output lwt->output and friends.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>