Commit Graph

35810 Commits

Author SHA1 Message Date
Eric Dumazet bdbbb8527b ipv4: tcp: get rid of ugly unicast_sock
In commit be9f4a44e7 ("ipv4: tcp: remove per net tcp_sock")
I tried to address contention on a socket lock, but the solution
I chose was horrible :

commit 3a7c384ffd ("ipv4: tcp: unicast_sock should not land outside
of TCP stack") addressed a selinux regression.

commit 0980e56e50 ("ipv4: tcp: set unicast_sock uc_ttl to -1")
took care of another regression.

commit b5ec8eeac4 ("ipv4: fix ip_send_skb()") fixed another regression.

commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
was another shot in the dark.

Really, just use a proper socket per cpu, and remove the skb_orphan()
call, to re-enable flow control.

This solves a serious problem with FQ packet scheduler when used in
hostile environments, as we do not want to allocate a flow structure
for every RST packet sent in response to a spoofed packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01 23:06:19 -08:00
Toshiaki Makita d4bcef3fbe net: Fix vlan_get_protocol for stacked vlan
vlan_get_protocol() could not get network protocol if a skb has a 802.1ad
vlan tag or multiple vlans, which caused incorrect checksum calculation
in several drivers.

Fix vlan_get_protocol() to retrieve network protocol instead of incorrect
vlan protocol.

As the logic is the same as skb_network_protocol(), create a common helper
function __vlan_get_protocol() and call it from existing functions.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 18:03:47 -08:00
Saran Maruti Ramanara cfbf654efc net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param
When making use of RFC5061, section 4.2.4. for setting the primary IP
address, we're passing a wrong parameter header to param_type2af(),
resulting always in NULL being returned.

At this point, param.p points to a sctp_addip_param struct, containing
a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
the actual sctp_addr_param, which also contains a sctp_paramhdr, but
this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
param_type2af() can make use of. Since we already hold a pointer to
addr_param from previous line, just reuse it for param_type2af().

Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Saran Maruti Ramanara <saran.neti@telus.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 17:45:23 -08:00
Pablo Neira 8b7c36d810 netlink: fix wrong subscription bitmask to group mapping in
The subscription bitmask passed via struct sockaddr_nl is converted to
the group number when calling the netlink_bind() and netlink_unbind()
callbacks.

The conversion is however incorrect since bitmask (1 << 0) needs to be
mapped to group number 1. Note that you cannot specify the group number 0
(usually known as _NONE) from setsockopt() using NETLINK_ADD_MEMBERSHIP
since this is rejected through -EINVAL.

This problem became noticeable since 97840cb ("netfilter: nfnetlink:
fix insufficient validation in nfnetlink_bind") when binding to bitmask
(1 << 0) in ctnetlink.

Reported-by: Andre Tomt <andre@tomt.net>
Reported-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 17:43:47 -08:00
Li Wei 3cdaa5be9e ipv4: Don't increase PMTU with Datagram Too Big message.
RFC 1191 said, "a host MUST not increase its estimate of the Path
MTU in response to the contents of a Datagram Too Big message."

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 15:28:59 -08:00
Nicolas Dichtel 8997c27ec4 caif: remove wrong dev_net_set() call
src_net points to the netns where the netlink message has been received. This
netns may be different from the netns where the interface is created (because
the user may add IFLA_NET_NS_[PID|FD]). In this case, src_net is the link netns.

It seems wrong to override the netns in the newlink() handler because if it
was not already src_net, it means that the user explicitly asks to create the
netdevice in another netns.

CC: Sjur Brændeland <sjur.brandeland@stericsson.com>
CC: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Fixes: 8391c4aab1 ("caif: Bugfixes in CAIF netdevice for close and flow control")
Fixes: c412540063 ("caif-hsi: Add rtnl support")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 14:20:02 -08:00
Eric Dumazet 811230cd85 tcp: ipv4: initialize unicast_sock sk_pacing_rate
When I added sk_pacing_rate field, I forgot to initialize its value
in the per cpu unicast_sock used in ip_send_unicast_reply()

This means that for sch_fq users, RST packets, or ACK packets sent
on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
once we reach the per flow limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 95bd09eb27 ("tcp: TSO packets automatic sizing")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:24:47 -08:00
Roopa Prabhu 59ccaaaa49 bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify
Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081

This patch avoids calling rtnl_notify if the device ndo_bridge_getlink
handler does not return any bytes in the skb.

Alternately, the skb->len check can be moved inside rtnl_notify.

For the bridge vlan case described in 92081, there is also a fix needed
in bridge driver to generate a proper notification. Will fix that in
subsequent patch.

v2: rebase patch on net tree

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:21:31 -08:00
Neal Cardwell d6b1a8a92a tcp: fix timing issue in CUBIC slope calculation
This patch fixes a bug in CUBIC that causes cwnd to increase slightly
too slowly when multiple ACKs arrive in the same jiffy.

If cwnd is supposed to increase at a rate of more than once per jiffy,
then CUBIC was sometimes too slow. Because the bic_target is
calculated for a future point in time, calculated with time in
jiffies, the cwnd can increase over the course of the jiffy while the
bic_target calculated as the proper CUBIC cwnd at time
t=tcp_time_stamp+rtt does not increase, because tcp_time_stamp only
increases on jiffy tick boundaries.

So since the cnt is set to:
	ca->cnt = cwnd / (bic_target - cwnd);
as cwnd increases but bic_target does not increase due to jiffy
granularity, the cnt becomes too large, causing cwnd to increase
too slowly.

For example:
- suppose at the beginning of a jiffy, cwnd=40, bic_target=44
- so CUBIC sets:
   ca->cnt =  cwnd / (bic_target - cwnd) = 40 / (44 - 40) = 40/4 = 10
- suppose we get 10 acks, each for 1 segment, so tcp_cong_avoid_ai()
   increases cwnd to 41
- so CUBIC sets:
   ca->cnt =  cwnd / (bic_target - cwnd) = 41 / (44 - 41) = 41 / 3 = 13

So now CUBIC will wait for 13 packets to be ACKed before increasing
cwnd to 42, insted of 10 as it should.

The fix is to avoid adjusting the slope (determined by ca->cnt)
multiple times within a jiffy, and instead skip to compute the Reno
cwnd, the "TCP friendliness" code path.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:38 -08:00
Neal Cardwell 9cd981dcf1 tcp: fix stretch ACK bugs in CUBIC
Change CUBIC to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, because we are now precisely accounting for stretch ACKs,
including delayed ACKs, we can now remove the delayed ACK tracking and
estimation code that tracked recent delayed ACK behavior in
ca->delayed_ack.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:38 -08:00
Neal Cardwell c22bdca947 tcp: fix stretch ACK bugs in Reno
Change Reno to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, if snd_cwnd crosses snd_ssthresh during slow start
processing, and we then exit slow start mode, we need to carry over
any remaining "credit" for packets ACKed and apply that to additive
increase by passing this remaining "acked" count to
tcp_cong_avoid_ai().

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:38 -08:00
Neal Cardwell 814d488c61 tcp: fix the timid additive increase on stretch ACKs
tcp_cong_avoid_ai() was too timid (snd_cwnd increased too slowly) on
"stretch ACKs" -- cases where the receiver ACKed more than 1 packet in
a single ACK. For example, suppose w is 10 and we get a stretch ACK
for 20 packets, so acked is 20. We ought to increase snd_cwnd by 2
(since acked/w = 20/10 = 2), but instead we were only increasing cwnd
by 1. This patch fixes that behavior.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:37 -08:00
Neal Cardwell e73ebb0881 tcp: stretch ACK fixes prep
LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
cover more than the RFC-specified maximum of 2 packets. These stretch
ACKs can cause serious performance shortfalls in common congestion
control algorithms that were designed and tuned years ago with
receiver hosts that were not using LRO or GRO, and were instead
politely ACKing every other packet.

This patch series fixes Reno and CUBIC to handle stretch ACKs.

This patch prepares for the upcoming stretch ACK bug fix patches. It
adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
changes all congestion control algorithms to pass in 1 for the ACKed
count. It also changes tcp_slow_start() to return the number of packet
ACK "credits" that were not processed in slow start mode, and can be
processed by the congestion control module in additive increase mode.

In future patches we will fix tcp_cong_avoid_ai() to handle stretch
ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
and additive increase mode.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:37 -08:00
Christoph Hellwig 06539d3071 net: don't OOPS on socket aio
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 12:25:33 -08:00
David S. Miller bf693f7beb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
ipsec 2015-01-26

Just two small fixes for _decode_session6() where we
might decode to wrong header information in some rare
situations.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:28:38 -08:00
Hannes Frederic Sowa 6e9e16e614 ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
Lubomir Rintel reported that during replacing a route the interface
reference counter isn't correctly decremented.

To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2

During replacement of a rt6_info we must walk all parent nodes and check
if the to be replaced rt6_info got propagated. If so, replace it with
an alive one.

Fixes: 4a287eba2d ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:22:14 -08:00
subashab@codeaurora.org fc752f1f43 ping: Fix race in free in receive path
An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().

-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish

Fix this incorrect free by cloning this skb and processing this cloned
skb instead.

This patch was suggested by Eric Dumazet

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:04:53 -08:00
Herbert Xu 86f3cddbc3 udp_diag: Fix socket skipping within chain
While working on rhashtable walking I noticed that the UDP diag
dumping code is buggy.  In particular, the socket skipping within
a chain never happens, even though we record the number of sockets
that should be skipped.

As this code was supposedly copied from TCP, this patch does what
TCP does and resets num before we walk a chain.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:02:41 -08:00
David S. Miller 7d63585bf0 Merge tag 'mac80211-for-davem-2015-01-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Another set of last-minute fixes:
 * fix station double-removal when suspending while associating
 * fix the HT (802.11n) header length calculation
 * fix the CCK radiotap flag used for monitoring, a pretty
   old regression but a simple one-liner
 * fix per-station group-key handling

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:32:24 -08:00
Hannes Frederic Sowa df4d92549f ipv4: try to cache dst_entries which would cause a redirect
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f886497212 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:28:27 -08:00
Daniel Borkmann 600ddd6825 net: sctp: fix slab corruption from use after free on INIT collisions
When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950c6 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950c6 ...

[  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
[  533.940559] PGD 5030f2067 PUD 0
[  533.957104] Oops: 0000 [#1] SMP
[  533.974283] Modules linked in: sctp mlx4_en [...]
[  534.939704] Call Trace:
[  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
[  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
[  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
[  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
[  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
[  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
[  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
[  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

... or depending on the the application, for example this one:

[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
[ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

With slab debugging enabled, we can see that the poison has been overwritten:

[  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
[  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[  669.826424]  __slab_alloc+0x4bf/0x566
[  669.826433]  __kmalloc+0x280/0x310
[  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
[  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[  669.826635]  __slab_free+0x39/0x2a8
[  669.826643]  kfree+0x1d6/0x230
[  669.826650]  kzfree+0x31/0x40
[  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
[  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
[  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]

Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).

Reference counting of auth keys revisited:

Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or enpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.

User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().

sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, which where we
eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
intitial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independant of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a
("net: sctp: fix memory leak in auth key management") is independant of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.

Fixes: 730fc3d05c ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:02:05 -08:00
Daniel Borkmann 3f2ab13594 net: cls_bpf: fix auto generation of per list handles
When creating a bpf classifier in tc with priority collisions and
invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
will return a wrong handle id which in fact is non-unique. Usually
altering of specific filters is being addressed over major id, but
in case of collisions we result in a filter chain, where handle ids
address individual cls_bpf_progs inside the classifier.

Issue is, in cls_bpf_grab_new_handle() we probe for head->hgen handle
in cls_bpf_get() and in case we found a free handle, we're supposed
to use exactly head->hgen. In case of insufficient numbers of handles,
we bail out later as handle id 0 is not allowed.

Fixes: 7d1d65cb84 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:50:19 -08:00
Daniel Borkmann 7913ecf69e net: cls_bpf: fix size mismatch on filter preparation
In cls_bpf_modify_existing(), we read out the number of filter blocks,
do some sanity checks, allocate a block on that size, and copy over the
BPF instruction blob from user space, then pass everything through the
classic BPF checker prior to installation of the classifier.

We should reject mismatches here, there are 2 scenarios: the number of
filter blocks could be smaller than the provided instruction blob, so
we do a partial copy of the BPF program, and thus the instructions will
either be rejected from the verifier or a valid BPF program will be run;
in the other case, we'll end up copying more than we're supposed to,
and most likely the trailing garbage will be rejected by the verifier
as well (i.e. we need to fit instruction pattern, ret {A,K} needs to be
last instruction, load/stores must be correct, etc); in case not, we
would leak memory when dumping back instruction patterns. The code should
have only used nla_len() as Dave noted to avoid this from the beginning.
Anyway, lets fix it by rejecting such load attempts.

Fixes: 7d1d65cb84 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:50:18 -08:00
Martin KaFai Lau b0a1ba5992 ipv6: Fix __ip6_route_redirect
In my last commit (a3c00e4: ipv6: Remove BACKTRACK macro), the changes in
__ip6_route_redirect is incorrect.  The following case is missed:
1. The for loop tries to find a valid gateway rt. If it fails to find
   one, rt will be NULL.
2. When rt is NULL, it is set to the ip6_null_entry.
3. The newly added 'else if', from a3c00e4, will stop the backtrack from
   happening.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 22:09:51 -08:00
Vivien Didelot 24df8986f3 net: dsa: set slave MII bus PHY mask
When registering a mdio bus, Linux assumes than every port has a PHY and tries
to scan it. If a switch port has no PHY registered, DSA will fail to register
the slave MII bus. To fix this, set the slave MII bus PHY mask to the switch
PHYs mask.

As an example, if we use a Marvell MV88E6352 (which is a 7-port switch with no
registered PHYs for port 5 and port 6), with the following declared names:

	static struct dsa_chip_data switch_cdata = {
		[...]
		.port_names[0] = "sw0",
		.port_names[1] = "sw1",
		.port_names[2] = "sw2",
		.port_names[3] = "sw3",
		.port_names[4] = "sw4",
		.port_names[5] = "cpu",
	};

DSA will fail to create the switch instance. With the PHY mask set for the
slave MII bus, only the PHY for ports 0-4 will be scanned and the instance will
be successfully created.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 16:00:54 -08:00