Commit Graph

35778 Commits

Author SHA1 Message Date
Hagen Paul Pfeifer
9d289715eb ipv6: stop sending PTB packets for MTU < 1280
Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.

See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.

[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00

Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:52:07 -05:00
Daniel Borkmann
2061dcd6bf net: sctp: fix race for one-to-many sockets in sendmsg's auto associate
I.e. one-to-many sockets in SCTP are not required to explicitly
call into connect(2) or sctp_connectx(2) prior to data exchange.
Instead, they can directly invoke sendmsg(2) and the SCTP stack
will automatically trigger connection establishment through 4WHS
via sctp_primitive_ASSOCIATE(). However, this in its current
implementation is racy: INIT is being sent out immediately (as
it cannot be bundled anyway) and the rest of the DATA chunks are
queued up for later xmit when connection is established, meaning
sendmsg(2) will return successfully. This behaviour can result
in an undesired side-effect that the kernel made the application
think the data has already been transmitted, although none of it
has actually left the machine, worst case even after close(2)'ing
the socket.

Instead, when the association from client side has been shut down
e.g. first gracefully through SCTP_EOF and then close(2), the
client could afterwards still receive the server's INIT_ACK due
to a connection with higher latency. This INIT_ACK is then considered
out of the blue and hence responded with ABORT as there was no
alive assoc found anymore. This can be easily reproduced f.e.
with sctp_test application from lksctp. One way to fix this race
is to wait for the handshake to actually complete.

The fix defers waiting after sctp_primitive_ASSOCIATE() and
sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
from sctp_sendmsg() have already been placed into the output
queue through the side-effect interpreter, and therefore can then
be bundeled together with COOKIE_ECHO control chunks.

strace from example application (shortened):

socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
           msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
close(3) = 0

tcpdump before patch (fooling the application):

22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]

tcpdump after patch:

14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]

Looks like this bug is from the pre-git history museum. ;)

Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17 23:52:20 -05:00
Johannes Berg
ee1c244219 genetlink: synchronize socket closing and family removal
In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.

Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.

This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no
longer a problem.

This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 17:04:25 -05:00
Johannes Berg
5ad6300524 genetlink: disallow subscribing to unknown mcast groups
Jeff Layton reported that he could trigger the multicast unbind warning
in generic netlink using trinity. I originally thought it was a race
condition between unregistering the generic netlink family and closing
the socket, but there's a far simpler explanation: genetlink currently
allows subscribing to groups that don't (yet) exist, and the warning is
triggered when unsubscribing again while the group still doesn't exist.

Originally, I had a warning in the subscribe case and accepted it out of
userspace API concerns, but the warning was of course wrong and removed
later.

However, I now think that allowing userspace to subscribe to groups that
don't exist is wrong and could possibly become a security problem:
Consider a (new) genetlink family implementing a permission check in
the mcast_bind() function similar to the like the audit code does today;
it would be possible to bypass the permission check by guessing the ID
and subscribing to the group it exists. This is only possible in case a
family like that would be dynamically loaded, but it doesn't seem like a
huge stretch, for example wireless may be loaded when you plug in a USB
device.

To avoid this reject such subscription attempts.

If this ends up causing userspace issues we may need to add a workaround
in af_netlink to deny such requests but not return an error.

Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 17:04:24 -05:00
Eric Dumazet
ac64da0b83 net: rps: fix cpu unplug
softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.

A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.

Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.

This version is better because we do not slow down packet processing,
only make migration safer.

Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 01:02:42 -05:00
Willem de Bruijn
f812116b17 ip: zero sockaddr returned on error queue
The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

    struct {
            struct sock_extended_err ee;
            struct sockaddr_in(6)    offender;
    } errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:41:16 -05:00
David S. Miller
aaef66b837 Merge tag 'mac80211-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Just two fixes - one for an uninialized variable and
one for a deadlock in regulatory processing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:28:36 -05:00
Linus Torvalds
a6391a924c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't use uninitialized data in IPVS, from Dan Carpenter.

 2) conntrack race fixes from Pablo Neira Ayuso.

 3) Fix TX hangs with i40e, from Jesse Brandeburg.

 4) Fix budget return from poll calls in dnet and alx, from Eric
    Dumazet.

 5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
    Jaeger.

 6) Fix bug introduced by conversion to list_head in TIPC retransmit
    code, from Jon Paul Maloy.

 7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
    Khoroshilov.

 8) Fix bridge build with INET disabled, from Arnd Bergmann.

 9) Fix netlink array overrun for PROBE attributes in openvswitch, from
    Thomas Graf.

10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
    Prashant Sreedharan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  tg3: Release tp->lock before invoking synchronize_irq()
  tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
  tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
  team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
  openvswitch: packet messages need their own probe attribtue
  i40e: adds FCoE configure option
  cxgb4vf: Fix queue allocation for 40G adapter
  netdevice: Add missing parentheses in macro
  bridge: only provide proxy ARP when CONFIG_INET is enabled
  neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
  net: fec: fix MDIO bus assignement for dual fec SoC's
  xen-netfront: use different locks for Rx and Tx stats
  drivers: net: cpsw: fix multicast flush in dual emac mode
  cxgb4vf: Initialize mdio_addr before using it
  net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
  usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
  MAINTAINERS: add me as ibmveth maintainer
  tipc: fix bug in broadcast retransmit code
  update ip-sysctl.txt documentation (v2)
  net/at91_ether: prepare and unprepare clock
  ...
2015-01-15 11:17:37 +13:00
Thomas Graf
1ba398041f openvswitch: packet messages need their own probe attribtue
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.

Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:49:44 -05:00
Arnd Bergmann
d92cfdbbea bridge: only provide proxy ARP when CONFIG_INET is enabled
When IPV4 support is disabled, we cannot call arp_send from
the bridge code, which would result in a kernel link error:

net/built-in.o: In function `br_handle_frame_finish':
:(.text+0x59914): undefined reference to `arp_send'
:(.text+0x59a50): undefined reference to `arp_tbl'

This makes the newly added proxy ARP support in the bridge
code depend on the CONFIG_INET symbol and lets the compiler
optimize the code out to avoid the link error.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 958501163d ("bridge: Add support for IEEE 802.11 Proxy ARP")
Cc: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:08:02 -05:00
Jean-Francois Remy
4bf6980dd0 neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.

This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
    recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.

This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.

Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.

Signed-off-by: Jean-Francois Remy <jeff@melix.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:28:00 -05:00
Jon Paul Maloy
164167794c tipc: fix bug in broadcast retransmit code
In commit 58dc55f256 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:01:59 -05:00
David S. Miller
2bd8221804 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains netfilter/ipvs fixes, they are:

1) Small fix for the FTP helper in IPVS, a diff variable may be left
   unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

2) Fix nf_tables port NAT in little endian archs, patch from leroy
   christophe.

3) Fix race condition between conntrack confirmation and flush from
   userspace. This is the second reincarnation to resolve this problem.

4) Make sure inner messages in the batch come with the nfnetlink header.

5) Relax strict check from nfnetlink_bind() that may break old userspace
   applications using all 1s group mask.

6) Schedule removal of chains once no sets and rules refer to them in
   the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
   Toennesen.

Note that this batch comes later than usual because of the short
winter holidays.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 00:14:49 -05:00
Christoph Jaeger
46d2cfb192 packet: bail out of packet_snd() if L2 header creation fails
Due to a misplaced parenthesis, the expression

  (unlikely(offset) < 0),

which expands to

  (__builtin_expect(!!(offset), 0) < 0),

never evaluates to true. Therefore, when sending packets with
PF_PACKET/SOCK_DGRAM, packet_snd() does not abort as intended
if the creation of the layer 2 header fails.

Spotted by Coverity - CID 1259975 ("Operands don't affect result").

Fixes: 9c7077622d ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Christoph Jaeger <cj@linux.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-11 21:54:03 -05:00
Linus Torvalds
dc9319f5a3 Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
  rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
  nfsd: fix fi_delegees leak when fi_had_conflict returns true
2015-01-09 18:10:48 -08:00
Linus Torvalds
20ebb34528 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
 "These are both pretty trivial: a sparse warning fix and size_t printk
  thing"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix sparse endianness warnings
  ceph: use %zu for len in ceph_fill_inline_data()
2015-01-09 17:55:00 -08:00
Ilya Dryomov
d7d5a007b1 libceph: fix sparse endianness warnings
The only real issue is the one in auth_x.c and it came with
3.19-rc1 merge.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-08 20:36:57 +03:00
J. Bruce Fields
49a068f82a rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
A struct xdr_stream at a page boundary might point to the end of one
page or the beginning of the next, but xdr_truncate_encode isn't
prepared to handle the former.

This can cause corruption of NFSv4 READDIR replies in the case that a
readdir entry that would have exceeded the client's dircount/maxcount
limit would have ended exactly on a 4k page boundary.  You're more
likely to hit this case on large directories.

Other xdr_truncate_encode callers are probably also affected.

Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Fixes: 3e19ce762b "rpc: xdr_truncate_encode"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 14:03:58 -05:00
Arik Nemtsov
20658702e0 cfg80211: fix deadlock during reg chan check
If a P2P GO is active, the cfg80211_reg_can_beacon function will take
the wdev lock, in its call to cfg80211_go_permissive_chan. But the wdev lock
is already taken by the parent channel-checking function, causing a
deadlock.
Split the checking code into two parts. The first part will check if the
wdev is active and saves the channel under the wdev lock. The second part
will check actual channel validity according to type.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-07 14:53:46 +01:00
John Linville
cc72f6e227 mac80211: uninitialized return val in __ieee80211_sta_handle_tspec_ac_params
The return value should be initialized to false so that there's a
valid return value when there are no sessions that need work to be
done on them. Luckily, the side effect of using the uninitialized
value is an extra harmless driver call.

Coverity: CID 1260096
Fixes: 02219b3abc ("mac80211: add WMM admission control support")
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[extend commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-07 13:57:34 +01:00
Linus Torvalds
bdec419638 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Just a pile of random fixes, including:

   1) Do not apply TSO limits to non-TSO packets, fix from Herbert Xu.

   2) MDI{,X} eeprom check in e100 driver is reversed, from John W.
      Linville.

   3) Missing error return assignments in several ethernet drivers, from
      Julia Lawall.

   4) Altera TSE device doesn't come back up after ifconfig down/up
      sequence, fix from Kostya Belezko.

   5) Add more cases to the check for whether the qmi_wwan device has a
      bogus MAC address and needs to be assigned a random one.  From
      Kristian Evensen.

   6) Fix interrupt hangs in CPSW, from Felipe Balbi.

   7) Implement ndo_features_check in r8152 so that the stack doesn't
      feed GSO packets which are outside of the chip's capabilities.
      From Hayes Wang"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  qla3xxx: don't allow never end busy loop
  xen-netback: fixing the propagation of the transmit shaper timeout
  r8152: support ndo_features_check
  batman-adv: fix potential TT client + orig-node memory leak
  batman-adv: fix multicast counter when purging originators
  batman-adv: fix counter for multicast supporting nodes
  batman-adv: fix lock class for decoding hash in network-coding.c
  batman-adv: fix delayed foreign originator recognition
  batman-adv: fix and simplify condition when bonding should be used
  Revert "mac80211: Fix accounting of the tailroom-needed counter"
  net: ethernet: cpsw: fix hangs with interrupts
  enic: free all rq buffs when allocation fails
  qmi_wwan: Set random MAC on devices with buggy fw
  openvswitch: Consistently include VLAN header in flow and port stats.
  tcp: Do not apply TSO segment limit to non-TSO packets
  Altera TSE: Add missing phydev
  net/mlx4_core: Fix error flow in mlx4_init_hca()
  net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow
  qlcnic: Fix return value in qlcnic_probe()
  net: axienet: fix error return code
  ...
2015-01-06 17:48:14 -08:00
Pablo Neira Ayuso
a2f18db0c6 netfilter: nf_tables: fix flush ruleset chain dependencies
Jumping between chains doesn't mix well with flush ruleset. Rules
from a different chain and set elements may still refer to us.

[  353.373791] ------------[ cut here ]------------
[  353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159!
[  353.373896] invalid opcode: 0000 [#1] SMP
[  353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi
[  353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98
[  353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010
[...]
[  353.375018] Call Trace:
[  353.375046]  [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540
[  353.375101]  [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0
[  353.375150]  [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0
[  353.375200]  [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790
[  353.375253]  [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0
[  353.375300]  [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70
[  353.375357]  [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30
[  353.375410]  [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0
[  353.375459]  [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400
[  353.375510]  [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90
[  353.375563]  [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20
[  353.375616]  [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0
[  353.375667]  [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80
[  353.375719]  [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d
[  353.375776]  [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20
[  353.375823]  [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b

Release objects in this order: rules -> sets -> chains -> tables, to
make sure no references to chains are held anymore.

Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:48 +01:00
Pablo Neira Ayuso
62924af247 netfilter: nfnetlink: relax strict multicast group check from netlink_bind
Relax the checking that was introduced in 97840cb ("netfilter:
nfnetlink: fix insufficient validation in nfnetlink_bind") when the
subscription bitmask is used. Existing userspace code code may request
to listen to all of the existing netlink groups by setting an all to one
subscription group bitmask. Netlink already validates subscription via
setsockopt() for us.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:47 +01:00
Pablo Neira Ayuso
9ea2aa8b7d netfilter: nfnetlink: validate nfnetlink header from batch
Make sure there is enough room for the nfnetlink header in the
netlink messages that are part of the batch. There is a similar
check in netlink_rcv_skb().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:46 +01:00
Pablo Neira Ayuso
8ca3f5e974 netfilter: conntrack: fix race between confirmation and flush
Commit 5195c14c8b ("netfilter: conntrack: fix race in
__nf_conntrack_confirm against get_next_corpse") aimed to resolve the
race condition between the confirmation (packet path) and the flush
command (from control plane). However, it introduced a crash when
several packets race to add a new conntrack, which seems easier to
reproduce when nf_queue is in place.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case
race occured, re-add the CT to the dying list

This patch also changes the verdict from NF_ACCEPT to NF_DROP when
we lose race. Basically, the confirmation happens for the first packet
that we see in a flow. If you just invoked conntrack -F once (which
should be the common case), then this is likely to be the first packet
of the flow (unless you already called flush anytime soon in the past).
This should be hard to trigger, but better drop this packet, otherwise
we leave things in inconsistent state since the destination will likely
reply to this packet, but it will find no conntrack, unless the origin
retransmits.

The change of the verdict has been discussed in:
https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:45 +01:00