Commit Graph

3436 Commits

Author SHA1 Message Date
Daniel Borkmann cc7f7ab758 net: ipv6: mld: similarly to MLDv2 have min max_delay of 1
Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:21 -04:00
Daniel Borkmann 58c0ecfd8d net: ipv6: mld: implement RFC3810 MLDv2 mode only
RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:

  A forged Version 1 Query message will put MLDv2 listeners on that
  link in MLDv1 Host Compatibility Mode. This scenario can be avoided
  by providing MLDv2 hosts with a configuration option to ignore
  Version 1 messages completely.

Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:

  echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version  or
  echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version

Note that <all> device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on <all> case.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:20 -04:00
Daniel Borkmann e3f5b17047 net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation
Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:20 -04:00
Daniel Borkmann 6c567b78c8 net: ipv6: mld: clean up MLD_V1_SEEN macro
Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:20 -04:00
Daniel Borkmann 89225d1ce6 net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.
i) RFC3810, 9.2. Query Interval [QI] says:

   The Query Interval variable denotes the interval between General
   Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

  The Maximum Response Delay used to calculate the Maximum Response
  Code inserted into the periodic General Queries. Default value:
  10000 (10 seconds) [...] The number of seconds represented by the
  [Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

  The Older Version Querier Present Timeout is the time-out for
  transitioning a host back to MLDv2 Host Compatibility Mode. When an
  MLDv1 query is received, MLDv2 hosts set their Older Version Querier
  Present Timer to [Older Version Querier Present Timeout].

  This value MUST be ([Robustness Variable] times (the [Query Interval]
  in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

  [RV] = 2, [QI] = 125sec, [QRI] = 10sec
  [OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

  switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

  This section is meant to provide advice to network administrators on
  how to tune these settings to their network. Ambitious router
  implementations might tune these settings dynamically based upon
  changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

  The overall level of periodic MLD traffic is inversely proportional
  to the Query Interval. A longer Query Interval results in a lower
  overall level of MLD traffic. The value of the Query Interval MUST
  be equal to or greater than the Maximum Response Delay used to
  calculate the Maximum Response Code inserted in General Query
  messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 14:53:20 -04:00
David S. Miller 48f8e0af86 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following batch contains:

* Three fixes for the new synproxy target available in your
  net-next tree, from Jesper D. Brouer and Patrick McHardy.

* One fix for TCPMSS to correctly handling the fragmentation
  case, from Phil Oester. I'll pass this one to -stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 12:28:02 -04:00
Jesper Dangaard Brouer 7cc9eb6ef7 netfilter: SYNPROXY: let unrelated packets continue
Packets reaching SYNPROXY were default dropped, as they were most
likely invalid (given the recommended state matching).  This
patch, changes SYNPROXY target to let packets, not consumed,
continue being processed by the stack.

This will be more in line other target modules. As it will allow
more flexible configurations of handling, logging or matching on
packets in INVALID states.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-04 11:44:23 +02:00
Jesper Dangaard Brouer 775ada6d9f netfilter: more strict TCP flag matching in SYNPROXY
Its seems Patrick missed to incoorporate some of my requested changes
during review v2 of SYNPROXY netfilter module.

Which were, to avoid SYN+ACK packets to enter the path, meant for the
ACK packet from the client (from the 3WHS).

Further there were a bug in ip6t_SYNPROXY.c, for matching SYN packets
that didn't exclude the ACK flag.

Go a step further with SYN packet/flag matching by excluding flags
ACK+FIN+RST, in both IPv4 and IPv6 modules.

The intented usage of SYNPROXY is as follows:
(gracefully describing usage in commit)

 iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
 iptables -A INPUT -i eth0 -p tcp --dport 80 -m state UNTRACKED,INVALID \
         -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn

 echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

This does filter SYN flags early, for packets in the UNTRACKED state,
but packets in the INVALID state with other TCP flags could still
reach the module, thus this stricter flag matching is still needed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-04 11:43:11 +02:00
Vijay Subramanian c995ae2259 tcp: Change return value of tcp_rcv_established()
tcp_rcv_established() returns only one value namely 0. We change the return
value to void (as suggested by David Miller).

After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in
response to SYNs. We can remove the check and processing on the return value of
tcp_rcv_established().

We also fix jtcp_rcv_established() in tcp_probe.c to match that of
tcp_rcv_established().

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:28 -04:00
Nicolas Dichtel ea23192e8e tunnels: harmonize cleanup done on skb on rx path
The goal of this patch is to harmonize cleanup done on a skbuff on rx path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:26 -04:00
Nicolas Dichtel 963a88b31d tunnels: harmonize cleanup done on skb on xmit path
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Nicolas Dichtel 8b27f27797 skb: allow skb_scrub_packet() to be used by tunnels
This function was only used when a packet was sent to another netns. Now, it can
also be used after tunnel encapsulation or decapsulation.

Only skb_orphan() should not be done when a packet is not crossing netns.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Nicolas Dichtel 8b7ed2d91d iptunnels: remove net arg from iptunnel_xmit()
This argument is not used, let's remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Tim Gardner 3e25c65ed0 net: neighbour: Remove CONFIG_ARPD
This config option is superfluous in that it only guards a call
to neigh_app_ns(). Enabling CONFIG_ARPD by default has no
change in behavior. There will now be call to __neigh_notify()
for each ARP resolution, which has no impact unless there is a
user space daemon waiting to receive the notification, i.e.,
the case for which CONFIG_ARPD was designed anyways.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Joe Perches <joe@perches.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 21:41:43 -04:00
Cong Wang eb3c0d83cc net: unify skb_udp_tunnel_segment() and skb_udp6_tunnel_segment()
As suggested by Pravin, we can unify the code in case of duplicated
code.

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:01 -04:00
Cong Wang d949d826c0 ipv6: Add generic UDP Tunnel segmentation
Similar to commit 7313626745
(tunneling: Add generic Tunnel segmentation)

This patch adds generic tunneling offloading support for
IPv6-UDP based tunnels.

This can be used by tunneling protocols like VXLAN.

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:01 -04:00
Cong Wang f564f45c45 vxlan: add ipv6 proxy support
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na()
will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:01 -04:00
Cong Wang f39dc1023d ipv6: move in6_dev_finish_destroy() into core kernel
in6_dev_put() will be needed by vxlan module, so is
in6_dev_finish_destroy().

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang e15a00aafa vxlan: add ipv6 route short circuit support
route short circuit only has IPv4 part, this patch adds
the IPv6 part. nd_tbl will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang caf92bc400 ipv6: do not call ndisc_send_rs() with write lock
Because vxlan module will call ip6_dst_lookup() in TX path,
which will hold write lock. So we have to release this write lock
before calling ndisc_send_rs(), otherwise could deadlock.

Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang 034dfc5df9 ipv6: export in6addr_loopback to modules
It is needed by vxlan module. Noticed by Mike.

Cc: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang 5f81bd2e5d ipv6: export a stub for IPv6 symbols used by vxlan
In case IPv6 is compiled as a module, introduce a stub
for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used
by vxlan module. Suggested by Ben.

This is an ugly but easy solution for now.

Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang 788787b559 ipv6: move ip6_local_out into core kernel
It will be used the vxlan kernel module.

Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:30:00 -04:00
Cong Wang 3ce9b35ff6 ipv6: move ip6_dst_hoplimit() into core kernel
It will be used by vxlan, and may not be inlined.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31 22:29:59 -04:00
Thomas Graf 816c5b5b01 ipv6: Remove redundant sk variable
A sk variable initialized to ndisc_sk is already available outside
of the branch.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30 17:18:59 -04:00