Commit Graph

697 Commits

Author SHA1 Message Date
David S. Miller
5b2941b18d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
A number of significant new features and optimizations for net-next/3.12.
Highlights are:
 * "Megaflows", an optimization that allows userspace to specify which
   flow fields were used to compute the results of the flow lookup.
   This allows for a major reduction in flow setups (the major
   performance bottleneck in Open vSwitch) without reducing flexibility.
 * Converting netlink dump operations to use RCU, allowing for
   additional parallelism in userspace.
 * Matching and modifying SCTP protocol fields.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-27 22:11:18 -04:00
Patrick McHardy
48b1de4c11 netfilter: add SYNPROXY core/target
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
core with common functions and an address family specific target.

The SYNPROXY receives the connection request from the client, responds with
a SYN/ACK containing a SYN cookie and announcing a zero window and checks
whether the final ACK from the client contains a valid cookie.

It then establishes a connection to the original destination and, if
successful, sends a window update to the client with the window size
announced by the server.

Support for timestamps, SACK, window scaling and MSS options can be
statically configured as target parameters if the features of the server
are known. If timestamps are used, the timestamp value sent back to
the client in the SYN/ACK will be different from the real timestamp of
the server. In order to now break PAWS, the timestamps are translated in
the direction server->client.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:27:54 +02:00
Patrick McHardy
41d73ec053 netfilter: nf_conntrack: make sequence number adjustments usuable without NAT
Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a seperate extend. The extend is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28 00:26:48 +02:00
Joe Stringer
a175a72330 openvswitch: Add SCTP support
This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.

Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-26 14:03:13 -07:00
David S. Miller
b05930f5d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/trans.c
	include/linux/inetdevice.h

The inetdevice.h conflict involves moving the IPV4_DEVCONF values
into a UAPI header, overlapping additions of some new entries.

The iwlwifi conflict is a context overlap.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-26 16:37:08 -04:00
Andy Zhou
03f0d916aa openvswitch: Mega flow implementation
Add wildcarded flow support in kernel datapath.

Wildcarded flow can improve OVS flow set up performance by avoid sending
matching new flows to the user space program. The exact performance boost
will largely dependent on wildcarded flow hit rate.

In case all new flows hits wildcard flows, the flow set up rate is
within 5% of that of linux bridge module.

Pravin has made significant contributions to this patch. Including API
clean ups and bug fixes.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23 16:43:07 -07:00
stephen hemminger
4a5a8aa6c9 ipv4: expose IPV4_DEVCONF
IP sends device configuration (see inet_fill_link_af) as an array
in the netlink information, but the indices in that array are not
exposed to userspace through any current santized header file.

It was available back in 2.6.32 (in /usr/include/linux/sysctl.h)
but was broken by:
  commit 02291680ff
  Author: Eric W. Biederman <ebiederm@xmission.com>
  Date:   Sun Feb 14 03:25:51 2010 +0000

    net ipv4: Decouple ipv4 interface parameters from binary sysctl numbers

Eric was solving the sysctl problem but then the indices were re-exposed
by a later addition of devconf support for IPV4

  commit 9f0f7272ac
  Author: Thomas Graf <tgraf@infradead.org>
  Date:   Tue Nov 16 04:32:48 2010 +0000

    ipv4: AF_INET link address family

Putting them in /usr/include/linux/ip.h seemed the logical match
for the DEVCONF_ definitions for IPV6 in /usr/include/linux/ip6.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22 20:30:15 -07:00
Pavel Emelyanov
76975e9cb4 tun: Get skfilter layout
The only thing we may have from tun device is the fprog, whic contains
the number of filter elements and a pointer to (user-space) memory
where the elements are. The program itself may not be available if the
device is persistent and detached.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21 12:21:45 -07:00
Pavel Emelyanov
849c9b6f93 tun: Allow to skip filter on attach
There's a small problem with sk-filters on tun devices. Consider
an application doing this sequence of steps:

fd = open("/dev/net/tun");
ioctl(fd, TUNSETIFF, { .ifr_name = "tun0" });
ioctl(fd, TUNATTACHFILTER, &my_filter);
ioctl(fd, TUNSETPERSIST, 1);
close(fd);

At that point the tun0 will remain in the system and will keep in
mind that there should be a socket filter at address '&my_filter'.

If after that we do

fd = open("/dev/net/tun");
ioctl(fd, TUNSETIFF, { .ifr_name = "tun0" });

we most likely receive the -EFAULT error, since tun_attach() would
try to connect the filter back. But (!) if we provide a filter at
address &my_filter, then tun0 will be created and the "new" filter
would be attached, but application may not know about that.

This may create certain problems to anyone using tun-s, but it's
critical problem for c/r -- if we meet a persistent tun device
with a filter in mind, we will not be able to attach to it to dump
its state (flags, owner, address, vnethdr size, etc.).

The proposal is to allow to attach to tun device (with TUNSETIFF)
w/o attaching the filter to the tun-file's socket. After this
attach app may e.g clean the device by dropping the filter, it
doesn't want to have one, or (in case of c/r) get information
about the device with tun ioctls.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21 12:21:45 -07:00
Pavel Emelyanov
fb7589a162 tun: Add ability to create tun device with given index
Tun devices cannot be created with ifidex user wants, but it's
required by checkpoint-restore project.

Long time ago such ability was implemented for rtnl_ops-based
interface for creating links (9c7dafbf net: Allow to create links
with given ifindex), but the only API for creating and managing
tuntap devices is ioctl-based and is evolving with adding new ones
(cde8b15f tuntap: add ioctl to attach or detach a file form tuntap
device).

Following that trend, here's how a new ioctl that sets the ifindex
for device, that _will_ be created by TUNSETIFF ioctl looks like.
So those who want a tuntap device with the ifindex N, should open
the tun device, call ioctl(fd, TUNSETIFINDEX, &N), then call TUNSETIFF.
If the index N is busy, then the register_netdev will find this out
and the ioctl would be failed with -EBUSY.

If setifindex is not called, then it will be generated as before.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21 12:21:45 -07:00
David S. Miller
89d5e23210 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Conflicts:
	net/netfilter/nf_conntrack_proto_tcp.c

The conflict had to do with overlapping changes dealing with
fixing the use of an "s32" to hold the value returned by
NAT_OFFSET().

Pablo Neira Ayuso says:

====================
The following batch contains Netfilter/IPVS updates for your net-next tree.
More specifically, they are:

* Trivial typo fix in xt_addrtype, from Phil Oester.

* Remove net_ratelimit in the conntrack logging for consistency with other
  logging subsystem, from Patrick McHardy.

* Remove unneeded includes from the recently added xt_connlabel support, from
  Florian Westphal.

* Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for
  this, from Florian Westphal.

* Remove tproxy core, now that we have socket early demux, from Florian
  Westphal.

* A couple of patches to refactor conntrack event reporting to save a good
  bunch of lines, from Florian Westphal.

* Fix missing locking in NAT sequence adjustment, it did not manifested in
  any known bug so far, from Patrick McHardy.

* Change sequence number adjustment variable to 32 bits, to delay the
  possible early overflow in long standing connections, also from Patrick.

* Comestic cleanups for IPVS, from Dragos Foianu.

* Fix possible null dereference in IPVS in the SH scheduler, from Daniel
  Borkmann.

* Allow to attach conntrack expectations via nfqueue. Before this patch, you
  had to use ctnetlink instead, thus, we save the conntrack lookup.

* Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 13:30:54 -07:00
Pravin B Shelar
58264848a5 openvswitch: Add vxlan tunneling support.
Following patch adds vxlan vport type for openvswitch using
vxlan api. So now there is vxlan dependency for openvswitch.

CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 00:15:44 -07:00
David S. Miller
2ff1cf12c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
Jesper Dangaard Brouer
8a8e3d84b1 net_sched: restore "linklayer atm" handling
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "linklayer atm" handling.

 tc class add ... htb rate X ceil Y linklayer atm

The linklayer setting is implemented by modifying the rate table
which is send to the kernel.  No direct parameter were
transferred to the kernel indicating the linklayer setting.

The commit 56b765b79 ("htb: improved accuracy at high rates")
removed the use of the rate table system.

To keep compatible with older iproute2 utils, this patch detects
the linklayer by parsing the rate table.  It also supports future
versions of iproute2 to send this linklayer parameter to the
kernel directly. This is done by using the __reserved field in
struct tc_ratespec, to convey the choosen linklayer option, but
only using the lower 4 bits of this field.

Linklayer detection is limited to speeds below 100Mbit/s, because
at high rates the rtab is gets too inaccurate, so bad that
several fields contain the same values, this resembling the ATM
detect.  Fields even start to contain "0" time to send, e.g. at
1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
been more broken than we first realized.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15 01:43:08 -07:00
Nicolas Dichtel
38c67328ac netfilter: export xt_HMARK.h to userland
This file contains the API for the target "HMARK", hence it should be exported
to userland.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-14 10:48:05 +02:00
Nicolas Dichtel
f0c03956ac netfilter: export xt_rpfilter.h to userland
This file contains the API for the match "rpfilter", hence it should be exported
to userland.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-14 10:47:15 +02:00
Hannes Frederic Sowa
fc4eba58b4 ipv6: make unsolicited report intervals configurable for mld
Commit cab70040df ("net: igmp:
Reduce Unsolicited report interval to 1s when using IGMPv3") and
2690048c01 ("net: igmp: Allow user-space
configuration of igmp unsolicited report interval") by William Manley made
igmp unsolicited report intervals configurable per interface and corrected
the interval of unsolicited igmpv3 report messages resendings to 1s.

Same needs to be done for IPv6:

MLDv1 (RFC2710 7.10.): 10 seconds
MLDv2 (RFC3810 9.11.): 1 second

Both intervals are configurable via new procfs knobs
mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.

(also added .force_mld_version to ipv6_devconf_dflt to bring structs in
line without semantic changes)

v2:
a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
   unsolicited_report_interval procfs knobs.
b) incorporate stylistic feedback from William Manley

v3:
a) add new DEVCONF_* values to the end of the enum (thanks to David
   Miller)

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: William Manley <william.manley@youview.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13 17:05:04 -07:00
David S. Miller
98f1b7f382 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is a batch of updates intended for 3.12.  It is mostly driver
stuff, although Johannes Berg and Simon Wunderlich make a good
showing with mac80211 bits (particularly some work on 5/10 MHz
channel support).

The usual suspects are mostly represented.  There are lots of updates
to iwlwifi, ath9k, ath10k, mwifiex, rt2x00, wil6210, as usual.
The bcma bus gets some love this time, as do cw1200, iwl4965, and a
few other bits here and there.  I don't think there is much unusual
here, FWIW.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13 15:59:09 -07:00
stephen hemminger
ebd8b934e2 pptp: fix byte order warnings
Pptp driver has lots of byte order warnings from sparse.
This was because the on-the-wire header is in network byte order (obviously)
but the definition did not reflect that.

Also, the address structure to user space actually put the call id
in host order. Rather than break ABI compatibility, just acknowledge
the existing design.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-13 15:10:22 -07:00
Pablo Neira Ayuso
bd07793705 netfilter: nfnetlink_queue: allow to attach expectations to conntracks
This patch adds the capability to attach expectations via nfnetlink_queue.
This is required by conntrack helpers that trigger expectations based on
the first packet seen like the TFTP and the DHCPv6 user-space helpers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13 16:32:10 +02:00
John W. Linville
89c2af3c14 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/ethernet/broadcom/Kconfig
2013-08-12 14:45:06 -04:00
John W. Linville
fa5978447c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-08-09 15:08:10 -04:00
John W. Linville
4f05444892 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2013-08-09 15:06:28 -04:00
Eliezer Tamir
288a937637 net: rename busy poll MIB counter
Rename mib counter from "low latency" to "busy poll"

v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer)
Eric Dumazet suggested that the current location is better.

So v2 just renames the counter to fit the new naming convention.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:39:08 -07:00
Eric Dumazet
1f07d03e20 net: add SNMP counters tracking incoming ECN bits
With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
counters do not count the number of frames, but number of aggregated
segments.

Its probably too late to change this now.

This patch adds four new counters, tracking number of frames, regardless
of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.

Ip[6]NoECTPkts : Number of packets received with NOECT
Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
Ip[6]CEPkts    : Number of packets received with Congestion Experienced

lph37:~# nstat | egrep "Pkts|InReceive"
IpInReceives                    1634137            0.0
Ip6InReceives                   3714107            0.0
Ip6InNoECTPkts                  19205              0.0
Ip6InECT0Pkts                   52651828           0.0
IpExtInNoECTPkts                33630              0.0
IpExtInECT0Pkts                 15581379           0.0
IpExtInCEPkts                   6                  0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08 22:24:59 -07:00