Commit Graph

985 Commits

Author SHA1 Message Date
Simon Wunderlich
f3baa393ff UAPI: add MPLS label stack definition
Labels for the Multiprotocol Label Switching are defined in RFC 3032
which was superseded by RFC 5462. Add the definition to UAPI and a stub
header for include/linux.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-04 13:51:06 -05:00
Simon Wunderlich
b62faf3cdc if_ether.h: add IEEE 802.21 Ethertype
Add the Ethertype for IEEE Std 802.21 - Media Independent Handover
Protocol. This Ethertype is used for network control messages.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-04 13:51:06 -05:00
Yuchung Cheng
f19c29e3e3 tcp: snmp stats for Fast Open, SYN rtx, and data pkts
Add the following snmp stats:

TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed beacuse
the remote does not accept it or the attempts timed out.

TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

TCPOrigDataSent: number of outgoing packets with original data (excluding
retransmission but including data-in-SYN). This counter is different from
TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
more useful to track the TCP retransmission rate.

Change TCPFastOpenActive to track only successful Fast Opens to be symmetric to
TCPFastOpenPassive.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Lawrence Brakmo <brakmo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-03 15:58:03 -05:00
Luis R. Rodriguez
7aa98047df net: move net_device priv_flags out from UAPI
These are private to userspace, and they're unstable
anyway and can be shuffled at will (see 080e4130b1)
so any userspace application relying on them is on crack.

Test compiled with allyesconfig.

mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ make allyesconfig
mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ time make -j 20
...
  BUILD   arch/x86/boot/bzImage
Setup is 16992 bytes (padded to 17408 bytes).
System is 56153 kB
CRC 721d2751
Kernel: arch/x86/boot/bzImage is ready  (#1)
real    19m35.744s
user    280m37.984s
sys     27m54.104s

Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 15:59:09 -05:00
Luis R. Rodriguez
589f5816f3 net: kdoc struct net_device flags and priv_flags
We have documentation for these flags but they're scattered
all over the place. #defines don't allow documentation to be
written easily so to help to start bringing some documentation
together use the enums kdoc practice but keep the defines to
allow userspace to be able to #ifdef them.

I've verified the same values are assigned before and after
with a simple userspace test program [0] and checksumming the
output.

[0] http://drvbp1.linux-foundation.org/~mcgrof/kdoc/netdev_flags/

mcgrof@gnat ~/tmp $ ./check-flags | sha1sum
0ec5b6b1840aa3bb9ce464e61c564820871c92c3  -

Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 15:59:09 -05:00
Eric Dumazet
740b0f1841 tcp: switch rtt estimations to usec resolution
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.

In this example ss command displays a srtt of 32 usecs (10Gbit link)

lpk51:~# ./ss -i dst lpk52
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer
Address:Port
tcp    ESTAB      0      1         10.246.11.51:42959
10.246.11.52:64614
         cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
cwnd:10 send
3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

Updated iproute2 ip command displays :

lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
10.246.11.51

Old binary displays :

lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 17:08:40 -05:00
Hannes Frederic Sowa
0b95227a7b ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT
This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which
got recently introduced. It doesn't honor the path mtu discovered by the
host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of
fragments if the packet size exceeds the MTU of the outgoing interface
MTU.

Fixes: 93b36cf342 ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:51:01 -05:00
Hannes Frederic Sowa
1b34657635 ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.

This patch adds yet another new IP_MTU_DISCOVER option to not honor any
path mtu information and not accepting new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.

As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.

The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.

Fixes: 482fc6094a ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:51:00 -05:00
Florian Westphal
8e165e2034 net: tcp: add mib counters to track zero window transitions
Three counters are added:
- one to track when we went from non-zero to zero window
- one to track the reverse
- one counter incremented when we want to announce zero window,
  but can't because we would shrink current window.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:23:30 -05:00
Neil Jerram
2ebe21fdde net: order MPLS ethertypes numerically
All ethertypes other than ETH_P_MPLS_UC, ETH_P_MPLS_MC and
ETH_P_ATMMPOA were already ordered numerically.  This commit moves
those three ETH_P_... values into correct numerical order too.

Signed-off-by: Neil Jerram <Neil.Jerram@metaswitch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:14:26 -05:00
David S. Miller
1f5a7407e4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Introduce skb_to_sgvec_nomark function to add further data to the sg list
   without calling sg_unmark_end first. Needed to add extended sequence
   number informations. From Fan Du.

2) Add IPsec extended sequence numbers support to the Authentication Header
   protocol for ipv4 and ipv6. From Fan Du.

3) Make the IPsec flowcache namespace aware, from Fan Du.

4) Avoid creating temporary SA for every packet when no key manager is
   registered. From Horia Geanta.

5) Support filtering of SA dumps to show only the SAs that match a
   given filter. From Nicolas Dichtel.

6) Remove caching of xfrm_policy_sk_bundles. The cached socket policy bundles
   are never used, instead we create a new cache entry whenever xfrm_lookup()
   is called on a socket policy. Most protocols cache the used routes to the
   socket, so this caching is not needed.

7)  Fix a forgotten SADB_X_EXT_FILTER length check in pfkey, from Nicolas
    Dichtel.

8) Cleanup error handling of xfrm_state_clone.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-24 18:13:33 -05:00
John W. Linville
88daf80dcc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-02-20 15:02:02 -05:00
David S. Miller
1e8d6421cf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_3ad.h
	drivers/net/bonding/bond_main.c

Two minor conflicts in bonding, both of which were overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 01:24:22 -05:00
Nicolas Dichtel
d3623099d3 ipsec: add support of limited SA dump
The goal of this patch is to allow userland to dump only a part of SA by
specifying a filter during the dump.
The kernel is in charge to filter SA, this avoids to generate useless netlink
traffic (it save also some cpu cycles). This is particularly useful when there
is a big number of SA set on the system.

Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
struct netlink_callback->args is defined as a array of 6 long and the first long
is used in xfrm code to flag the cb as initialized. Hence, we must have:
sizeof(struct xfrm_state_walk) <= sizeof(long) * 5.
With the union, it was false on arm (sizeof(struct xfrm_state_walk) was
sizeof(long) * 7), due to the padding.
In fact, whatever the arch is, this union seems useless, there will be always
padding after it. Removing it will not increase the size of this struct (and
reduce it on arm).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-02-17 07:18:19 +01:00
Linus Torvalds
3962dfbe22 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We have a small collection of fixes in my for-linus branch.

  The big thing that stands out is a revert of a new ioctl.  Users
  haven't shipped yet in btrfs-progs, and Dave Sterba found a better way
  to export the information"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: use right clone root offset for compressed extents
  btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
  Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
  Btrfs: fix max_inline mount option
  Btrfs: fix a lockdep warning when cleaning up aborted transaction
  Revert "btrfs: add ioctl to export size of global metadata reservation"
2014-02-16 11:05:27 -08:00
Linus Torvalds
bb0a05d756 Merge tag 'char-misc-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
 "Here are some small char/misc driver fixes, along with some
  documentation updates, for 3.14-rc3.  Nothing major, just a number of
  fixes for reported issues"

* tag 'char-misc-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "misc: eeprom: sunxi: Add new compatibles"
  Revert "ARM: sunxi: dt: Convert to the new SID compatibles"
  misc: mic: fix possible signed underflow (undefined behavior) in userspace API
  ARM: sunxi: dt: Convert to the new SID compatibles
  misc: eeprom: sunxi: Add new compatibles
  misc: genwqe: Fix potential memory leak when pinning memory
  Documentation:Update Documentation/zh_CN/arm64/memory.txt
  Documentation:Update Documentation/zh_CN/arm64/booting.txt
  Documentation:Chinese translation of Documentation/arm64/tagged-pointers.txt
  raw: set range for MAX_RAW_DEVS
  raw: test against runtime value of max_raw_minors
  Drivers: hv: vmbus: Don't timeout during the initial connection with host
  Drivers: hv: vmbus: Specify the target CPU that should receive notification
  VME: Correct read/write alignment algorithm
  mei: don't unset read cb ptr on reset
  mei: clear write cb from waiting list on reset
2014-02-14 16:13:00 -08:00
Chris Mason
11bcac89c0 Revert "btrfs: add ioctl to export size of global metadata reservation"
This reverts commit 01e219e806.

David Sterba found a different way to provide these features without adding a new
ioctl.  We haven't released any progs with this ioctl yet, so I'm taking this out
for now until we finalize things.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
CC: Jeff Mahoney <jeffm@suse.com>
2014-02-14 13:42:13 -08:00
Eric Dumazet
977cb0ecf8 tcp: add pacing_rate information into tcp_info
Add two new fields to struct tcp_info, to report sk_pacing_rate
and sk_max_pacing_rate to monitoring applications, as ss from iproute2.

User exported fields are 64bit, even if kernel is currently using 32bit
fields.

lpaa5:~# ss -i
..
	 skmem:(r0,rb357120,t0,tb2097152,f1584,w1980880,o0,bl0) ts sack cubic
wscale:6,6 rto:400 rtt:0.875/0.75 mss:1448 cwnd:1 ssthresh:12 send
13.2Mbps pacing_rate 3336.2Mbps unacked:15 retrans:1/5448 lost:15
rcv_space:29200

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 16:09:43 -05:00
David S. Miller
886ab57c84 Merge tag 'linux-can-next-for-3.15-20140212' of git://gitorious.org/linux-can/linux-can-next
linux-can-next-for-3.15-20140212

Marc Kleine-Budde says:

====================
this is a pull request of eight patches for net-next/master.

Florian Vaussard contributed a series that merged the sja1000 of_platform
into the platform driver. The of_platform driver is finally removed.
Stephane Grosjean supplied a patch to allocate CANFD skbs. In a patch
by Uwe Kleine-König another missing copyright information was added to
a userspace header. And a patch by Yoann DI RUZZA that adds listen only
mode to the at91_can driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 18:16:00 -05:00
Ben Hutchings
073e3cf219 ethtool: Fix unwanted section breaks in kernel-doc
A colon almost unavoidably starts a new section.  The script should be
changed to provide a way to avoid this, but for now reword the
comments to avoid using colons.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:04 -05:00
Ben Hutchings
ba569dc3e8 ethtool: Move kernel-doc comment next to struct ethtool_dump definition
The kernel-doc script does not tolerate the macro definition in between.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:04 -05:00
Ben Hutchings
f432c095f7 ethtool: Expand documentation of struct ethtool_perm_addr
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
590912298c ethtool: Expand documentation of struct ethtool_stats
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
4e5a62db2b ethtool: Expand documentation of struct ethtool_test
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
fe5df1b91e ethtool: Expand documentation of string set types
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00