Pull documentation updates from Jonathan Corbet:
"Highlights this time around include:
- A thrashing of SubmittingPatches to bring it out of the "send
everything to Linus" era of kernel development.
- A new document on completions from Nicholas McGuire
- Lots of typo fixes, formatting improvements, corrections, build
fixes, and more"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (35 commits)
Documentation: Fix the wrong command `echo -1 > set_ftrace_pid` for cleaning the filter.
can-doc: Fixed a wrong filepath in can.txt
Documentation: Fix trivial typo in comment.
kgdb,docs: Fix typo and minor style issues
Documentation: add description for FTRACE probe status
doc: brief user documentation for completion
Documentation/misc-devices/mei: Fix indentation of embedded code.
Documentation/misc-devices/mei: Fix indentation of enumeration.
Documentation/misc-devices/mei: Fix spacing around parentheses.
Documentation/misc-devices/mei: Fix formatting of headings.
Documentation: devicetree: Fix double words in Doumentation/devicetree
Documentation: mm: Fix typo in vm.txt
lockstat: Add documentation on contention and contenting points
Documentation: fix blackfin gptimers-example build errors
Fixes column alignment in table of contents entry 1.9 in Documentation/filesystems/proc.txt
CodingStyle: enable emacs display of trailing whitespace
DocBook: Do not exceed argument list limit
gpio: board.txt: Fix the gpio name example
Documentation/SubmittingPatches: unify whitespace/tabs for the DCO
MAINTAINERS: Add the docs-next git tree to the maintainer entry
...
Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.
This patch includes:
- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending
The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.
We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.
We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.
The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.
The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.
Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/vxlan.c
drivers/vhost/net.c
include/linux/if_vlan.h
net/core/dev.c
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
<linux/can/error.h> moved in the big UAPI shuffle; update the document to
note its new location.
Signed-off-by: Stefan Tatschner <stefan@sevenbyte.org>
[jc: added changelog]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Update netlink_mmap.txt wrt. commit 4682a03586
("netlink: Always copy on mmap TX.").
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Demonstrate how SOF_TIMESTAMPING_OPT_TSONLY can be used and
test the implementation.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.
Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes (rfc -> v1)
- add documentation
- remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.
This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.
All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter updates for net-next
The following patchset contains netfilter updates for net-next, just a
bunch of cleanups and small enhancement to selectively flush conntracks
in ctnetlink, more specifically the patches are:
1) Rise default number of buckets in conntrack from 16384 to 65536 in
systems with >= 4GBytes, patch from Marcelo Leitner.
2) Small refactor to save one level on indentation in xt_osf, from
Joe Perches.
3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick.
4) Another small cleanup to remove redundant variable in nfnetlink,
from Duan Jiong.
5) Fix compilation warning in nfnetlink_cthelper on parisc, from
Chen Gang.
6) Fix wrong format in debugging for ctseqadj, from Gao feng.
7) Selective conntrack flushing through the mark for ctnetlink, patch
from Kristian Evensen.
8) Remove nf_ct_conntrack_flush_report() exported symbol now that is
not required anymore after the selective flushing patch, again from
Kristian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/xen-netfront.c
Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.
Signed-off-by: David S. Miller <davem@davemloft.net>
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for ipv4.
Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A fix to ipv6 structure definitions removed the now superfluous
definition of in6_pktinfo in this file.
But, use of the glibc definition requires defining _GNU_SOURCE
(see also https://sourceware.org/bugzilla/show_bug.cgi?id=6775).
Before this change, the following would fail for me:
make
make headers_install
make M=Documentation/networking/timestamping
with
Documentation/networking/timestamping/txtimestamp.c: In function '__recv_errmsg_cmsg':
Documentation/networking/timestamping/txtimestamp.c:205:33: error: dereferencing pointer to incomplete type
Documentation/networking/timestamping/txtimestamp.c:206:23: error: dereferencing pointer to incomplete type
After this patch compilation succeeded.
Fixes: cd91cc5bdd ("doc: fix the compile error of txtimestamp.c")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinson reported:
HOSTCC Documentation/networking/timestamping/txtimestamp
Documentation/networking/timestamping/txtimestamp.c:64:8: error:
redefinition of ‘struct in6_pktinfo’
struct in6_pktinfo {
^
In file included from /usr/include/arpa/inet.h:23:0,
from Documentation/networking/timestamping/txtimestamp.c:33:
/usr/include/netinet/in.h:456:8: note: originally defined here
struct in6_pktinfo
^
After we sync with libc header, we don't need this ugly hack any more.
Reported-by: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manually bumping either nf_conntrack_buckets or nf_conntrack_max has
become a common task as our Linux servers tend to serve more and more
clients/applications, so let's adjust nf_conntrack_buckets this to a
more updated value.
Now for systems with more than 4GB of memory, nf_conntrack_buckets
becomes 65536 instead of 16384, resulting in nf_conntrack_max=256k
entries.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix the typo, there should be "It".
On the other hand, fix whitespace errors detected by checkpatch.pl
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the erronous usage of an hexadecimal address in the
example, by replacing it with a decimal address.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation:
expand explanation of timestamp counter
Test:
new: flag -I requests and prints PKTINFO
new: flag -x prints payload (possibly truncated)
fix: remove pretty print that breaks common flag '-l 1'
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.
on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.
on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.
In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.
The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:
ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes
v1 -> v2
large rewrite
- integrate with existing pktinfo cmsg generation code
- on ipv4: only send with new flag, to maintain legacy behavior
- on ipv6: send at most a single pktinfo cmsg
- on ipv6: initialize fields if not yet initialized
The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches
1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
(http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.
Note that user can use random port netdevice to access the switch.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.
This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.
The device operates in two different modes and the difference
in these two modes in primarily in the TX side.
(a) L2 mode : In this mode, the device behaves as a L2 device.
TX processing upto L2 happens on the stack of the virtual device
associated with (namespace). Packets are switched after that
into the main device (default-ns) and queued for xmit.
RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.
(b) L3 mode : In this mode, the device behaves like a L3 device.
TX processing upto L3 happens on the stack of the virtual device
associated with (namespace). Packets are switched to the
main-device (default-ns) for the L2 processing. Hence the routing
table of the default-ns will be used in this mode.
RX processins is somewhat similar to the L2 mode except that in
this mode only Unicast packets are delivered to the virtual device
while main-dev will handle all other packets.
The devices can be added using the "ip" command from the iproute2
package -
ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>