Commit Graph

32090 Commits

Author SHA1 Message Date
Michael S. Tsirkin
25121173f7 skb: api to report errors for zero copy skbs
Orphaning frags for zero copy skbs needs to allocate data in atomic
context so is has a chance to fail. If it does we currently discard
the skb which is safe, but we don't report anything to the caller,
so it can not recover by e.g. disabling zero copy.

Add an API to free skb reporting such errors: this is used
by tun in case orphaning frags fails.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:57 -04:00
Michael S. Tsirkin
e19d6763cc skb: report completion status for zero copy skbs
Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:57 -04:00
Richard Cochran
00ab94eeaf cpts: specify the input clock frequency via DT
This patch adds a way to configure the CPTS input clock scaling factors
via the device tree.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01 12:21:32 -04:00
Richard Cochran
78ca0b2873 cpsw: add a DT field for the active time stamping port
Because time stamping on both external ports of the switch simultaneously
is positively useless from the application's point of view, this patch
provides a DT configuration method to choose the active port.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01 12:21:32 -04:00
Richard Cochran
6b60393e08 cpsw: add a DT field for the cpts offset
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01 12:21:32 -04:00
Richard Cochran
65f8f9a1c1 time: remove the timecompare code.
This patch removes the timecompare code from the kernel. The top five
reasons to do this are:

1. There are no more users of this code.
2. The original idea was a bit weak.
3. The original author has disappeared.
4. The code was not general purpose but tuned to a particular hardware,
5. There are better ways to accomplish clock synchronization.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01 11:41:35 -04:00
Pavel Emelyanov
a8fc927780 sk-filter: Add ability to get socket filter program (v2)
The SO_ATTACH_FILTER option is set only. I propose to add the get
ability by using SO_ATTACH_FILTER in getsockopt. To be less
irritating to eyes the SO_GET_FILTER alias to it is declared. This
ability is required by checkpoint-restore project to be able to
save full state of a socket.

There are two issues with getting filter back.

First, kernel modifies the sock_filter->code on filter load, thus in
order to return the filter element back to user we have to decode it
into user-visible constants. Fortunately the modification in question
is interconvertible.

Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
speed up the run-time division by doing kernel_k = reciprocal(user_k).
Bad news is that different user_k may result in same kernel_k, so we
can't get the original user_k back. Good news is that we don't have
to do it. What we need to is calculate a user2_k so, that

  reciprocal(user2_k) == reciprocal(user_k) == kernel_k

i.e. if it's re-loaded back the compiled again value will be exactly
the same as it was. That said, the user2_k can be calculated like this

  user2_k = reciprocal(kernel_k)

with an exception, that if kernel_k == 0, then user2_k == 1.

The optlen argument is treated like this -- when zero, kernel returns
the amount of sock_fprog elements in filter, otherwise it should be
large enough for the sock_fprog array.

changes since v1:
* Declared SO_GET_FILTER in all arch headers
* Added decode of vlan-tag codes

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01 11:17:15 -04:00
David S. Miller
810b6d7638 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to ixgbe, ixgbevf, igbvf, igb and
networking core (bridge).  Most notably is the addition of support
for local link multicast addresses in SR-IOV mode to the networking
core.

Also note, the ixgbe patch "ixgbe: Add support for pipeline reset" and
"ixgbe: Fix return value from macvlan filter function" is revised based
on community feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31 14:26:44 -04:00
Eric Dumazet
f3335031b9 net: filter: add vlan tag access
BPF filters lack ability to access skb->vlan_tci

This patch adds two new ancillary accessors :

SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)

SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)

This allows libpcap/tcpdump to use a kernel filter instead of
having to fallback to accept all packets, then filter them in
user space.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Ani Sinha <ani@aristanetworks.com>
Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31 14:00:15 -04:00
Willem de Bruijn
ecd5cf5dc5 net: compute skb->rxhash if nic hash may be 3-tuple
Network device drivers can communicate a Toeplitz hash in skb->rxhash,
but devices differ in their hashing capabilities. All compute a 5-tuple
hash for TCP over IPv4, but for other connection-oriented protocols,
they may compute only a 3-tuple. This breaks RPS load balancing, e.g.,
for TCP over IPv6 flows. Additionally, for GRE and other tunnels,
the kernel computes a 5-tuple hash over the inner packet if possible,
but devices do not.

This patch recomputes the rxhash in software in all cases where it
cannot be certain that a 5-tuple was computed. Device drivers can avoid
recomputation by setting the skb->l4_rxhash flag.

Recomputing adds cycles to each packet when RPS is enabled or the
packet arrives over a tunnel. A comparison of 200x TCP_STREAM between
two servers running unmodified netnext with rxhash computation
in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows
how much time is spent in __skb_get_rxhash in this worst case:

     0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.05%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash

With 200x TCP_RR it increases to

     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash

I considered having the patch explicitly skips recomputation when it knows
that it will not improve the hash (TCP over IPv4), but that conditional
complicates code without saving many cycles in practice, because it has
to take place after flow dissector.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31 13:56:40 -04:00
John Fastabend
815cccbf10 ixgbe: add setlink, getlink support to ixgbe and ixgbevf
This adds support for the net device ops to manage the embedded
hardware bridge on ixgbe devices. With this patch the bridge
mode can be toggled between VEB and VEPA to support stacking
macvlan devices or using the embedded switch without any SW
component in 802.1Qbg/br environments.

Additionally, this adds source address pruning to the ixgbevf
driver to prune any frames sent back from a reflective relay on
the switch. This is required because the existing hardware does
not support this. Without it frames get pushed into the stack
with its own src mac which is invalid per 802.1Qbg VEPA
definition.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31 13:18:29 -04:00
John Fastabend
e5a55a8987 net: create generic bridge ops
The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
currently embedded in the ./net/bridge module. This prohibits
them from being used by other bridging devices. One example
of this being hardware that has embedded bridging components.

In order to use these nlmsg types more generically this patch
adds two net_device_ops hooks. One to set link bridge attributes
and another to dump the current bride attributes.

	ndo_bridge_setlink()
	ndo_bridge_getlink()

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-31 13:18:28 -04:00
John Fastabend
b3343a2a2c net, ixgbe: handle link local multicast addresses in SR-IOV mode
In SR-IOV mode the PF driver acts as the uplink port and is
used to send control packets e.g. lldpad, stp, etc.

   eth0.1     eth0.2     eth0
   VF         VF         PF
   |          |          |   <-- stand-in for uplink
   |          |          |
  --------------------------
  |  Embedded Switch       |
  --------------------------
              |
             MAC   <-- uplink

But the embedded switch is setup to forward multicast addresses
to all interfaces both VFs and PF and onto the physical link.
This results in reserved MAC addresses used by control protocols
to be forwarded over the switch onto the VF.

In the LLDP case the PF sends an LLDPDU and it is currently
being forwarded to all the VFs who then see the PF as a peer.
This is incorrect.

This patch adds the multicast addresses to the RAR table in the
hardware to prevent this behavior.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-10-29 22:31:49 -07:00
Ming Lei
877bd862f3 usbnet: introduce usbnet 3 command helpers
This patch introduces the below 3 usb command helpers:

	usbnet_read_cmd / usbnet_write_cmd / usbnet_write_cmd_async

so that each low level driver doesn't need to implement them
by itself, and the dma buffer allocation for usb transfer and
runtime PM things can be handled just in one place.

Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 03:36:50 -04:00
Greg Suarez
9bf211a30b net: cdc_mbim: adding MBIM driver
The CDC Mobile Broadband Interface Model (MBIM) specification
extends CDC NCM by
 - removing the redundant ethernet header from the point-to-point
   USB channel
 - adding support for multiple IP (v4 and/or v6) sessions multiplexed
   on the same USB channel
 - adding a MBIM control channel encapsulated in CDC
 - adding Device Service Streams (DSS), which are non IP generic data
   streams multiplexed on the same USB channel as the IP sessions

MBIM devices are managed using the dedicated control channel, and no
data will flow on the data channel until a control session has been
established.  This driver has no knowledge of MBIM control messages.
It just exports the control channel to a /dev/cdc-wdmX character
device for userspace management applications. Such an application is
therefore required to use this driver.

This patch implements basic MBIM support, reusing the NCM and WDM driver
APIs, currently limited to IP sessions with SessionID 0. DSS and
multiplexed IP sessions are not yet supported.

Signed-off-by: Greg Suarez <gsuarez@smithmicro.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23 02:40:11 -04:00
Bjørn Mork
c91ce3b6bf net: cdc_ncm: export shared symbols and definitions
Move symbols and definitons which can be shared with a
MBIM driver in a new header.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23 02:40:11 -04:00
Greg Suarez
51615edd0a USB: cdc: add MBIM constants and structures
Based on revision 1.0 of "Universal Serial Bus Communications
Class Subclass Specification for Mobile Broadband Interface
Model" available from www.usb.org

Signed-off-by: Greg Suarez <gsuarez@smithmicro.com>
[bmork: added DSS defines]
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23 02:40:10 -04:00
Joachim Eastwood
0668744f79 net/at91_ether: add pdata flag for reverse Eth addr
This will allow us to remove the last mach include from at91_ether
and also make it easier to share address setup with macb.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23 02:40:10 -04:00
David Howells
64d7155cdf UAPI: Remove empty non-UAPI Kbuild files
Remove non-UAPI Kbuild files that have become empty as a result of UAPI
disintegration.  They used to have only header-y lines in them and those have
now moved to the Kbuild files in the corresponding uapi/ directories.

Possibly these should not be removed but rather have a comment inserted to say
they are intentionally left blank.  This would make it easier to add generated
header lines in future without having to restore the infrastructure.

Note that at this point not all the UAPI disintegration parts have been merged,
so it is likely that more empty Kbuild files will turn up.

It is probably necessary to make the files non-empty to prevent the patch
program from automatically deleting them when it reduces them to nothing.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-10-17 12:31:15 +01:00
David Howells
0238047018 UAPI: Remove empty conditionals from include/linux/Kbuild
Remove empty conditionals from include/linux/Kbuild as the contents, with new
conditionals, have moved to include/uapi/linux/Kbuild.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-10-17 12:31:15 +01:00
Linus Torvalds
90a24a4a7e Merge branch 'frv' (FRV patches from David Howells)
Merge emailed FRV fixes from David Howells.

* frv:
  FRV: Fix linux/elf-fdpic.h
  FRV: Fix const sections change
  FRV: Fix incorrect symbol in copy_thread()
  FRV: Fix VLIW packing constraint violation in entry.S
2012-10-16 18:49:22 -07:00
David Howells
0c552e5fb9 FRV: Fix linux/elf-fdpic.h
It seems I accidentally switched the guard on linux/elf-fdpic.h from #ifdef
__KERNEL__ to #ifndef __KERNEL__ when attempting to expand the guarded region
to cover the elf_fdpic_params struct when doing the UAPI split - with the
result that the struct became unavailable to kernel code.

Move incorrectly guarded bits back to the kernelspace header.

Whilst we're at it, the __KERNEL__ guards can be deleted as they're no longer
necessary.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Fengguang Wu <fengguang.wu@intel.com>
cc: Lars-Peter Clausen <lars@metafoo.de>
cc: uclinux-dev@uclinux.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-16 18:49:15 -07:00
Al Viro
c54d0dc353 bury SEL_{IN,OUT,EX}
Had not been used for more than a decade and half; it used
to be a part of (in-kernel) ->select() API and it has been pining
for fjords since 2.1.23pre1.  This is an ex-parrot...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-16 13:37:17 -04:00
David Howells
bbc1096ad8 Unexport some bits of linux/fs.h
There are some bits of linux/fs.h which are only used within the kernel and
shouldn't be in the UAPI.  Move these from uapi/linux/fs.h into linux/fs.h.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-16 13:36:58 -04:00
Linus Torvalds
d25282d1c9 Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module signing support from Rusty Russell:
 "module signing is the highlight, but it's an all-over David Howells frenzy..."

Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
  X.509: Fix indefinite length element skip error handling
  X.509: Convert some printk calls to pr_devel
  asymmetric keys: fix printk format warning
  MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
  MODSIGN: Make mrproper should remove generated files.
  MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
  MODSIGN: Use the same digest for the autogen key sig as for the module sig
  MODSIGN: Sign modules during the build process
  MODSIGN: Provide a script for generating a key ID from an X.509 cert
  MODSIGN: Implement module signature checking
  MODSIGN: Provide module signing public keys to the kernel
  MODSIGN: Automatically generate module signing keys if missing
  MODSIGN: Provide Kconfig options
  MODSIGN: Provide gitignore and make clean rules for extra files
  MODSIGN: Add FIPS policy
  module: signature checking hook
  X.509: Add a crypto key parser for binary (DER) X.509 certificates
  MPILIB: Provide a function to read raw data into an MPI
  X.509: Add an ASN.1 decoder
  X.509: Add simple ASN.1 grammar compiler
  ...
2012-10-14 13:39:34 -07:00