Commit Graph

280 Commits

Author SHA1 Message Date
Linus Torvalds
70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Al Viro
c0371da604 put iov_iter into msghdr
Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09 16:29:03 -05:00
Michael S. Tsirkin
dc9e51534b af_packet: virtio 1.0 stubs
This merely fixes sparse warnings, without actually
adding support for the new APIs.

Still working out the best way to enable the new
functionality.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:06:32 +02:00
David S. Miller
60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Michael S. Tsirkin
6e58040b84 af_packet: fix sparse warning
af_packet produces lots of these:
	net/packet/af_packet.c:384:39: warning: incorrect type in return expression (different modifiers)
	net/packet/af_packet.c:384:39:    expected struct page [pure] *
	net/packet/af_packet.c:384:39:    got struct page *

this seems to be because sparse does not realize that _pure
refers to function, not the returned pointer.

Tweak code slightly to avoid the warning.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:15:36 -05:00
Al Viro
8feb2fb2bb switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter()
... and kill skb_copy_datagram_iovec()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:39 -05:00
Al Viro
7eab8d9e8a new helper: memcpy_to_msg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:51 -05:00
Al Viro
6ce8e9ce59 new helper: memcpy_from_msg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:48 -05:00
Willem de Bruijn
9c7077622d packet: make packet_snd fail on len smaller than l2 header
When sending packets out with PF_PACKET, SOCK_RAW, ensure that the
packet is at least as long as the device's expected link layer header.
This check already exists in tpacket_snd, but not in packet_snd.
Also rate limit the warning in tpacket_snd.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:43:07 -05:00
David S. Miller
51f3d02b98 net: Add and use skb_copy_datagram_msg() helper.
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:46:40 -05:00
David S. Miller
fa2dbdc253 net: Pass a "more" indication down into netdev_start_xmit() code paths.
For now it will always be false.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
David S. Miller
10b3ad8c21 net: Do txq_trans_update() in netdev_start_xmit()
That way we don't have to audit every call site to make sure it is
doing this properly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:39:55 -07:00
Daniel Borkmann
10c51b5623 net: add skb_get_tx_queue() helper
Replace occurences of skb_get_queue_mapping() and follow-up
netdev_get_tx_queue() with an actual helper function.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 20:02:07 -07:00
David S. Miller
4798248e4e net: Add ops->ndo_xmit_flush()
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 23:02:45 -07:00
Eric Dumazet
dc808110bb packet: handle too big packets for PACKET_V3
af_packet can currently overwrite kernel memory by out of bound
accesses, because it assumed a [new] block can always hold one frame.

This is not generally the case, even if most existing tools do it right.

This patch clamps too long frames as API permits, and issue a one time
error on syslog.

[  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82

In this example, packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21 16:44:28 -07:00
Willem de Bruijn
68a360e82e packet: remove deprecated syststamp timestamp
No device driver will ever return an skb_shared_info structure with
syststamp non-zero, so remove the branch that tests for this and
optionally marks the packet timestamp as TP_STATUS_TS_SYS_HARDWARE.

Do not remove the definition TP_STATUS_TS_SYS_HARDWARE, as processes
may refer to it.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29 11:39:50 -07:00
Fabian Frederick
fe8c0f4ac2 packet: remove unnecessary break after return
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 16:26:59 -07:00
Eric W. Biederman
90f62cf30a net: Use netlink_ns_capable to verify the permisions of netlink messages
It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 13:44:54 -04:00
Eric W. Biederman
a53b72c83a net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump
The permission check in sock_diag_put_filterinfo is wrong, and it is so removed
from it's sources it is not clear why it is wrong.  Move the computation
into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo.

This does not yet correct the capability check but instead simply moves it to make
it clear what is going on.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 13:44:53 -04:00
Andrew Lutomirski
78541c1dc6 net: Fix ns_capable check in sock_diag_put_filterinfo
The caller needs capabilities on the namespace being queried, not on
their own namespace.  This is a security bug, although it likely has
only a minor impact.

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22 12:49:39 -04:00
David S. Miller
676d23690f net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 16:15:36 -04:00
Daniel Borkmann
8e2f1a63f2 packet: fix packet_direct_xmit for BQL enabled drivers
Currently, in packet_direct_xmit() we test the assigned netdevice queue
for netif_xmit_frozen_or_stopped() before doing an ndo_start_xmit().

This can have the side-effect that BQL enabled drivers which make use
of netdev_tx_sent_queue() internally, set __QUEUE_STATE_STACK_XOFF from
within the stack and would not fully fill the device's TX ring from
packet sockets with PACKET_QDISC_BYPASS enabled.

Instead, use a test without BQL bit so that bursts can be absorbed
into the NICs TX ring. Fix and code suggested by Eric Dumazet, thanks!

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:29:12 -04:00
Daniel Borkmann
0f97ede45e packet: report tx_dropped in packet_direct_xmit
Since commit 015f0688f5 ("net: net: add a core netdev->tx_dropped
counter"), we can now account for TX drops from within the core
stack instead of drivers.

Therefore, fix packet_direct_xmit() and increase drop count when we
encounter a problem before driver's xmit function was called (we do
not want to doubly account for it).

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:29:11 -04:00
Daniel Borkmann
43279500de packet: respect devices with LLTX flag in direct xmit
Quite often it can be useful to test with dummy or similar
devices as a blackhole sink for skbs. Such devices are only
equipped with a single txq, but marked as NETIF_F_LLTX as
they do not require locking their internal queues on xmit
(or implement locking themselves). Therefore, rather use
HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will be respected.

trafgen mmap/TX_RING example against dummy device with config
foo: { fill(0xff, 64) } results in the following performance
improvements for such scenarios on an ordinary Core i7/2.80GHz:

Before:

 Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):

   160,975,944,159 instructions:k            #    0.55  insns per cycle          ( +-  0.09% )
   293,319,390,278 cycles:k                  #    0.000 GHz                      ( +-  0.35% )
       192,501,104 branch-misses:k                                               ( +-  1.63% )
               831 context-switches:k                                            ( +-  9.18% )
                 7 cpu-migrations:k                                              ( +-  7.40% )
            69,382 cache-misses:k            #    0.010 % of all cache refs      ( +-  2.18% )
       671,552,021 cache-references:k                                            ( +-  1.29% )

      22.856401569 seconds time elapsed                                          ( +-  0.33% )

After:

 Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):

   133,788,739,692 instructions:k            #    0.92  insns per cycle          ( +-  0.06% )
   145,853,213,256 cycles:k                  #    0.000 GHz                      ( +-  0.17% )
        59,867,100 branch-misses:k                                               ( +-  4.72% )
               384 context-switches:k                                            ( +-  3.76% )
                 6 cpu-migrations:k                                              ( +-  6.28% )
            70,304 cache-misses:k            #    0.077 % of all cache refs      ( +-  1.73% )
        90,879,408 cache-references:k                                            ( +-  1.35% )

      11.719372413 seconds time elapsed                                          ( +-  0.24% )

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:49:48 -04:00
Tom Herbert
61b905da33 net: Rename skb->rxhash to skb->hash
The packet hash can be considered a property of the packet, not just
on RX path.

This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:58:20 -04:00