Commit Graph

4046 Commits

Author SHA1 Message Date
Stefan Hajnoczi
413a4317ac VSOCK: add sock_diag interface
This patch adds the sock_diag interface for querying sockets from
userspace.  Tools like ss(8) and netstat(8) can use this interface to
list open sockets.

The userspace ABI is defined in <linux/vm_sockets_diag.h> and includes
netlink request and response structs.  The request can query sockets
based on their sk_state (e.g. listening sockets only) and the response
contains socket information fields including the local/remote addresses,
inode number, etc.

This patch does not dump VMCI pending sockets because I have only tested
the virtio transport, which does not use pending sockets.  Support can
be added later by extending vsock_diag_dump() if needed by VMCI users.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:44:17 -07:00
David S. Miller
53954cf8c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:19:22 -07:00
Linus Torvalds
076264ada9 Merge tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:

 - a stable fix for the alignment of the event number reported at the
   end of the 'DM_LIST_DEVICES' ioctl.

 - a couple stable fixes for the DM crypt target.

 - a DM raid health status reporting fix.

* tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm raid: fix incorrect status output at the end of a "recover" process
  dm crypt: reject sector_size feature if device length is not aligned to it
  dm crypt: fix memory leak in crypt_ctr_cipher_old()
  dm ioctl: fix alignment of event number in the device list
2017-10-05 15:17:40 -07:00
Linus Torvalds
9a431ef962 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Check iwlwifi 9000 reorder buffer out-of-space condition properly,
    from Sara Sharon.

 2) Fix RCU splat in qualcomm rmnet driver, from Subash Abhinov
    Kasiviswanathan.

 3) Fix session and tunnel release races in l2tp, from Guillaume Nault
    and Sabrina Dubroca.

 4) Fix endian bug in sctp_diag_dump(), from Dan Carpenter.

 5) Several mlx5 driver fixes from the Mellanox folks (max flow counters
    cap check, invalid memory access in IPoIB support, etc.)

 6) tun_get_user() should bail if skb->len is zero, from Alexander
    Potapenko.

 7) Fix RCU lookups in inetpeer, from Eric Dumazet.

 8) Fix locking in packet_do_bund().

 9) Handle cb->start() error properly in netlink dump code, from Jason
    A. Donenfeld.

10) Handle multicast properly in UDP socket early demux code. From Paolo
    Abeni.

11) Several erspan bug fixes in ip_gre, from Xin Long.

12) Fix use-after-free in socket filter code, in order to handle the
    fact that listener lock is no longer taken during the three-way TCP
    handshake. From Eric Dumazet.

13) Fix infoleak in RTM_GETSTATS, from Nikolay Aleksandrov.

14) Fix tail call generation in x86-64 BPF JIT, from Alexei Starovoitov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  net: 8021q: skip packets if the vlan is down
  bpf: fix bpf_tail_call() x64 JIT
  net: stmmac: dwmac-rk: Add RK3128 GMAC support
  rndis_host: support Novatel Verizon USB730L
  net: rtnetlink: fix info leak in RTM_GETSTATS call
  socket, bpf: fix possible use after free
  mlxsw: spectrum_router: Track RIF of IPIP next hops
  mlxsw: spectrum_router: Move VRF refcounting
  net: hns3: Fix an error handling path in 'hclge_rss_init_hw()'
  net: mvpp2: Fix clock resource by adding an optional bus clock
  r8152: add Linksys USB3GIGV1 id
  l2tp: fix l2tp_eth module loading
  ip_gre: erspan device should keep dst
  ip_gre: set tunnel hlen properly in erspan_tunnel_init
  ip_gre: check packet length and mtu correctly in erspan_xmit
  ip_gre: get key from session_id correctly in erspan_rcv
  tipc: use only positive error codes in messages
  ppp: fix __percpu annotation
  udp: perform source validation for mcast early demux
  IPv4: early demux can return an error code
  ...
2017-10-05 08:40:09 -07:00
Nicolas Dichtel
6621dd29eb dev: advertise the new nsid when the netns iface changes
x-netns interfaces are bound to two netns: the link netns and the upper
netns. Usually, this kind of interfaces is created in the link netns and
then moved to the upper netns. At the end, the interface is visible only
in the upper netns. The link nsid is advertised via netlink in the upper
netns, thus the user always knows where is the link part.

There is no such mechanism in the link netns. When the interface is moved
to another netns, the user cannot "follow" it.
This patch adds a new netlink attribute which helps to follow an interface
which moves to another netns. When the interface is unregistered, the new
nsid is advertised. If the interface is a x-netns interface (ie
rtnl_link_ops->get_link_net is defined), the nsid is allocated if needed.

CC: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 18:04:41 -07:00
Alexei Starovoitov
468e2f64d2 bpf: introduce BPF_PROG_QUERY command
introduce BPF_PROG_QUERY command to retrieve a set of either
attached programs to given cgroup or a set of effective programs
that will execute for events within a cgroup

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 16:05:05 -07:00
Alexei Starovoitov
324bda9e6c bpf: multi program support for cgroup+bpf
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.

The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
  the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
  that cgroup program gets run in addition to the program in this cgroup.

NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.

Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.

To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 16:05:05 -07:00
Marcelo Ricardo Leitner
ac1ed8b82c sctp: introduce round robin stream scheduler
This patch introduces RFC Draft ndata section 3.2 Priority Based
Scheduler (SCTP_SS_RR).

Works by maintaining a list of enqueued streams and tracking the last
one used to send data. When the datamsg is done, it switches to the next
stream.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:29 -07:00
Marcelo Ricardo Leitner
637784ade2 sctp: introduce priority based stream scheduler
This patch introduces RFC Draft ndata section 3.4 Priority Based
Scheduler (SCTP_SS_PRIO).

It works by having a struct sctp_stream_priority for each priority
configured. This struct is then enlisted on a queue ordered per priority
if, and only if, there is a stream with data queued, so that dequeueing
is very straightforward: either finish current datamsg or simply dequeue
from the highest priority queued, which is the next stream pointed, and
that's it.

If there are multiple streams assigned with the same priority and with
data queued, it will do round robin amongst them while respecting
datamsgs boundaries (when not using idata chunks), to be reasonably
fair.

We intentionally don't maintain a list of priorities nor a list of all
streams with the same priority to save memory. The first would mean at
least 2 other pointers per priority (which, for 1000 priorities, that
can mean 16kB) and the second would also mean 2 other pointers but per
stream. As SCTP supports up to 65535 streams on a given asoc, that's
1MB. This impacts when giving a priority to some stream, as we have to
find out if the new priority is already being used and if we can free
the old one, and also when tearing down.

The new fields in struct sctp_stream_out_ext and sctp_stream are added
under a union because that memory is to be shared with other schedulers.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:29 -07:00
Marcelo Ricardo Leitner
0ccdf3c7fd sctp: add sockopt to get/set stream scheduler parameters
As defined per RFC Draft ndata Section 4.3.3, named as
SCTP_STREAM_SCHEDULER_VALUE.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:29 -07:00
Marcelo Ricardo Leitner
13aa8770fe sctp: add sockopt to get/set stream scheduler
As defined per RFC Draft ndata Section 4.3.2, named as
SCTP_STREAM_SCHEDULER.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:29 -07:00
Marcelo Ricardo Leitner
5bbbbe32a4 sctp: introduce stream scheduler foundations
This patch introduces the hooks necessary to do stream scheduling, as
per RFC Draft ndata.  It also introduces the first scheduler, which is
what we do today but now factored out: first come first served (FCFS).

With stream scheduling now we have to track which chunk was enqueued on
which stream and be able to select another other than the in front of
the main outqueue. So we introduce a list on sctp_stream_out_ext
structure for this purpose.

We reuse sctp_chunk->transmitted_list space for the list above, as the
chunk cannot belong to the two lists at the same time. By using the
union in there, we can have distinct names for these moments.

sctp_sched_ops are the operations expected to be implemented by each
scheduler. The dequeueing is a bit particular to this implementation but
it is to match how we dequeue packets today. We first dequeue and then
check if it fits the packet and if not, we requeue it at head. Thus why
we don't have a peek operation but have dequeue_done instead, which is
called once the chunk can be safely considered as transmitted.

The check removed from sctp_outq_flush is now performed by
sctp_stream_outq_migrate, which is only called during assoc setup.
(sctp_sendmsg() also checks for it)

The only operation that is foreseen but not yet added here is a way to
signalize that a new packet is starting or that the packet is done, for
round robin scheduler per packet, but is intentionally left to the
patch that actually implements it.

Support for I-DATA chunks, also described in this RFC, with user message
interleaving is straightforward as it just requires the schedulers to
probe for the feature and ignore datamsg boundaries when dequeueing.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:27:29 -07:00
Alexei Starovoitov
90caccdd8c bpf: fix bpf_tail_call() x64 JIT
- bpf prog_array just like all other types of bpf array accepts 32-bit index.
  Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent

The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40 in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:04:44 -07:00
Linus Torvalds
887c8ba753 Merge tag 'usb-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are a number of USB fixes for 4.14-rc4 to resolved reported
  issues.

  There's a bunch of stuff in here based on the great work Andrey
  Konovalov is doing in fuzzing the USB stack. Lots of bug fixes when
  dealing with corrupted USB descriptors that we've never seen in
  "normal" operation, but is now ensuring the stack is much more
  hardened overall.

  There's also the usual XHCI and gadget driver fixes as well, and a
  build error fix, and a few other minor things, full details in the
  shortlog.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (38 commits)
  usb: dwc3: of-simple: Add compatible for Spreadtrum SC9860 platform
  usb: gadget: udc: atmel: set vbus irqflags explicitly
  usb: gadget: ffs: handle I/O completion in-order
  usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
  usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
  usb: gadget: udc: renesas_usb3: Fix return value of usb3_write_pipe()
  usb: gadget: udc: renesas_usb3: fix Pn_RAMMAP.Pn_MPKT value
  usb: gadget: udc: renesas_usb3: fix for no-data control transfer
  USB: dummy-hcd: Fix erroneous synchronization change
  USB: dummy-hcd: fix infinite-loop resubmission bug
  USB: dummy-hcd: fix connection failures (wrong speed)
  USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
  USB: devio: Don't corrupt user memory
  USB: devio: Prevent integer overflow in proc_do_submiturb()
  USB: g_mass_storage: Fix deadlock when driver is unbound
  USB: gadgetfs: Fix crash caused by inadequate synchronization
  USB: gadgetfs: fix copy_to_user while holding spinlock
  USB: uas: fix bug in handling of alternate settings
  usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
  usb-storage: fix bogus hardware error messages for ATA pass-thru devices
  ...
2017-10-03 09:25:40 -07:00
Maciej Żenczykowski
84e14fe353 net-ipv6: add support for sockopt(SOL_IPV6, IPV6_FREEBIND)
So far we've been relying on sockopt(SOL_IP, IP_FREEBIND) being usable
even on IPv6 sockets.

However, it turns out it is perfectly reasonable to want to set freebind
on an AF_INET6 SOCK_RAW socket - but there is no way to set any SOL_IP
socket option on such a socket (they're all blindly errored out).

One use case for this is to allow spoofing src ip on a raw socket
via sendmsg cmsg.

Tested:
  built, and booted
  # python
  >>> import socket
  >>> SOL_IP = socket.SOL_IP
  >>> SOL_IPV6 = socket.IPPROTO_IPV6
  >>> IP_FREEBIND = 15
  >>> IPV6_FREEBIND = 78
  >>> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
  >>> s.getsockopt(SOL_IP, IP_FREEBIND)
  0
  >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
  0
  >>> s.setsockopt(SOL_IPV6, IPV6_FREEBIND, 1)
  >>> s.getsockopt(SOL_IP, IP_FREEBIND)
  1
  >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
  1

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30 05:30:52 +01:00
Martin KaFai Lau
ad5b177bd7 bpf: Add map_name to bpf_map_info
This patch allows userspace to specify a name for a map
during BPF_MAP_CREATE.

The map's name can later be exported to user space
via BPF_OBJ_GET_INFO_BY_FD.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Martin KaFai Lau
cb4d2b3f03 bpf: Add name, load_time, uid and map_ids to bpf_prog_info
The patch adds name and load_time to struct bpf_prog_aux.  They
are also exported to bpf_prog_info.

The bpf_prog's name is passed by userspace during BPF_PROG_LOAD.
The kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes
and the name stored in the kernel is always \0 terminated.

The kernel will reject name that contains characters other than
isalnum() and '_'.  It will also reject name that is not null
terminated.

The existing 'user->uid' of the bpf_prog_aux is also exported to
the bpf_prog_info as created_by_uid.

The existing 'used_maps' of the bpf_prog_aux is exported to
the newly added members 'nr_map_ids' and 'map_ids' of
the bpf_prog_info.  On the input, nr_map_ids tells how
big the userspace's map_ids buffer is.  On the output,
nr_map_ids tells the exact user_map_cnt and it will only
copy up to the userspace's map_ids buffer is allowed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Nikolay Aleksandrov
5af48b59f3 net: bridge: add per-port group_fwd_mask with less restrictions
We need to be able to transparently forward most link-local frames via
tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
mask which restricts the forwarding of STP and LACP, but we need to be able
to forward these over tunnels and control that forwarding on a per-port
basis thus add a new per-port group_fwd_mask option which only disallows
mac pause frames to be forwarded (they're always dropped anyway).
The patch does not change the current default situation - all of the others
are still restricted unless configured for forwarding.
We have successfully tested this patch with LACP and STP forwarding over
VxLAN and qinq tunnels.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:02:55 +01:00
Daniel Borkmann
de8f3a83b0 bpf: add meta pointer for direct access
This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out that we first point
to data_hard_start, then data_meta directly prepended to data followed
by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
yet supporting xdp->data_meta can simply be set up with xdp->data_meta
as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from it's
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 13:36:44 -07:00
Petar Penkov
90e33d4594 tun: enable napi_gro_frags() for TUN/TAP driver
Add a TUN/TAP receive mode that exercises the napi_gro_frags()
interface. This mode is available only in TAP mode, as the interface
expects packets with Ethernet headers.

Furthermore, packets follow the layout of the iovec_iter that was
received. The first iovec is the linear data, and every one after the
first is a fragment. If there are more fragments than the max number,
drop the packet. Additionally, invoke eth_get_headlen() to exercise flow
dissector code and to verify that the header resides in the linear data.

The napi_gro_frags() mode requires setting the IFF_NAPI_FRAGS option.
This is imposed because this mode is intended for testing via tools like
syzkaller and packetdrill, and the increased flexibility it provides can
introduce security vulnerabilities. This flag is accepted only if the
device is in TAP mode and has the IFF_NAPI flag set as well. This is
done because both of these are explicit requirements for correct
operation in this mode.

Signed-off-by: Petar Penkov <peterpenkov96@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: davem@davemloft.net
Cc: ppenkov@stanford.edu
Acked-by: Mahesh Bandewar <maheshb@google,com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 20:16:13 -07:00
Petar Penkov
943170998b tun: enable NAPI for TUN/TAP driver
Changes TUN driver to use napi_gro_receive() upon receiving packets
rather than netif_rx_ni(). Adds flag IFF_NAPI that enables these
changes and operation is not affected if the flag is disabled.  SKBs
are constructed upon packet arrival and are queued to be processed
later.

The new path was evaluated with a benchmark with the following setup:
Open two tap devices and a receiver thread that reads in a loop for
each device. Start one sender thread and pin all threads to different
CPUs. Send 1M minimum UDP packets to each device and measure sending
time for each of the sending methods:
	napi_gro_receive():	4.90s
	netif_rx_ni():		4.90s
	netif_receive_skb():	7.20s

Signed-off-by: Petar Penkov <peterpenkov96@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: davem@davemloft.net
Cc: ppenkov@stanford.edu
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25 20:16:13 -07:00
Mikulas Patocka
62e082430e dm ioctl: fix alignment of event number in the device list
The size of struct dm_name_list is different on 32-bit and 64-bit
kernels (so "(nl + 1)" differs between 32-bit and 64-bit kernels).

This mismatch caused some harmless difference in padding when using 32-bit
or 64-bit kernel. Commit 23d70c5e52 ("dm ioctl: report event number in
DM_LIST_DEVICES") added reporting event number in the output of
DM_LIST_DEVICES_CMD. This difference in padding makes it impossible for
userspace to determine the location of the event number (the location
would be different when running on 32-bit and 64-bit kernels).

Fix the padding by using offsetof(struct dm_name_list, name) instead of
sizeof(struct dm_name_list) to determine the location of entries.

Also, the ioctl version number is incremented to 37 so that userspace
can use the version number to determine that the event number is present
and correctly located.

In addition, a global event is now raised when a DM device is created,
removed, renamed or when table is swapped, so that the user can monitor
for device changes.

Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Fixes: 23d70c5e52 ("dm ioctl: report event number in DM_LIST_DEVICES")
Cc: stable@vger.kernel.org # 4.13
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-25 11:18:29 -04:00
Linus Torvalds
71aa60f67f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix NAPI poll list corruption in enic driver, from Christian
    Lamparter.

 2) Fix route use after free, from Eric Dumazet.

 3) Fix regression in reuseaddr handling, from Josef Bacik.

 4) Assert the size of control messages in compat handling since we copy
    it in from userspace twice. From Meng Xu.

 5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.)
    from Ursula Braun.

 6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn.

 7) Don't use ARRAY_SIZE on spinlock array which might have zero
    entries, from Geert Uytterhoeven.

 8) Fix miscomputation of checksum in ipv6 udp code, from Subash Abhinov
    Kasiviswanathan.

 9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from Xin
    Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
  inet: fix improper empty comparison
  net: use inet6_rcv_saddr to compare sockets
  net: set tb->fast_sk_family
  net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
  MAINTAINERS: update git tree locations for ieee802154 subsystem
  net: prevent dst uses after free
  net: phy: Fix truncation of large IRQ numbers in phy_attached_print()
  net/smc: no close wait in case of process shut down
  net/smc: introduce a delay
  net/smc: terminate link group if out-of-sync is received
  net/smc: longer delay for client link group removal
  net/smc: adapt send request completion notification
  net/smc: adjust net_device refcount
  net/smc: take RCU read lock for routing cache lookup
  net/smc: add receive timeout check
  net/smc: add missing dev_put
  net: stmmac: Cocci spatch "of_table"
  lan78xx: Use default values loaded from EEPROM/OTP after reset
  lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
  lan78xx: Fix for eeprom read/write when device auto suspend
  ...
2017-09-23 05:41:27 -10:00
Linus Torvalds
c0a3a64e72 Merge tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
 "Major additions:

   - sysctl and seccomp operation to discover available actions
     (tyhicks)

   - new per-filter configurable logging infrastructure and sysctl
     (tyhicks)

   - SECCOMP_RET_LOG to log allowed syscalls (tyhicks)

   - SECCOMP_RET_KILL_PROCESS as the new strictest possible action

   - self-tests for new behaviors"

[ This is the seccomp part of the security pull request during the merge
  window that was nixed due to unrelated problems   - Linus ]

* tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  samples: Unrename SECCOMP_RET_KILL
  selftests/seccomp: Test thread vs process killing
  seccomp: Implement SECCOMP_RET_KILL_PROCESS action
  seccomp: Introduce SECCOMP_RET_KILL_PROCESS
  seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD
  seccomp: Action to log before allowing
  seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
  seccomp: Selftest for detection of filter flag support
  seccomp: Sysctl to configure actions that are allowed to be logged
  seccomp: Operation for checking if an action is available
  seccomp: Sysctl to display available actions
  seccomp: Provide matching filter for introspection
  selftests/seccomp: Refactor RET_ERRNO tests
  selftests/seccomp: Add simple seccomp overhead benchmark
  selftests/seccomp: Add tests for basic ptrace actions
2017-09-22 16:16:41 -10:00
Florian Fainelli
19cab88726 net: ethtool: Add back transceiver type
Commit 3f1ac7a700 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
deprecated the ethtool_cmd::transceiver field, which was fine in
premise, except that the PHY library was actually using it to report the
type of transceiver: internal or external.

Use the first word of the reserved field to put this __u8 transceiver
field back in. It is made read-only, and we don't expect the
ETHTOOL_xLINKSETTINGS API to be doing anything with this anyway, so this
is mostly for the legacy path where we do:

ethtool_get_settings()
-> dev->ethtool_ops->get_link_ksettings()
   -> convert_link_ksettings_to_legacy_settings()

to have no information loss compared to the legacy get_settings API.

Fixes: 3f1ac7a700 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:20:40 -07:00