Commit Graph

843420 Commits

Author SHA1 Message Date
Andrii Nakryiko 84bf5e1f4f libbpf: add raw tracepoint attach API
Add a wrapper utilizing bpf_link "infrastructure" to allow attaching BPF
programs to raw tracepoints.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-05 22:37:30 +02:00
Andrii Nakryiko f6de59c17f libbpf: add tracepoint attach API
Allow attaching BPF programs to kernel tracepoint BPF hooks specified by
category and name.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-05 22:37:30 +02:00
Andrii Nakryiko b265002747 libbpf: add kprobe/uprobe attach API
Add ability to attach to kernel and user probes and retprobes.
Implementation depends on perf event support for kprobes/uprobes.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-05 22:37:30 +02:00
Andrii Nakryiko 63f2f5ee85 libbpf: add ability to attach/detach BPF program to perf event
bpf_program__attach_perf_event allows attaching a BPF program to an existing
perf event hook, providing the most generic and lowest-level way to attach BPF
programs. It returns a struct bpf_link, which should be passed to
bpf_link__destroy to detach the program and free the resources associated
with the link.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-05 22:37:30 +02:00
Andrii Nakryiko 1c2e9efc26 libbpf: introduce concept of bpf_link
bpf_link is an abstraction of an association between a BPF program and one
of many possible BPF attachment points (hooks). This provides a uniform
interface for detaching BPF programs regardless of the nature of the link
and how it was created. Details of creating and setting up a specific
bpf_link are handled by the corresponding attachment methods
(bpf_program__attach_xxx) added in subsequent commits. Once successfully
created, a bpf_link has to be eventually destroyed with
bpf_link__destroy(), at which point the BPF program is disassociated from
the hook and all the relevant resources are freed.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-05 22:37:30 +02:00
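
The link abstraction described in the commit above can be illustrated with a small userspace model. This is only a sketch: `toy_link`, `toy_attach`, and `toy_link_destroy` are hypothetical names invented here and are not libbpf's definitions. The point is that each attach routine stores its own teardown callback, so one destroy function works for every kind of link.

```c
#include <stdlib.h>

/* Hypothetical model of a link; not libbpf's struct bpf_link. */
struct toy_link {
	/* Hook-specific teardown, set by whichever attach routine made the link. */
	int (*detach)(struct toy_link *link);
	int *attached; /* stands in for hook-specific state */
};

/* Teardown for this toy hook: mark the program as no longer attached. */
int toy_detach_flag(struct toy_link *link)
{
	*link->attached = 0;
	return 0;
}

/* Stand-in for an attach method: sets up hook state and returns a link. */
struct toy_link *toy_attach(int *attached)
{
	struct toy_link *link = calloc(1, sizeof(*link));

	if (!link)
		return NULL;
	*attached = 1;
	link->attached = attached;
	link->detach = toy_detach_flag;
	return link;
}

/* Uniform teardown: callers never need to know how the link was created. */
int toy_link_destroy(struct toy_link *link)
{
	int err;

	if (!link)
		return 0;
	err = link->detach(link);
	free(link);
	return err;
}
```

A caller would pair every successful `toy_attach()` with exactly one `toy_link_destroy()`, mirroring the attach/destroy pairing the commit message describes.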
Andrii Nakryiko d66f43666a libbpf: make libbpf_strerror_r agnostic to sign of error
It's often inconvenient to flip the sign of an error before passing it
into libbpf_strerror_r. It's better for it to handle that automatically.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-05 22:37:30 +02:00
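
A minimal sketch of sign-agnostic error formatting, assuming the approach is simply to normalize the code to a positive errno before looking it up. `toy_strerror` is a hypothetical helper written for illustration; it is not the libbpf implementation.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats both ENOENT and -ENOENT the same way. */
const char *toy_strerror(int err, char *dst, size_t len)
{
	/* Normalize to a positive errno so callers need not flip the sign. */
	int code = err < 0 ? -err : err;

	snprintf(dst, len, "%s", strerror(code));
	return dst;
}
```

With this shape, a caller holding a negative return value such as `-ENOENT` can pass it through unchanged.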
David S. Miller c4cde5804d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-07-03

The following pull-request contains BPF updates for your *net-next* tree.

There is a minor merge conflict in mlx5 due to 8960b38932 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before more changes land on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:

** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
  static int mlx5e_open_cq(struct mlx5e_channel *c,
                           struct dim_cq_moder moder,
                           struct mlx5e_cq_param *param,
                           struct mlx5e_cq *cq)
  =======
  int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
                    struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
  >>>>>>> e5a3e259ef

Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...

  drivers/net/ethernet/mellanox/mlx5/core/en.h +977

... and in mlx5e_open_xsk() ...

  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64

... needs the same rename from net_dim_cq_moder into dim_cq_moder.

** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
          int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
          struct dim_cq_moder icocq_moder = {0, 0};
          struct net_device *netdev = priv->netdev;
          struct mlx5e_channel *c;
          unsigned int irq;
  =======
          struct net_dim_cq_moder icocq_moder = {0, 0};
  >>>>>>> e5a3e259ef

Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.

Let me know if you run into any issues. Anyway, the main changes are:

1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.

2) Addition of two new per-cgroup BPF hooks for getsockopt and
   setsockopt along with a new sockopt program type which allows more
   fine-grained pass/reject settings for containers. Also add a sock_ops
   callback that can be selectively enabled on a per-socket basis and is
   executed for every RTT to help tracking TCP statistics, both features
   from Stanislav.

3) Follow-up fix for loops in precision tracking, which was not propagating
   precision marks; as a result the verifier assumed that some branches were
   not taken and wrongly removed them as dead code, from Alexei.

4) Fix BPF cgroup release synchronization race which could lead to a
   double-free if a leaf's cgroup_bpf object is released and a new BPF
   program is attached to one of the ancestor cgroups in parallel, from Roman.

5) Support for bulking XDP_TX on veth devices which improves performance
   in some cases by around 9%, from Toshiaki.

6) Allow for lookups into BPF devmap and improve feedback when calling into
   bpf_redirect_map() as lookup is now performed right away in the helper
   itself, from Toke.

7) Add support for fq's Earliest Departure Time to the Host Bandwidth
   Manager (HBM) sample BPF program, from Lawrence.

8) Various cleanups and minor fixes all over the place from many others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04 12:48:21 -07:00
René van Dorst e2c746944e net: ethernet: mediatek: Fix overlapping capability bits.
Both MTK_TRGMII_MT7621_CLK and MTK_PATH_BIT are defined as bit 10.

This can cause issues on non-MT7621 devices which have the
MTK_PATH_BIT(MTK_ETH_PATH_GMAC1_RGMII) and MTK_TRGMII capabilities set:
the wrong TRGMII setup code can be executed. The wrongly executed code
doesn’t do any harm on MT7623, and the TRGMII setup for the MT7623 SoC
side is done in the MT7530 driver, so it wasn’t noticed in testing.

Move all capability bits in one enum so that they are all unique and easy
to expand in the future.

Because the mtk_eth_path enum is merged into mkt_eth_capabilities, the
path value is no longer between 0 and the number of paths, so
mtk_eth_path_name can’t be used in its array form anymore. Convert the
mtk_eth_path_name array to a function that looks up the path name.

The old code walked through the mtk_eth_path enum, which is now also merged
into mkt_eth_capabilities. Expand the mtk_eth_muxc array so it can store the
name and capability bit of the mux, and convert the code to walk through
the mtk_eth_muxc array.

Fixes: 8efaa653a8 ("net: ethernet: mediatek: Add MT7621 TRGMII mode support")
Signed-off-by: René van Dorst <opensource@vdorst.com>

v1->v2:
- Move all capability bits in one enum, suggested by Willem de Bruijn
- Convert the mtk_eth_path_name array to a function to lookup the pathname
- Expand array mtk_eth_muxc so it can also store the name and capability
  bit of the mux
- Updated commit message

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04 12:37:10 -07:00
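
The fix described above, one enum for all capability bits plus a lookup function for path names, can be sketched as follows. All `TOY_` names and the strings are hypothetical; they only mirror the shape of the change and are not the driver's identifiers.

```c
/* Hypothetical bit indices: one enum, so no two capabilities can collide. */
enum toy_caps_bit {
	TOY_RGMII_BIT = 0,
	TOY_TRGMII_BIT,
	TOY_SGMII_BIT,
	TOY_PATH_GMAC1_RGMII_BIT,
	TOY_PATH_GMAC1_TRGMII_BIT,
	TOY_CAPS_BIT_COUNT, /* must stay last */
};

#define TOY_BIT(n) (1UL << (n))

/*
 * Look up a printable path name by capability bit. A function replaces the
 * old "index an array by enum value" approach, because path values no longer
 * start at 0 once they share an enum with the other capabilities.
 */
const char *toy_path_name(unsigned long path)
{
	switch (path) {
	case TOY_BIT(TOY_PATH_GMAC1_RGMII_BIT):
		return "gmac1_rgmii";
	case TOY_BIT(TOY_PATH_GMAC1_TRGMII_BIT):
		return "gmac1_trgmii";
	default:
		return "unknown path";
	}
}
```

Adding a capability then means adding one enumerator before `TOY_CAPS_BIT_COUNT`; uniqueness follows from the enum rather than from hand-assigned numbers.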
Weifeng Voon c3efed5ad1 net: stmmac: Enable dwmac4 jumbo frame more than 8KiB
Enable GMAC v4.xx and beyond to support 16KiB buffer.

Signed-off-by: Weifeng Voon <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04 12:33:12 -07:00
Vincent Bernat 07a4ddec3c bonding: add an option to specify a delay between peer notifications
Currently, gratuitous ARP/ND packets are sent every `miimon'
milliseconds. This commit allows a user to specify a custom delay
through a new option, `peer_notif_delay'.

Like for `updelay' and `downdelay', this delay should be a multiple of
`miimon' to avoid managing an additional work queue. The configuration
logic is copied from `updelay' and `downdelay'. However, the default
value cannot be set using a module parameter: Netlink or sysfs should
be used to configure this feature.

When setting `miimon' to 100 and `peer_notif_delay' to 500, we can
observe the 500 ms delay is respected:

    20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
    20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
    20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
    20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28

In bond_mii_monitor(), I have tried to keep the lock logic readable.
The change is due to the fact we cannot rely on a notification to
lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is
only triggered once every N times, while we need to decrement the
counter each time.

iproute2 also needs to be updated to be able to specify this new
attribute through `ip link'.

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04 12:30:48 -07:00
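
The requirement that the delay be a multiple of `miimon' suggests the delay is tracked as a count of monitor intervals rather than with its own timer. A minimal sketch of that arithmetic follows; `toy_delay_to_ticks` is a hypothetical helper, and rounding down a non-multiple is an assumption made here, not something the commit message states.

```c
/*
 * Hypothetical helper: express a delay in milliseconds as a number of
 * miimon intervals. Returns -1 when the inputs cannot be converted.
 * Rounding down for non-multiples is an assumption of this sketch.
 */
int toy_delay_to_ticks(int delay_ms, int miimon_ms)
{
	if (miimon_ms <= 0 || delay_ms < 0)
		return -1;
	return delay_ms / miimon_ms;
}
```

With the values from the message (`miimon' of 100 and `peer_notif_delay' of 500), this yields 5 monitor intervals between notifications.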
Colin Ian King 2368a870d6 net: ethernet: sun: remove redundant assignment to variable err
The variable err is being assigned with a value that is never
read and it is being updated in the next statement with a new value.
The assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04 12:12:13 -07:00
Colin Ian King a51df9f8da gve: fix -ENOMEM null check on a page allocation
Currently the check to see if a page is allocated is incorrect
and is checking if the pointer page is null, not *page as
intended.  Fix this.

Addresses-Coverity: ("Dereference before null check")
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 13:53:07 -07:00
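
The bug class fixed above, checking an out-parameter instead of the pointer it receives, can be shown in a few lines. This is a sketch with hypothetical names (`toy_page`, `toy_alloc_page`) and plain `malloc`, not the driver's code.

```c
#include <errno.h>
#include <stdlib.h>

struct toy_page {
	char data[64];
};

/* Hypothetical allocator that hands the result back through *page. */
int toy_alloc_page(struct toy_page **page, int simulate_failure)
{
	*page = simulate_failure ? NULL : malloc(sizeof(**page));

	/*
	 * Correct check: test the allocation result (*page). Testing `page`
	 * would only inspect the caller's out-parameter, which is not NULL
	 * here, so an allocation failure would go unnoticed.
	 */
	if (!*page)
		return -ENOMEM;
	return 0;
}
```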
David S. Miller e227701c45 Merge branch 'net-ICW-sendmsg-recvmsg'
Paolo Abeni says:

====================
net: use ICW for sk_proto->{send,recv}msg

This series extends ICW usage to one of the few remaining spots in fast-path
still hitting per packet retpoline overhead, namely the sk_proto->{send,recv}msg
calls.

The first 3 patches in this series refactor the existing code so that applying
the ICW macros is straight-forward: we demux inet_{recv,send}msg in ipv4 and
ipv6 variants so that each of them can easily select the appropriate TCP or UDP
direct call. While at it, a new helper is created to avoid excessive code
duplication, and the current ICWs for inet_{recv,send}msg are adjusted
accordingly.

The last 2 patches really introduce the new ICW use-case, respectively for the
ipv6 and the ipv4 code path.

This gives up to 5% performance improvement under UDP flood, and smaller but
measurable gains for TCP RR workloads.

v1 -> v2:
 - drop inet6_{recv,send}msg declaration from header file,
   prefer ICW macro instead
 - avoid unneeded redeclaration for udp_sendmsg, as suggested by Willem
====================

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 13:51:54 -07:00
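
The idea behind the indirect call wrappers used in this series can be sketched in userspace: compare the function pointer against the most likely targets and call them directly, falling back to the indirect call otherwise. The `TOY_` macros and `toy_*` functions below are hypothetical simplifications written for illustration; the kernel's own macros also depend on whether retpolines are enabled.

```c
/* Simplified stand-ins for indirect call wrappers; names are hypothetical. */
#define TOY_INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))
#define TOY_INDIRECT_CALL_2(f, f2, f1, ...) \
	((f) == (f2) ? f2(__VA_ARGS__) : TOY_INDIRECT_CALL_1(f, f1, __VA_ARGS__))

typedef int (*toy_sendmsg_t)(int len);

/* Toy transports; the return values just make each target distinguishable. */
int toy_tcp_sendmsg(int len) { return len + 1; }
int toy_udp_sendmsg(int len) { return len + 2; }
int toy_other_sendmsg(int len) { return len + 3; }

/*
 * Dispatch through `op`, but turn the two common cases into direct calls so
 * the compiler does not have to emit an indirect branch for them.
 */
int toy_send(toy_sendmsg_t op, int len)
{
	return TOY_INDIRECT_CALL_2(op, toy_tcp_sendmsg, toy_udp_sendmsg, len);
}
```

Any other transport still works through the final indirect call; it simply does not get the fast path.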
Paolo Abeni 6f24080e8a ipv4: use indirect call wrappers for {tcp, udp}_{recv, send}msg()
This avoids an indirect call per syscall for common ipv4 transports

v1 -> v2:
 - avoid unneeded redeclaration for udp_sendmsg, as suggested by Willem

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 13:51:54 -07:00
Paolo Abeni 164c51fe82 ipv6: use indirect call wrappers for {tcp, udpv6}_{recv, send}msg()
This avoids an indirect call per syscall for common ipv6 transports

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 13:51:54 -07:00
Paolo Abeni a648a592dc net: adjust socket level ICW to cope with ipv6 variant of {recv, send}msg
After the previous patch we have ipv{6,4} variants for {recv,send}msg, so
we should use the generic _INET ICW variant to call into the proper
built-in.

This also allows dropping the now unused and rather ugly _INET4 ICW macro

v1 -> v2:
 - use ICW macro to declare inet6_{recv,send}msg
 - fix a couple of checkpatch offenders in the code context

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 13:51:54 -07:00
Paolo Abeni 68ab5d1496 ipv6: provide and use ipv6 specific version for {recv, send}msg
This will simplify indirect call wrapper invocation in the following
patch.

No functional change intended; any out-of-tree IPv6 user of
inet_{recv,send}msg can keep using the existing functions.

SCTP code still uses the existing version even for ipv6: as this series
will not add ICW for SCTP, moving to the new helper would not give
any benefit.

The only other in-kernel user of inet_{recv,send}msg is
pvcalls_conn_back_read(), but pvcalls explicitly creates only IPv4 sockets,
so there is no need to update that code path either.

v1 -> v2: drop inet6_{recv,send}msg declaration from header file,
   prefer ICW macro instead

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 13:51:54 -07:00
Paolo Abeni e473093639 inet: factor out inet_send_prepare()
The same code is replicated verbatim in multiple places, and the next
patches will introduce an additional user for it. Factor out a
helper and use it where appropriate. No functional change intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 13:51:54 -07:00
Colin Ian King 2559d7c4dd qlcnic: remove redundant assignment to variable err
The variable err is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:33:55 -07:00
Colin Ian King b70d846cf4 atl1c: remove redundant assignment to variable tpd_req
The variable tpd_req is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:32:39 -07:00
Sudarsana Reddy Kalluru cedeac9df4 qed: Add support for Timestamping the unicast PTP packets.
This patch adds driver changes to detect/timestamp the unicast PTP packets.

Changes from previous version:
-------------------------------
v2: Defined a macro for unicast ptp param mask.

Please consider applying this to "net-next".

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:30:41 -07:00
Catherine Sullivan 3c13ce74b6 gve: Fix u64_stats_sync to initialize start
u64_stats_fetch_begin needs to initialize start.

Signed-off-by: Catherine Sullivan <csully@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:27:46 -07:00
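
The read-side pattern this fix concerns can be modelled with a plain sequence counter: the value returned by the "begin" call must be stored in `start` on every pass so the "retry" check has something valid to compare against. The `toy_` names are hypothetical, and this single-threaded model omits the memory barriers a real implementation needs.

```c
/* Hypothetical model of a sequence-protected counter. */
struct toy_stats {
	unsigned int seq;           /* even when no writer is active */
	unsigned long long packets;
};

unsigned int toy_fetch_begin(const struct toy_stats *s)
{
	return s->seq;
}

int toy_fetch_retry(const struct toy_stats *s, unsigned int start)
{
	/* Retry if a write was in progress or completed during the read. */
	return (start & 1) || s->seq != start;
}

unsigned long long toy_read_packets(const struct toy_stats *s)
{
	unsigned int start;
	unsigned long long val;

	do {
		start = toy_fetch_begin(s); /* assign on every pass */
		val = s->packets;
	} while (toy_fetch_retry(s, start));

	return val;
}
```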
Mahesh Bandewar d62962b37c loopback: fix lockdep splat
dev_init_scheduler() and dev_activate() expect the caller to
hold RTNL. Since we don't want blackhole device to be initialized
per ns, we are initializing at init.

[    3.855027] Call Trace:
[    3.855034]  dump_stack+0x67/0x95
[    3.855037]  lockdep_rcu_suspicious+0xd5/0x110
[    3.855044]  dev_init_scheduler+0xe3/0x120
[    3.855048]  ? net_olddevs_init+0x60/0x60
[    3.855050]  blackhole_netdev_init+0x45/0x6e
[    3.855052]  do_one_initcall+0x6c/0x2fa
[    3.855058]  ? rcu_read_lock_sched_held+0x8c/0xa0
[    3.855066]  kernel_init_freeable+0x1e5/0x288
[    3.855071]  ? rest_init+0x260/0x260
[    3.855074]  kernel_init+0xf/0x180
[    3.855076]  ? rest_init+0x260/0x260
[    3.855078]  ret_from_fork+0x24/0x30

Fixes: 4de83b88c6 ("loopback: create blackhole net device similar to loopack.")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03 11:24:38 -07:00
Daniel Borkmann e5a3e259ef Merge branch 'bpf-tcp-rtt-hook'
Stanislav Fomichev says:

====================
Congestion control team would like to have a periodic callback to
track some TCP statistics. Let's add a sock_ops callback that can be
selectively enabled on a socket by socket basis and is executed for
every RTT. BPF program frequency can be further controlled by calling
bpf_ktime_get_ns and bailing out early.

I ran neper tcp_stream and tcp_rr tests with the sample program
from the last patch and didn't observe any noticeable performance
difference.

v2:
* add a comment about second accept() in selftest (Yonghong Song)
* refer to tcp_bpf.readme in sample program (Yonghong Song)
====================

Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:03 +02:00
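
The "bail out early" idea from the cover letter above can be sketched as a small rate limiter that runs on every RTT callback but only does work once enough time has passed. This is a userspace model with hypothetical names (`toy_rtt_state`, `toy_on_rtt`); in an actual BPF program the timestamp would come from bpf_ktime_get_ns, as the message says.

```c
#include <stdint.h>

/* Hypothetical per-socket state kept between callbacks. */
struct toy_rtt_state {
	uint64_t last_ns;  /* time of the last processed callback */
	uint64_t samples;  /* number of callbacks actually processed */
};

/*
 * Called once per RTT. Returns 1 if the sample was processed and 0 if the
 * callback bailed out early because min_interval_ns has not yet elapsed.
 */
int toy_on_rtt(struct toy_rtt_state *st, uint64_t now_ns,
	       uint64_t min_interval_ns)
{
	if (st->samples && now_ns - st->last_ns < min_interval_ns)
		return 0;

	st->last_ns = now_ns;
	st->samples++;
	return 1;
}
```

The first callback is always processed; later ones are skipped until the chosen interval has elapsed.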
Stanislav Fomichev d78e3f0614 samples/bpf: fix tcp_bpf.readme detach command
Copy-paste, should be detach, not attach.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03 16:52:02 +02:00