Daniel Borkmann says:
====================
pull-request: bpf-next 2020-08-04
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).
The main changes are:
1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.
2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
in-kernel inspection, from Yonghong Song and Alexei Starovoitov.
3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
unwinder errors, from Song Liu.
4) Allow cgroup local storage map to be shared between programs on the same
cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.
5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
load instructions, from Jean-Philippe Brucker.
6) Follow-up fixes on BPF socket lookup in combination with reuseport group
handling. Also add related BPF selftests, from Jakub Sitnicki.
7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
socket create/release as well as bind functions, from Stanislav Fomichev.
8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
xdp_statistics, from Peilin Ye.
9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.
10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.
11) Fix a bpftool segfault due to missing program type name and make it more robust
to prevent them in future gaps, from Quentin Monnet.
12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
resolver issue, from John Fastabend.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.
This patch removes all the implementations of those commands in kernel, along
the xdp_attachment_query().
This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
In the nsim_create(), rtnl_lock() is called before nsim_bpf_init().
If nsim_bpf_init() is failed, rtnl_unlock() should be called,
but it isn't called.
So, unbalanced locking would occur.
Fixes: e05b2d141f ("netdevsim: move netdev creation/destruction to dev probe")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add UDP tunnel port handlers to our fake driver so we can test
the core infra.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.
Use the devlink_port_attrs struct to replace the relevant parameters.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register two control traps with devlink. The existing selftest at
tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh iterates
over all registered traps and checks that the action of non-drop traps
cannot be changed. Up until now only exception traps were tested, now
control traps will be tested as well.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The layer 3 exceptions are still subject to the same trap policer, so
nothing changes, but user space can choose to assign a different one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the policer drop counter is retrieved when the jiffies value is
a multiple of 64, the counter will not be incremented.
This randomly breaks a selftest [1] the reads the counter twice and
checks that it was incremented:
```
TEST: Trap policer [FAIL]
Policer drop counter was not incremented
```
Fix by always incrementing the counter by 1.
[1] tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh
Fixes: ad188458d0 ("netdevsim: Add devlink-trap policer support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case memory resources for dummy_data were allocated, release them
before return.
Addresses-Coverity-ID: 1491997 ("Resource leak")
Fixes: 7ef19d3b1d ("devlink: report error once U32_MAX snapshot ids have been used")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a dummy callback to set trap group parameters. Return an error when
the 'fail_trap_group_set' debugfs file is set in order to exercise error
paths and verify that error is propagated to user space when should.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.
In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.
Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.
The next patch will enable user space to change this default binding.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register three dummy packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.
This will be used later on in the series to test the devlink-trap
policer infrastructure.
v2:
* Remove check about burst size being a power of 2 and instead add a
debugfs knob to fail the operation
* Provide max/min rate/burst size when registering policers and remove
the validity checks from nsim_dev_devlink_trap_policer_set()
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.
Remove this flag from all drivers that called
devlink_health_reporter_create.
All existing health reporters set auto recovery to true if they have a
recover method.
Yet, administrator can unset auto recover via netlink command as prior to
this patch.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Health reporters should be registered with auto recover set to true.
Align dummy reporter behaviour with that, as in later patch the option to
set auto recover behaviour will be removed.
In addition, align netdevsim selftest to the new default value.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the .snapshot region operation for the dummy data region. This
enables a region snapshot to be taken upon request via the new
DEVLINK_CMD_REGION_SNAPSHOT command.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.
There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.
The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 value at an available slot between 0 and
U32_MAX.
The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.
Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.
By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.
With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.
This work also paves the way to introduce a mechanism for userspace to
request a snapshot.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.
A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.
This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.
Because there is no current way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would over flow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It does not makes sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.
This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.
Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify the devlink region code in preparation for adding new operations
on regions.
Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).
This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.
In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Overlapping header include additions in macsec.c
A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c
Overlapping test additions in selftests Makefile
Overlapping PCI ID table adjustments in iwlwifi driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet trap groups are now explicitly registered by drivers and not
implicitly registered when the packet traps are registered. Therefore,
there is no need to encode entire group structure the trap is associated
with inside the trap structure.
Instead, only pass the group identifier. Refer to it as initial group
identifier, as future patches will allow user space to move traps
between groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the previously added API to explicitly register / unregister
supported packet trap groups. This is in preparation for future patches
that will enable drivers to pass additional group attributes, such as
associated policer identifier.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>