While eBPF programs can check whether a file is a socket by testing
file->f_op == &socket_file_ops, they cannot convert the void *private_data
pointer to a struct socket BTF pointer. To make this possible, a new helper
wrapping sock_from_file() is added.
This is useful for tracing programs, but also for other program types
that inherit this set of helpers, such as iterators or LSM programs.
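The check that the helper wraps can be modelled in plain userspace C. This is only a sketch: the struct definitions below are mock stand-ins for the kernel types, and sock_from_file_model() is a hypothetical name, not the kernel function or the BPF helper itself.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; NOT the real definitions. */
struct file_operations { int unused; };
struct socket { int type; };
struct file {
    const struct file_operations *f_op;
    void *private_data;
};

/* Stand-in for the kernel's socket_file_ops table. */
static const struct file_operations socket_file_ops = { 0 };

/* Only files whose f_op is socket_file_ops carry a struct socket in
 * private_data; for any other file the pointer must not be trusted. */
static struct socket *sock_from_file_model(const struct file *file)
{
    assert(file != NULL);
    if (file->f_op == &socket_file_ops)
        return file->private_data;
    return NULL;
}
```

A caller is expected to test the result against NULL before dereferencing it, which is what makes the conversion safe to expose.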
Signed-off-by: Florent Revest <revest@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201204113609.1850150-2-revest@google.com
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-12-03
The main changes are:
1) Support BTF in kernel modules, from Andrii.
2) Introduce preferred busy-polling, from Björn.
3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh.
4) Memcg-based memory accounting for bpf objects, from Roman.
5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits)
selftests/bpf: Fix invalid use of strncat in test_sockmap
libbpf: Use memcpy instead of strncpy to please GCC
selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module
selftests/bpf: Add tp_btf CO-RE reloc test for modules
libbpf: Support attachment of BPF tracing programs to kernel modules
libbpf: Factor out low-level BPF program loading helper
bpf: Allow to specify kernel module BTFs when attaching BPF programs
bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier
selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF
selftests/bpf: Add support for marking sub-tests as skipped
selftests/bpf: Add bpf_testmod kernel module for testing
libbpf: Add kernel module BTF support for CO-RE relocations
libbpf: Refactor CO-RE relocs to not assume a single BTF object
libbpf: Add internal helper to load BTF data by FD
bpf: Keep module's btf_data_size intact after load
bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP
bpf: Adds support for setting window clamp
samples/bpf: Fix spelling mistake "recieving" -> "receiving"
bpf: Fix cold build of test_progs-no_alu32
...
====================
Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add ability for user-space programs to specify non-vmlinux BTF when attaching
BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this,
attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD
of a module or vmlinux BTF object. For backwards compatibility reasons,
0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-11-andrii@kernel.org
Background:
Broadcast and multicast packets are enqueued for later processing.
The length of this queue was previously hardcoded to 1000.
This proved insufficient for very high packet rates and resulted in
multicast packet drops, while unicast worked fine at the same time.
The change:
This patch makes the queue length adjustable to accommodate
environments with very high multicast packet rates,
while keeping the default value of 1000 unless otherwise specified.
The queue length is specified as a request per macvlan
using the IFLA_MACVLAN_BC_QUEUE_LEN parameter.
The queue length actually used is then the maximum of the requests
from all macvlans connected to the same port. The queue length
actually used for the port can be retrieved (read only)
through the IFLA_MACVLAN_BC_QUEUE_LEN_USED parameter for verification.
This will be followed up by a patch to iproute2
in order to adjust the parameter from userspace.
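The "maximum of all requests on the port" rule can be sketched in a few lines of userspace C. The function name and the array-based interface are hypothetical and only illustrate the rule; a macvlan that did not specify a length is assumed to contribute the default of 1000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_BC_QUEUE_LEN 1000u  /* default stated in the commit message */

/* Hypothetical helper: reqs[] holds the queue length requested by each
 * macvlan attached to one port. The port uses the largest request. */
static uint32_t port_bc_queue_len_used(const uint32_t *reqs, size_t n)
{
    uint32_t max = 0;

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (reqs[i] > max)
            max = reqs[i];
    }
    return max;
}
```

With requests of 1000 (default), 5000 and 200 on the same port, the sketch yields 5000, so a single macvlan asking for a longer queue raises the limit for the whole port.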
Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
Link: https://lore.kernel.org/r/dd4673b2-7eab-edda-6815-85c67ce87f63@paneda.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce a get link command which loops through
all available links of all available link groups. It
uses the SMC-R linkgroup list as its entry point, not
the socket list, which makes linkgroup diagnosis
possible even when a linkgroup no longer contains
active connections.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce a get linkgroup command which loops through
all available SMC-R linkgroups. It uses the SMC-R linkgroup
list as its entry point, not the socket list, which makes
linkgroup diagnosis possible even when a linkgroup no longer
contains active connections.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add new netlink command to obtain system information
of the smc module.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.
Conflicts:
drivers/net/can/usb/gs_usb.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To provide support for the ChaCha-Poly cipher we need to define
specific constants and structures.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the openvswitch module is not accepting the correctly formatted
netlink message for the TTL decrement action. For both setting and getting
the dec_ttl action, the actions should be nested in the
OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.
When the original patch was sent, it was tested with a private OVS userspace
implementation. This implementation was unfortunately not upstreamed and
reviewed, hence an erroneous version of this patch was sent out.
Leaving the patch as-is would cause problems, as the kernel module could
interpret additional attributes as actions and vice versa, because the
actions are not encapsulated/nested within the actual attribute but
concatenated after it.
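The difference between nesting and concatenating can be illustrated with a small attribute builder. This is a userspace sketch only: the DEMO_* type numbers are placeholders (the real values live in the openvswitch.h uapi header), and the helper names are hypothetical rather than taken from the kernel or libnl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN 4u                    /* u16 length + u16 type */
#define ATTR_ALIGN(n) (((n) + 3u) & ~3u)  /* netlink pads to 4 bytes */

/* Placeholder type numbers, for illustration only. */
enum {
    DEMO_ACTION_DEC_TTL = 1,       /* stands in for the dec_ttl action */
    DEMO_DEC_TTL_ATTR_ACTION = 2,  /* stands in for OVS_DEC_TTL_ATTR_ACTION */
    DEMO_ACTION_OUTPUT = 3         /* stands in for some inner action */
};

struct attr_buf {
    uint8_t data[128];
    size_t used;
};

/* Write a header with a provisional length and return its offset. */
static size_t attr_begin(struct attr_buf *b, uint16_t type)
{
    size_t off = b->used;
    uint16_t len = ATTR_HDRLEN;

    assert(off + ATTR_HDRLEN <= sizeof(b->data));
    memcpy(b->data + off, &len, 2);
    memcpy(b->data + off + 2, &type, 2);
    b->used += ATTR_HDRLEN;
    return off;
}

/* Close an attribute: its length covers everything added since attr_begin. */
static void attr_end(struct attr_buf *b, size_t off)
{
    uint16_t len = (uint16_t)(b->used - off);

    memcpy(b->data + off, &len, 2);
}

/* Append a leaf attribute carrying a 32-bit payload. */
static void attr_put_u32(struct attr_buf *b, uint16_t type, uint32_t v)
{
    size_t off = attr_begin(b, type);

    assert(b->used + sizeof(v) <= sizeof(b->data));
    memcpy(b->data + b->used, &v, sizeof(v));
    b->used += ATTR_ALIGN(sizeof(v));
    attr_end(b, off);
}

/* Read back the length field of the attribute starting at off. */
static uint16_t attr_len_at(const struct attr_buf *b, size_t off)
{
    uint16_t len;

    memcpy(&len, b->data + off, 2);
    return len;
}

/* Build dec_ttl { DEC_TTL_ATTR_ACTION { inner action } }. */
static size_t build_dec_ttl(struct attr_buf *b, uint32_t port)
{
    size_t outer = attr_begin(b, DEMO_ACTION_DEC_TTL);
    size_t nest = attr_begin(b, DEMO_DEC_TTL_ATTR_ACTION);

    attr_put_u32(b, DEMO_ACTION_OUTPUT, port);
    attr_end(b, nest);
    attr_end(b, outer);
    return outer;
}
```

Because each enclosing length covers its children, a parser can tell exactly where the action list of dec_ttl ends. With concatenation, the inner action would sit outside the enclosing length and be indistinguishable from a sibling attribute.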
Fixes: 744676e777 ("openvswitch: add TTL decrement action")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a wrapper function to get the IMA hash of an inode. This helper
is useful for fingerprinting files (e.g. executables on execution) and
using these fingerprints in detections such as an executable unlinking
itself.
Since ima_inode_hash() can sleep, the helper is only allowed for sleepable
LSM hooks.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201124151210.1081188-3-kpsingh@chromium.org
Extend MRP to support LC mode (link check) for the interconnect port.
This applies only to the interconnect ring.
Unlike RC mode (ring check), LC mode uses CFM frames to detect when the
link goes up or down, and userspace needs to react based on that.
One advantage of LC mode over RC mode is that there are fewer frames in
the normal rings, because RC mode generates InTest frames on all ports
while LC mode sends CFM frames only on the interconnect port.
All 4 nodes that are part of the interconnect ring need to use the same
mode, and it is not possible to run LC and RC mode at the same time on
a node.
Whenever the MIM starts, it needs to detect the status of the other 3
nodes in the interconnect ring, so it sends a frame called
InLinkStatus, to which the clients need to reply with their link
status.
This patch adds InLinkStatus frame type and extends existing rules on
how to forward this frame.
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix reload stats structure exposed to the user. Change stats structure
hierarchy to have the reload action as a parent of the stat entry and
then stat entry includes value per limit. This will also help to avoid
string concatenation on iproute2 output.
Reload stats structure before this fix:
"stats": {
    "reload": {
        "driver_reinit": 2,
        "fw_activate": 1,
        "fw_activate_no_reset": 0
    }
}
After this fix:
"stats": {
    "reload": {
        "driver_reinit": {
            "unspecified": 2
        },
        "fw_activate": {
            "unspecified": 1,
            "no_reset": 0
        }
    }
}
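The new hierarchy (action as parent, one value per limit underneath) maps naturally onto a two-dimensional table. The sketch below is a userspace model with hypothetical names; it is not devlink or iproute2 code and only shows why no string concatenation of action and limit names is needed when rendering.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { ACT_DRIVER_REINIT, ACT_FW_ACTIVATE, ACT_COUNT };
enum { LIM_UNSPECIFIED, LIM_NO_RESET, LIM_COUNT };

static const char *const act_name[ACT_COUNT] = { "driver_reinit", "fw_activate" };
static const char *const lim_name[LIM_COUNT] = { "unspecified", "no_reset" };

struct reload_stats {
    unsigned int value[ACT_COUNT][LIM_COUNT];
    int valid[ACT_COUNT][LIM_COUNT];  /* which action/limit pairs are reported */
};

/* Append formatted text to out; returns -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *n, const char *fmt, ...)
{
    va_list ap;
    int w;

    if (*n >= cap)
        return -1;
    va_start(ap, fmt);
    w = vsnprintf(out + *n, cap - *n, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *n)
        return -1;
    *n += (size_t)w;
    return 0;
}

/* Render the stats as compact JSON: action first, then one entry per limit. */
static int render_reload_stats(const struct reload_stats *s, char *out, size_t cap)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    if (append(out, cap, &n, "{\"reload\":{") < 0)
        return -1;
    for (int a = 0; a < ACT_COUNT; a++) {
        int first = 1;

        if (append(out, cap, &n, "%s\"%s\":{", a ? "," : "", act_name[a]) < 0)
            return -1;
        for (int l = 0; l < LIM_COUNT; l++) {
            if (!s->valid[a][l])
                continue;
            if (append(out, cap, &n, "%s\"%s\":%u", first ? "" : ",",
                       lim_name[l], s->value[a][l]) < 0)
                return -1;
            first = 0;
        }
        if (append(out, cap, &n, "}") < 0)
            return -1;
    }
    return append(out, cap, &n, "}}");
}
```

Each key is printed from a single name table, whereas the old flat layout needed a combined key such as "fw_activate_no_reset".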
Fixes: a254c26426 ("devlink: Add reload stats")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1606109785-25197-1-git-send-email-moshe@mellanox.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for data length code modifications for Classical CAN.
The netlink configuration interface always allowed to pass any value
that fits into a byte, therefore only the modification process had to be
extended to handle the raw DLC representation of Classical CAN frames.
When a DLC value from 0 .. F is provided for Classical CAN frame
modifications the 'len' value is modified as-is with the exception that
potentially existing 9 .. F DLC values in the len8_dlc element are moved
to the 'len' element for the modification operation by mod_retrieve_ccdlc().
After the modification the Classical CAN frame DLC information is brought
back into the correct format by mod_store_ccdlc() which is filling 'len'
and 'len8_dlc' accordingly.
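The retrieve/modify/store sequence described above can be sketched in userspace C. The struct and the *_model function names are hypothetical stand-ins for struct can_frame and the two helpers; leaving out-of-range values untouched in the store step is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CC_MAX_DLEN    8u   /* maximum Classical CAN payload in bytes */
#define CC_MAX_RAW_DLC 15u  /* largest value a 4-bit DLC can hold */

/* Hypothetical mirror of the two length fields of a Classical CAN frame. */
struct cc_frame_model {
    uint8_t len;       /* payload length 0 .. 8 */
    uint8_t len8_dlc;  /* raw DLC 9 .. 15, only meaningful when len == 8 */
};

/* Before modification: move a valid raw DLC 9 .. 15 into 'len'. */
static void retrieve_ccdlc_model(struct cc_frame_model *f)
{
    assert(f != NULL);
    if (f->len != CC_MAX_DLEN)
        return;
    if (f->len8_dlc > CC_MAX_DLEN && f->len8_dlc <= CC_MAX_RAW_DLC)
        f->len = f->len8_dlc;
}

/* After modification: split a DLC held in 'len' back into both fields. */
static void store_ccdlc_model(struct cc_frame_model *f)
{
    assert(f != NULL);
    f->len8_dlc = 0;               /* clear potential leftovers */
    if (f->len <= CC_MAX_DLEN)
        return;                    /* plain length 0 .. 8 */
    if (f->len > CC_MAX_RAW_DLC)
        return;                    /* assumption: out-of-range is left as-is */
    f->len8_dlc = f->len;
    f->len = CC_MAX_DLEN;
}
```

Between the two calls 'len' temporarily holds the full 0 .. F DLC, so the configured modification can operate on a single byte.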
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201119084921.2621-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
ISO 11898-1 Chapter 8.4.2.3 defines a 4 bit data length code (DLC) table which
maps the DLC to the payload length of the CAN frame in bytes:
    DLC      ->  payload length
    0 .. 8   ->  0 .. 8
    9 .. 15  ->  8
Although the DLC values 8 .. 15 in Classical CAN always result in a payload
length of 8 bytes, these DLC values are transparently transmitted on the CAN
bus. As the struct can_frame only provides a 'len' element (formerly 'can_dlc')
which contains the plain payload length (0 .. 8) of the CAN frame, the raw
DLC is not visible to the application programmer, e.g. for testing use-cases.
To access the raw DLC values 9 .. 15 the len8_dlc element is introduced, which
is only valid when the payload length 'len' is 8 and the DLC is greater than 8.
The len8_dlc element is filled by the CAN interface driver and used for CAN
frame creation by the CAN driver when the CAN_CTRLMODE_CC_LEN8_DLC flag is
supported by the driver and enabled via netlink configuration interface.
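The DLC table and the validity rule for len8_dlc can be written down as two small functions. This is a userspace sketch with hypothetical names; it ignores the CAN_CTRLMODE_CC_LEN8_DLC flag and assumes the feature is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CC_MAX_DLEN    8u   /* maximum Classical CAN payload in bytes */
#define CC_MAX_RAW_DLC 15u  /* largest value a 4-bit DLC can hold */

/* Hypothetical mirror of the two length fields discussed above. */
struct cc_frame_len {
    uint8_t len;       /* payload length 0 .. 8 */
    uint8_t len8_dlc;  /* raw DLC 9 .. 15, only valid when len == 8 */
};

/* DLC table: 0 .. 8 map to themselves, 9 .. 15 map to 8 bytes. */
static uint8_t cc_dlc_to_len(uint8_t dlc)
{
    assert(dlc <= CC_MAX_RAW_DLC);
    return dlc > CC_MAX_DLEN ? CC_MAX_DLEN : dlc;
}

/* Raw DLC as seen on the bus: len8_dlc only counts when len is 8 and
 * len8_dlc lies in 9 .. 15; otherwise the DLC equals the length. */
static uint8_t cc_raw_dlc(const struct cc_frame_len *f)
{
    assert(f != NULL);
    if (f->len == CC_MAX_DLEN &&
        f->len8_dlc > CC_MAX_DLEN && f->len8_dlc <= CC_MAX_RAW_DLC)
        return f->len8_dlc;
    return f->len;
}
```

A frame with len 8 and len8_dlc 12 therefore carries 8 payload bytes but DLC 12 on the wire, while a stale len8_dlc on a shorter frame is ignored.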
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201110101852.1973-2-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The helper uses the CLOCK_MONOTONIC_COARSE time source, which is less
accurate but cheaper to read.
We have a BPF CGROUP_SKB firewall that supports event logging through
bpf_perf_event_output(). Each event has a timestamp and currently we use
bpf_ktime_get_ns() for it. Use of bpf_ktime_get_coarse_ns() saves ~15-20
ns in time required for event logging.
bpf_ktime_get_ns():
EgressLogByRemoteEndpoint 113.82ns 8.79M
bpf_ktime_get_coarse_ns():
EgressLogByRemoteEndpoint 95.40ns 10.48M
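The same trade-off is visible from userspace through clock_gettime(), which accepts both clock IDs on Linux. The wrappers below are a sketch with hypothetical names and make no claim about how the BPF helper is implemented.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a clock and return its value in nanoseconds. */
static uint64_t clock_ns(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Precise monotonic time, comparable in spirit to bpf_ktime_get_ns(). */
static uint64_t monotonic_ns(void)
{
    return clock_ns(CLOCK_MONOTONIC);
}

/* Coarse monotonic time: only advances at tick granularity, but avoids
 * reading the hardware clocksource. */
static uint64_t monotonic_coarse_ns(void)
{
    return clock_ns(CLOCK_MONOTONIC_COARSE);
}
```

For event timestamps where tick-level resolution is acceptable, the coarse variant is the natural choice; the resolution actually available can be queried with clock_getres().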
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201117184549.257280-1-me@ubique.spb.ru
The helper allows modification of certain bits on the linux_binprm
struct, starting with the secureexec bit, which can be updated using the
BPF_F_BPRM_SECUREEXEC flag.
secureexec can be set by the LSM for privilege-gaining executions to set
the AT_SECURE auxv entry for glibc. When it is set, the dynamic linker
disables the use of certain environment variables (like LD_PRELOAD).
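Both sides of this mechanism can be sketched in userspace C: a model of the flag handling, and the way a process can observe AT_SECURE through getauxval(). The struct, the DEMO_ flag and the function names are hypothetical; rejecting unknown flag bits and clearing the bit when the flag is absent are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Stand-in for BPF_F_BPRM_SECUREEXEC; illustration only. */
#define DEMO_F_BPRM_SECUREEXEC (1ULL << 0)

/* Mock of the single linux_binprm bit discussed above. */
struct bprm_model {
    unsigned int secureexec : 1;
};

/* Model of the helper: accept only known flags, then mirror the
 * SECUREEXEC flag into the secureexec bit. */
static int bprm_opts_set_model(struct bprm_model *bprm, uint64_t flags)
{
    assert(bprm != NULL);
    if (flags & ~DEMO_F_BPRM_SECUREEXEC)
        return -EINVAL;
    bprm->secureexec = (flags & DEMO_F_BPRM_SECUREEXEC) != 0;
    return 0;
}

/* Userspace view: non-zero AT_SECURE means the kernel asked for
 * secure-execution mode for this process. */
static int process_is_at_secure(void)
{
    return getauxval(AT_SECURE) != 0;
}
```

An ordinary, unprivileged program normally sees AT_SECURE as 0; a set-user-ID binary, or one for which an LSM set secureexec, sees a non-zero value.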
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201117232929.2156341-1-kpsingh@chromium.org