Add a new devlink port attribute that indicates the port's number of lanes.
Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.
The attribute is not passed to user space in case the number of lanes is
invalid (0).
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
exposes basic inet socket attribute, plus some MPTCP socket
fields comprising PM status and MPTCP-level sequence numbers.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit bf9765145b ("sock: Make sk_protocol a 16-bit value")
the current size of 'sdiag_protocol' is not sufficient to represent
the possible protocol values.
This change introduces a new inet diag request attribute to let
user space specify the relevant protocol number using u32 values.
The attribute is parsed by inet diag core on get/dump command
and the extended protocol value, if available, is preferred to
'sdiag_protocol' to lookup the diag handler.
The parse attributed are exposed to all the diag handlers via
the cb->data.
Note that inet_diag_dump_one_icsk() is left unmodified, as it
will not be used by protocol using the extended attribute.
Suggested-by: David S. Miller <davem@davemloft.net>
Co-developed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For those applications which are not willing to use io_uring_enter()
to reap and handle cqes, they may completely rely on liburing's
io_uring_peek_cqe(), but if cq ring has overflowed, currently because
io_uring_peek_cqe() is not aware of this overflow, it won't enter
kernel to flush cqes, below test program can reveal this bug:
static void test_cq_overflow(struct io_uring *ring)
{
struct io_uring_cqe *cqe;
struct io_uring_sqe *sqe;
int issued = 0;
int ret = 0;
do {
sqe = io_uring_get_sqe(ring);
if (!sqe) {
fprintf(stderr, "get sqe failed\n");
break;;
}
ret = io_uring_submit(ring);
if (ret <= 0) {
if (ret != -EBUSY)
fprintf(stderr, "sqe submit failed: %d\n", ret);
break;
}
issued++;
} while (ret > 0);
assert(ret == -EBUSY);
printf("issued requests: %d\n", issued);
while (issued) {
ret = io_uring_peek_cqe(ring, &cqe);
if (ret) {
if (ret != -EAGAIN) {
fprintf(stderr, "peek completion failed: %s\n",
strerror(ret));
break;
}
printf("left requets: %d\n", issued);
continue;
}
io_uring_cqe_seen(ring, cqe);
issued--;
printf("left requets: %d\n", issued);
}
}
int main(int argc, char *argv[])
{
int ret;
struct io_uring ring;
ret = io_uring_queue_init(16, &ring, 0);
if (ret) {
fprintf(stderr, "ring setup failed: %d\n", ret);
return 1;
}
test_cq_overflow(&ring);
return 0;
}
To fix this issue, export cq overflow status to userspace by adding new
IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as
io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Define 100G, 200G and 400G link modes using 100Gbps per lane
LR, ER and FR are defined as a single link mode because they are
using same technology and by design are fully interoperable.
EEPROM content indicates if the module is LR, ER, or FR, and the
user space ethtool decoder is planned to support decoding these
modes in the EEPROM.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More often than not, a failed VM-entry in an x86 production
environment is induced by a defective CPU. To help identify the bad
hardware, include the id of the last logical CPU to run a vCPU in the
information provided to userspace on a KVM exit for failed VM-entry or
for KVM internal errors not associated with emulation. The presence of
this additional information is indicated by a new capability,
KVM_CAP_LAST_CPU.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-5-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for rejecting packets from the prerouting chain, from
Laura Garcia Liebana.
2) Remove useless assignment in pipapo, from Stefano Brivio.
3) On demand hook registration in IPVS, from Julian Anastasov.
4) Expire IPVS connection from process context to not overload
timers, also from Julian.
5) Fallback to conntrack TCP tracker to handle connection reuse
in IPVS, from Julian Anastasov.
6) Several patches to support for chain bindings.
7) Expose enum nft_chain_flags through UAPI.
8) Reject unsupported chain flags from the netlink control plane.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In the zoned storage model, the sectors within a zone are typically all
writeable. With the introduction of the Zoned Namespace (ZNS) Command
Set in the NVM Express organization, the model was extended to have a
specific writeable capacity.
Extend the zone descriptor data structure with a zone capacity field to
indicate to the user how many sectors in a zone are writeable.
Introduce backward compatibility in the zone report ioctl by extending
the zone report header data structure with a flags field to indicate if
the capacity field is available.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Sometimes it's handy to know when the socket gets freed. In
particular, we'd like to try to use a smarter allocation of
ports for bpf_bind and explore the possibility of limiting
the number of SOCK_DGRAM sockets the process can have.
Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers on
inet socket release. It triggers only for userspace sockets
(not in-kernel ones) and therefore has the same semantics as
the existing BPF_CGROUP_INET_SOCK_CREATE.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200706230128.4073544-2-sdf@google.com
Initially the thermal framework had a very simple notification
mechanism to send generic netlink messages to the userspace.
The notification function was never called from anywhere and the
corresponding dead code was removed. It was probably a first attempt
to introduce the netlink notification.
At LPC2018, the presentation "Linux thermal: User kernel interface",
proposed to create the notifications to the userspace via a kfifo.
The advantage of the kfifo is the performance. It is usually used from
a 1:1 communication channel where a driver captures data and sends it
as fast as possible to a userspace process.
The drawback is that only one process uses the notification channel
exclusively, thus no other process is allowed to use the channel to
get temperature or notifications.
This patch defines a generic netlink API to discover the current
thermal setup and adds event notifications as well as temperature
sampling. As any genetlink protocol, it can evolve and the versioning
allows to keep the backward compatibility.
In order to prevent the user from getting flooded with data on a
single channel, there are two multicast channels, one for the
temperature sampling when the thermal zone is updated and another one
for the events, so the user can get the events only without the
thermal zone temperature sampling.
Also, a list of commands to discover the thermal setup is added and
can be extended when needed.
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20200706105538.2159-3-daniel.lezcano@linaro.org
AFU (Accelerated Function Unit) is dynamic region of the DFL based FPGA,
and always defined by users. Some DFL based FPGA cards allow users to
implement their own interrupts in AFU. In order to support this,
hardware implements a new UINT (AFU Interrupt) private feature with
related capability register which describes the number of supported
AFU interrupts as well as the local index of the interrupts for
software enumeration, and from software side, driver follows the common
DFL interrupt notification and handling mechanism, and it implements
two ioctls below for user to query number of irqs supported and set/unset
interrupt triggers.
Ioctls:
* DFL_FPGA_PORT_UINT_GET_IRQ_NUM
get the number of irqs, which is used to determine how many interrupts
UINT feature supports.
* DFL_FPGA_PORT_UINT_SET_IRQ
set/unset eventfds as AFU interrupt triggers.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Error reporting interrupt is very useful to notify users that some
errors are detected by the hardware. Once users are notified, they
could query hardware logged error states, no need to continuously
poll on these states.
This patch adds interrupt support for fme global error reporting sub
feature. It follows the common DFL interrupt notification and handling
mechanism. And it implements two ioctls below for user to query
number of irqs supported, and set/unset interrupt triggers.
Ioctls:
* DFL_FPGA_FME_ERR_GET_IRQ_NUM
get the number of irqs, which is used to determine whether/how many
interrupts fme error reporting feature supports.
* DFL_FPGA_FME_ERR_SET_IRQ
set/unset given eventfds as fme error reporting interrupt triggers.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Error reporting interrupt is very useful to notify users that some
errors are detected by the hardware. Once users are notified, they
could query hardware logged error states, no need to continuously
poll on these states.
This patch adds interrupt support for port error reporting sub feature.
It follows the common DFL interrupt notification and handling mechanism,
implements two ioctl commands below for user to query number of irqs
supported, and set/unset interrupt triggers.
Ioctls:
* DFL_FPGA_PORT_ERR_GET_IRQ_NUM
get the number of irqs, which is used to determine whether/how many
interrupts error reporting feature supports.
* DFL_FPGA_PORT_ERR_SET_IRQ
set/unset given eventfds as error interrupt triggers.
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
In cases such as DRM_FORMAT_MOD_SAMSUNG_16_16_TILE, the modifier
describes a generic pixel re-ordering which can be applicable to
multiple vendors.
Define an alias: DRM_FORMAT_MOD_GENERIC_16_16_TILE, which can be
used to describe this layout in a vendor-neutral way, and add a
comment about the expected usage of such "generic" modifiers.
Changes in v2:
- Move note about future cases to comment (Daniel)
Signed-off-by: Brian Starkey <brian.starkey@arm.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200626164800.11595-1-brian.starkey@arm.com
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-07-04
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 17 day(s) which contain
a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).
The main changes are:
1) bpftool ability to show PIDs of processes having open file descriptors
for BPF map/program/link/BTF objects, relying on BPF iterator progs
to extract this info efficiently, from Andrii Nakryiko.
2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
seq_files, from Yonghong Song.
3) Support access to BPF map fields in struct bpf_map from programs
through BTF struct access, from Andrey Ignatov.
4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
via seq_file from BPF iterator progs, from Song Liu.
5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
helper, from Dmitry Yakunin.
6) Optimize BPF sk_storage selection of its caching index, from Martin
KaFai Lau.
7) Removal of redundant synchronize_rcu()s from BPF map destruction which
has been a historic leftover, from Alexei Starovoitov.
8) Several improvements to test_progs to make it easier to create a shell
loop that invokes each test individually which is useful for some CIs,
from Jesper Dangaard Brouer.
9) Fix bpftool prog dump segfault when compiled without skeleton code on
older clang versions, from John Fastabend.
10) Bunch of cleanups and minor improvements, from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This new chain flag specifies that:
* the kernel dynamically allocates the chain name, if no chain name
is specified.
* If the immediate expression that refers to this chain is removed,
then this bound chain (and its content) is destroyed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This enum definition was never exposed through UAPI. Rename
NFT_BASE_CHAIN to NFT_CHAIN_BASE for consistency.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>