Commit aaac3ba95e ("bpf: charge user for creation of BPF maps and
programs") made a wrong assumption of charging against prog->pages.
Unlike map->pages, prog->pages are still subject to change when we
need to expand the program through bpf_prog_realloc().
This can for example happen during verification stage when we need to
expand and rewrite parts of the program. Should the required space
cross a page boundary, then prog->pages is not the same anymore as
its original value that we used to bpf_prog_charge_memlock() on. Thus,
we'll hit a wrap-around during bpf_prog_uncharge_memlock() when prog
is freed eventually. I noticed this that despite having unlimited
memlock, programs suddenly refused to load with EPERM error due to
insufficient memlock.
There are two ways to fix this issue. One would be to add a cached
variable to struct bpf_prog that takes a snapshot of prog->pages at the
time of charging. The other approach is to also account for resizes. I
chose to go with the latter for a couple of reasons: i) We want accounting
rather to be more accurate instead of further fooling limits, ii) adding
yet another page counter on struct bpf_prog would also be a waste just
for this purpose. We also do want to charge as early as possible to
avoid going into the verifier just to find out later on that we crossed
limits. The only place that needs to be fixed is bpf_prog_realloc(),
since only here we expand the program, so we try to account for the
needed delta and should we fail, call-sites check for outcome anyway.
On cBPF to eBPF migrations, we don't grab a reference to the user as
they are charged differently. With that in place, my test case worked
fine.
Fixes: aaac3ba95e ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert rightfully complained that 7bd509e311 ("bpf: add prog_digest
and expose it via fdinfo/netlink") added a too large allocation of
variable 'raw' from bss section, and should instead be done dynamically:
# ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2
add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291)
function old new delta
raw - 32832 +32832
[...]
Since this is only relevant during program creation path, which can be
considered slow-path anyway, lets allocate that dynamically and be not
implicitly dependent on verifier mutex. Move bpf_prog_calc_digest() at
the beginning of replace_map_fd_with_map_ptr() and also error handling
stays straight forward.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
GTP tunneling fixes for net
The following patchset contains two GTP tunneling fixes for your net
tree, they are:
1) Offset to IPv4 header in gtp_check_src_ms_ipv4() is incorrect, thus
this function always succeeds and therefore this defeats this sanity
check. This allows packets that have no PDP to go though, patch from
Lionel Gauthier.
2) According to Note 0 of Figure 2 in Section 6 of 3GPP TS 29.060 v13.5.0
Release 13, always set GTPv1 reserved bit to zero. This may cause
interoperability problems, patch from Harald Welte.
Please, apply, thanks a lot!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
gtp_check_src_ms_ipv4() did not find the PDP context matching with the
UE IP address because the memory location is not right, but the result
is inverted by the Boolean "not" operator. So whatever is the PDP
context, any call to this function is successful.
Signed-off-by: Lionel Gauthier <Lionel.Gauthier@eurecom.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend says:
====================
XDP for virtio_net
This implements virtio_net for the mergeable buffers and big_packet
modes. I tested this with vhost_net running on qemu and did not see
any issues. For testing num_buf > 1 I added a hack to vhost driver
to only but 100 bytes per buffer.
There are some restrictions for XDP to be enabled and work well
(see patch 3) for more details.
1. GUEST_TSO{4|6} must be off
2. MTU must be less than PAGE_SIZE
3. queues must be available to dedicate to XDP
4. num_bufs received in mergeable buffers must be 1
5. big_packet mode must have all data on single page
To test this I used pktgen in the hypervisor and ran the XDP sample
programs xdp1 and xdp2 from ./samples/bpf in the host. The default
mode that is used with these patches with Linux guest and QEMU/Linux
hypervisor is the mergeable buffers mode. I tested this mode for 2+
days running xdp2 without issues. Additionally I did a series of
driver unload/load tests to check the allocate/release paths.
To test the big_packets path I applied the following simple patch against
the virtio driver forcing big_packets mode,
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2242,7 +2242,7 @@ static int virtnet_probe(struct virtio_device *vdev)
vi->big_packets = true;
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
- vi->mergeable_rx_bufs = true;
+ vi->mergeable_rx_bufs = false;
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
I then repeated the tests with xdp1 and xdp2. After letting them run
for a few hours I called it good enough.
Testing the unexpected case where virtio receives a packet across
multiple buffers required patching the hypervisor vhost driver to
convince it to send these unexpected packets. Then I used ping with
the -s option to trigger the case with multiple buffers. This mode
is not expected to be used but as MST pointed out per spec it is
not strictly speaking illegal to generate multi-buffer packets so we
need someway to handle these. The following patch can be used to
generate multiple buffers,
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1777,7 +1777,8 @@ static int translate_desc(struct vhost_virtqueue
*vq, u64
_iov = iov + ret;
size = node->size - addr + node->start;
- _iov->iov_len = min((u64)len - s, size);
+ printk("%s: build 100 length headers!\n", __func__);
+ _iov->iov_len = min((u64)len - s, (u64)100);//size);
_iov->iov_base = (void __user *)(unsigned long)
(node->userspace_addr + addr - node->start);
s += size;
The qemu command I most frequently used for testing (although I did test
various other combinations of devices) is the following,
./x86_64-softmmu/qemu-system-x86_64 \
-hda /var/lib/libvirt/images/Fedora-test0.img \
-m 4096 -enable-kvm -smp 2 \
-netdev tap,id=hn0,queues=4,vhost=on \
-device virtio-net-pci,netdev=hn0,mq=on,vectors=9,guest_tso4=off,guest_tso6=off \
-serial stdio
The options 'guest_tso4=off,guest_tso6=off' are required because we
do not support LRO with XDP at the moment.
Please review any comments/feedback welcome as always.
====================
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_net XDP support expects receive buffers to be contiguous.
If this is not the case we enable a slowpath to allow connectivity
to continue but at a significan performance overhead associated with
linearizing data. To make it painfully aware to users that XDP is
running in a degraded mode we throw an xdp buffer error.
To linearize packets we allocate a page and copy the segments of
the data, including the header, into it. After this the page can be
handled by XDP code flow as normal.
Then depending on the return code the page is either freed or sent
to the XDP xmit path. There is no attempt to optimize this path.
This case is being handled simple as a precaution in case some
unknown backend were to generate packets in this form. To test this
I had to hack qemu and force it to generate these packets. I do not
expect this case to be generated by "real" backends.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the XDP_TX action to virtio_net. When an XDP
program is run and returns the XDP_TX action the virtio_net XDP
implementation will transmit the packet on a TX queue that aligns
with the current CPU that the XDP packet was processed on.
Before sending the packet the header is zeroed. Also XDP is expected
to handle checksum correctly so no checksum offload support is
provided.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP requires using isolated transmit queues to avoid interference
with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
adds a XDP queue per cpu when a XDP program is loaded and does not
expose the queues to the OS via the normal API call to
netif_set_real_num_tx_queues(). This way the stack will never push
an skb to these queues.
However virtio/vhost/qemu implementation only allows for creating
TX/RX queue pairs at this time so creating only TX queues was not
possible. And because the associated RX queues are being created I
went ahead and exposed these to the stack and let the backend use
them. This creates more RX queues visible to the network stack than
TX queues which is worth mentioning but does not cause any issues as
far as I can tell.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds XDP support to virtio_net. Some requirements must be
met for XDP to be enabled depending on the mode. First it will
only be supported with LRO disabled so that data is not pushed
across multiple buffers. Second the MTU must be less than a page
size to avoid having to handle XDP across multiple pages.
If mergeable receive is enabled this patch only supports the case
where header and data are in the same buf which we can check when
a packet is received by looking at num_buf. If the num_buf is
greater than 1 and a XDP program is loaded the packet is dropped
and a warning is thrown. When any_header_sg is set this does not
happen and both header and data is put in a single buffer as expected
so we check this when XDP programs are loaded. Subsequent patches
will process the packet in a degraded mode to ensure connectivity
and correctness is not lost even if backend pushes packets into
multiple buffers.
If big packets mode is enabled and MTU/LRO conditions above are
met then XDP is allowed.
This patch was tested with qemu with vhost=on and vhost=off where
mergeable and big_packet modes were forced via hard coding feature
negotiation. Multiple buffers per packet was forced via a small
test patch to vhost.c in the vhost=on qemu mode.
Suggested-by: Shrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a warning for drivers to use when encountering an invalid
buffer for XDP. For normal cases this should not happen but to catch
this in virtual/qemu setups that I may not have expected from the
emulation layer having a standard warning is useful.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
when it failed to find a transport by sctp_addrs_lookup_transport.
This patch is to fix it by moving up rcu_read_unlock right before checking
transport and also to remove the out path.
Fixes: 1cceda7849 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 7fda702f93 ("sctp: use new rhlist interface on sctp transport
rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but
rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast.
It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport.
sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(),
as __sctp_lookup_association is called in rx path or sctp_lookup_association
which are in the protection of rcu_read_lock() already.
But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, it
doesn't call rcu_read_lock, which may cause "suspicious rcu_dereference_check
usage' in __rhashtable_lookup.
This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc
before calling sctp_epaddr_lookup_transport.
Fixes: 7fda702f93 ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur says:
====================
dpaa_eth: a couple of fixes
This patch set introduces big endian accessors in the dpaa_eth driver
making sure accesses to the QBMan HW are correct on little endian
platforms. Removing a redundant Kconfig dependency on FSL_SOC.
Adding myself as maintainer of the dpaa_eth driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add record for Freescale QORIQ DPAA Ethernet driver adding myself as
maintainer.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the struct miscdevice have many members, it is dangerous to init
it without members name relying only on member order.
This patch add member name to the init declaration.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>