Arraymap was not converted to use bpf_map_init_from_attr()
to avoid merge conflicts with emergency fixes. Do it now.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use the new callback to perform allocation checks for array maps.
The fd maps don't need a special allocation callback, they only
need a special check callback.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov says:
====================
BPF verifier has 700+ tests used to check correctness of the verifier.
Beyond checking the verifier log tell kernel to run accepted programs
as well via bpf_prog_test_run() command. That improves quality of the
tests and increases bpf test coverage.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
to improve test coverage make test_verifier run all successfully loaded
programs on 64-byte zero initialized data.
For clsbpf and xdp it means empty 64-byte packet.
For lwt and socket_filters it's 64-byte packet where skb->data
points after L2.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
in order to improve test coverage allow socket_filter program type
to be run via bpf_prog_test_run command.
Since such programs can be loaded by non-root tighten
permissions for bpf_prog_test_run to be root only
to avoid surprises.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Update tools/include/uapi/linux/bpf.h to bring it in sync with
include/uapi/linux/bpf.h. The listed commits forgot to update it.
Fixes: 02dd3291b2 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs")
Fixes: f19397a5c6 ("bpf: Add access to snd_cwnd and others in sock_ops")
Fixes: 06ef0ccb5a ("bpf/cgroup: fix a verification error for a CGROUP_DEVICE type prog")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There is a static checker warning that "proglen" has an upper bound but
no lower bound. The allocation will just fail harmlessly so it's not a
big deal.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Doc BPF ld/ldx size defines as comments in code, as it makes in
faster to lookup in a programming/review setting, than looking up
the sizes in Documentation/networking/filter.txt.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, for bpf_trace_printk helper, fake ip address 0x1
is used with comments saying that fake ip will not be printed.
This is indeed true for 4.12 and earlier version, but for
4.13 and later version, the ip address will be printed if
it cannot be resolved with kallsym. Running samples/bpf/tracex5
program and you will have the following in the debugfs
trace_pipe output:
...
<...>-1819 [003] .... 443.497877: 0x00000001: mmap
<...>-1819 [003] .... 443.498289: 0x00000001: syscall=102 (one of get/set uid/pid/gid)
...
The kernel commit changed this behavior is:
commit feaf1283d1
Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
Date: Thu Jun 22 17:04:55 2017 -0400
tracing: Show address when function names are not found
...
This patch changed the comment and also altered the fake ip
address to 0x0 as users may think 0x1 has some special meaning
while it doesn't. The new output:
...
<...>-1799 [002] .... 25.953576: 0: mmap
<...>-1799 [002] .... 25.953865: 0: read(fd=0, buf=00000000053936b5, size=512)
...
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Improve the 'unknown reason' comment, with an actual explaination of why
the ctx pkt-data pointers need to be loaded after the helper function
bpf_xdp_adjust_meta(). Based on the explaination Daniel gave.
Fixes: 36e04a2d78 ("samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski says:
====================
Jiong says:
Currently bpftool could disassemble host jited image, for example x86_64,
using libbfd. However it couldn't disassemble offload jited image.
There are two reasons:
1. bpf_obj_get_info_by_fd/struct bpf_prog_info couldn't get the address
of jited image and image's length.
2. Even after issue 1 resolved, bpftool couldn't figure out what is the
offload arch from bpf_prog_info, therefore can't drive libbfd
disassembler correctly.
This patch set resolve issue 1 by introducing two new fields "jited_len"
and "jited_image" in bpf_dev_offload. These two fields serve as the generic
interface to communicate the jited image address and length for all offload
targets to higher level caller. For example, bpf_obj_get_info_by_fd could
use them to fill the userspace visible fields jited_prog_len and
jited_prog_insns.
This patch set resolve issue 2 by getting bfd backend name through
"ifindex", i.e network interface index.
v1:
- Deduct bfd arch name through ifindex, i.e network interface index.
First, map ifindex to devname through ifindex_to_name_ns, then get
pci id through /sys/class/dev/DEVNAME/device/vendor. (Daniel, Alexei)
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The current architecture detection method in bpftool is designed for host
case.
For offload case, we can't use the architecture of "bpftool" itself.
Instead, we could call the existing "ifindex_to_name_ns" to get DEVNAME,
then read pci id from /sys/class/dev/DEVNAME/device/vendor, finally we map
vendor id to bfd arch name which will finally be used to select bfd backend
for the disassembler.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog
used by all host JIT targets to get jited image and it's length. While for
offload, targets are likely to have different offload mechanisms that these
info are kept in device private data fields.
Therefore, BPF_OBJ_GET_INFO_BY_FD syscall needs an unified way to get JIT
length and contents info for offload targets.
One way is to introduce new callback to parse device private data then fill
those fields in bpf_prog_info. This might be a little heavy, the other way
is to add generic fields which will be initialized by all offload targets.
This patch follow the second approach to introduce two new fields in
struct bpf_dev_offload and teach bpf_prog_get_info_by_fd about them to fill
correct jited_prog_len and jited_prog_insns in bpf_prog_info.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2018-01-16
this is a pull request for net-next/master consisting of 9 patches.
This is a series of patches, some of them initially by Franklin S Cooper
Jr, which was picked up by Faiz Abbas. Faiz Abbas added some patches
while working on this series, I contributed one as well.
The first two patches add support to CAN device infrastructure to limit
the bitrate of a CAN adapter if the used CAN-transceiver has a certain
maximum bitrate.
The remaining patches improve the m_can driver. They add support for
bitrate limiting to the driver, clean up the driver and add support for
runtime PM.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The trailing semicolon is an empty statement that does no operation.
It is completely stripped out by the compiler. Removing it since it doesn't do
anything.
Fixes: 5f35227ea3 ("net: Generalize ndo_gso_check to ndo_features_check")
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
restructure the code which adds support for configuring
PCIe VF via mgmt netdevice. which was added by
commit 7829451c69 ("cxgb4: Add control net_device for
configuring PCIe VF")
Original work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
idr_find() is safe under rcu_read_lock() and
maybe_get_net() guarantees that net is alive.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
peernet2id_alloc() is racy without rtnl_lock() as refcount_read(&peer->count)
under net->nsid_lock does not guarantee, peer is alive:
rcu_read_lock()
peernet2id_alloc() ..
spin_lock_bh(&net->nsid_lock) ..
refcount_read(&peer->count) (!= 0) ..
.. put_net()
.. cleanup_net()
.. for_each_net(tmp)
.. spin_lock_bh(&tmp->nsid_lock)
.. __peernet2id(tmp, net) == -1
.. ..
.. ..
__peernet2id_alloc(alloc == true) ..
.. ..
rcu_read_unlock() ..
.. synchronize_rcu()
.. kmem_cache_free(net)
After the above situation, net::netns_id contains id pointing to freed memory,
and any other dereferencing by the id will operate with this freed memory.
Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
is generic interface, and better we fix it before someone really starts
use it in wrong context.
v2: Don't place refcount_read(&net->count) under net->nsid_lock
as suggested by Eric W. Biederman <ebiederm@xmission.com>
v3: Rebase on top of net-next
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang says:
====================
tun: allow to attach eBPF filter
This series tries to implement eBPF socket filter for tun. This could
be used for implementing efficient virtio-net receive filter for
vhost-net.
Changes from V2:
- fix typo
- remove unnecessary double check
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows userspace to attach eBPF filter to tun. This will
allow to implement VM dataplane filtering in a more efficient way
compared to cBPF filter by allowing either qemu or libvirt to
attach eBPF filter to tun.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be reused by other eBPF program other than queue selection.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
net: sched: allow qdiscs to share filter block instances
Currently the filters added to qdiscs are independent. So for example if you
have 2 netdevices and you create ingress qdisc on both and you want to add
identical filter rules both, you need to add them twice. This patchset
makes this easier and mainly saves resources allowing to share all filters
within a qdisc - I call it a "filter block". Also this helps to save
resources when we do offload to hw for example to expensive TCAM.
So back to the example. First, we create 2 qdiscs. Both will share
block number 22. "22" is just an identification:
$ tc qdisc add dev ens7 ingress_block 22 ingress
^^^^^^^^^^^^^^^^
$ tc qdisc add dev ens8 ingress_block 22 ingress
^^^^^^^^^^^^^^^^
If we don't specify "block" command line option, no shared block would
be created:
$ tc qdisc add dev ens9 ingress
Now if we list the qdiscs, we will see the block index in the output:
$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens9 parent ffff:fff1
To make is more visual, the situation looks like this:
ens7 ingress qdisc ens7 ingress qdisc
| |
| |
+----------> block 22 <----------+
Unlimited number of qdiscs may share the same block.
Note that this patchset introduces block sharing support also for clsact
qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24
We can add filter using the block index:
$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
Note we cannot use the qdisc for filter manipulations of shared blocks:
$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
Error: This filter block is shared. Please use the block index to manipulate the filters.
We will see the same output if we list filters for ingress qdisc of
ens7 and ens8, also for the block 22:
$ tc filter show block 22
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
---
v10->v11:
- patch 2:
- fixed error path when register_pernet_subsys fails pointed out by Cong
- patch 9:
- rebased on top of the current net-next
v9->v10:
- patch 7:
- fixed ifindex magic in the patch description
- userspace patches:
- added manpages and patch descriptions
v8->v9:
- patch "net: sched: add rt netlink message type for block get" was
removed, userspace check filter existence using qdisc dump
v7->v8:
- patch 7:
- added comment to ifindex block magic
- patch 9:
- new patch
- patch 10:
- base this on the patch that introduces qdisc-generic block index
attributes parsing/dumping
- patch 13:
- rebased on top of current net-next
v6->v7:
- patch 1:
- unsquashed shared block patch that was previously squashed by mistake
- fixed error path in block create - freeing chain 0
- patch 2:
- new patch - splitted from the previous one as it got accidentaly
squashed in the rebasing process in the past
- converted to idr extended
- removed auto-generating of block indexes. Callers have to explicily
tell that the block is shared by passing non-zero block index
- fixed error path in block get ext - freeing chain 0
- patch 7:
- changed extack message for block index handle as suggested by DaveA
- added extack message when block index does not exist
- the block ifindex magic is in define and change to 0xffffffff
as suggested by Jamal
- patch 8:
- new patch implementing RTM_GETBLOCK in order to query if the block
with some index exists
- patch 9:
- adjust to the core changes and check block index attributes for being 0
v5->v6:
- added patch 6 that introduces block handle
v4->v5:
- patch 5:
- add tracking of binding of devs that are unable to offload and check
that before block cbs call.
v3->v4:
- patch 1:
- rebased on top of the current net-next
- added some extack strings
- patch 3:
- rebased on top of the current net-next
- patch 5:
- propagate netdev_ops->ndo_setup_tc error up to tcf_block_offload_bind
caller
- patch 7:
- rebased on top of the current net-next
v2->v3:
- removed original patch 1, removing tp->q cls_bpf dependency. Fixed by
Jakub in the meantime.
- patch 1:
- rebased on top of the current net-next
- patch 5:
- new patch
- patch 8:
- removed "p_" prefix from block index function args
- patch 10:
- add tc offload feature handling
====================
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benefit from the prepared TC and in-driver ACL infrastructure and
introduce block sharing offload. For that, a new struct "block" is
introduced in spectrum_acl in order to hold a list of specific
block-port bindings.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>