The test creates two netns, ns1 and ns2. The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to ns2.
ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.
The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].
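The 0x66 => 102 note can be checked with a short userspace sketch. It
assumes a POSIX system providing inet_pton(); the helper name
vip_last_octets_match() is hypothetical and is not part of the test.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper, not part of the test: returns 1 when the last
 * byte of the IPv4 address equals the last byte of the IPv6 address,
 * and 0 otherwise (including on parse errors).
 */
static int vip_last_octets_match(const char *v4, const char *v6)
{
	struct in_addr a4;
	struct in6_addr a6;

	if (inet_pton(AF_INET, v4, &a4) != 1 ||
	    inet_pton(AF_INET6, v6, &a6) != 1)
		return 0;

	/* s_addr is stored in network byte order, so byte 3 is the
	 * last dotted-quad octet.
	 */
	return ((const unsigned char *)&a4.s_addr)[3] == a6.s6_addr[15];
}
```

Calling it with "10.10.1.102" and "2401:face::66" compares 102 against
0x66.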
At ns1, the VIPs are routed _via_ the host.
At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl. The test is configured in a way so
that both ingress and egress can be tested.
At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified. The return path is routed to the dev ipip/ip6tnl.
During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.
1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
Khoroshilov.
2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
Pedersen.
3) Don't put aead_req crypto struct on the stack in mac80211, from
Ard Biesheuvel.
4) Several uninitialized variable warning fixes from Arnd Bergmann.
5) Fix memory leak in cxgb4, from Colin Ian King.
6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.
7) Several VRF semantic fixes from David Ahern.
8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.
9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.
10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.
11) Fix stale link state during failover in NCSI driver, from Gavin
Shan.
12) Fix netdev lower adjacency list traversal, from Ido Schimmel.
13) Provide proper handle when emitting notifications of filter
deletes, from Jamal Hadi Salim.
14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.
15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.
16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.
17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.
18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
Leitner.
19) Revert a netns locking change that causes regressions, from Paul
Moore.
20) Add recursion limit to GRO handling, from Sabrina Dubroca.
21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.
22) Avoid accessing stale vxlan/geneve socket in data path, from
Pravin Shelar"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
geneve: avoid using stale geneve socket.
vxlan: avoid using stale vxlan socket.
qede: Fix out-of-bound fastpath memory access
net: phy: dp83848: add dp83822 PHY support
enic: fix rq disable
tipc: fix broadcast link synchronization problem
ibmvnic: Fix missing brackets in init_sub_crq_irqs
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
net/mlx4_en: Save slave ethtool stats command
net/mlx4_en: Fix potential deadlock in port statistics flow
net/mlx4: Fix firmware command timeout during interrupt test
net/mlx4_core: Do not access comm channel if it has not yet been initialized
net/mlx4_en: Fix panic during reboot
net/mlx4_en: Process all completions in RX rings after port goes up
net/mlx4_en: Resolve dividing by zero in 32-bit system
net/mlx4_core: Change the default value of enable_qos
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
...
Some of the sample files are causing issues when they are loaded with tc
and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
file as program/map/etc sections are not present, which can be easily
spotted with readelf(1).
Currently, BPF samples include some of the kernel headers, and mid
term we should change them to refrain from this. When dynamic
debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
is easily overlooked in the build as clang spills this along with other
noisy warnings from various header includes, and llc still generates an
ELF file with mentioned characteristics. For just playing around with BPF
examples, this can be a bit of a hurdle to take.
Just add a fake KBUILD_MODNAME as a band-aid to fix the issue; the same
is done in the xdp*_kern samples already.
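A minimal sketch of such a band-aid, assuming the define is placed before
any kernel header that may expand KBUILD_MODNAME. The value "foo" is an
arbitrary placeholder and sample_modname() is a hypothetical helper used
only to make the sketch checkable in userspace:

```c
/* Provide a fake module name before including headers that may
 * reference it (for example through dynamic debug macros).
 * "foo" is a placeholder assumed for illustration.
 */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "foo"
#endif

#include <string.h>

/* Hypothetical stand-in for header code that expands KBUILD_MODNAME. */
static const char *sample_modname(void)
{
	return KBUILD_MODNAME;
}
```

With the define in place, code that expands KBUILD_MODNAME compiles
instead of failing on an undeclared identifier.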
Fixes: 65d472fb00 ("samples/bpf: add 'pointer to packet' tests")
Fixes: 6afb1e28b8 ("samples/bpf: Add tunnel set/get tests.")
Fixes: a3f7461734 ("cgroup: bpf: Add an example to do cgroup checking in BPF")
Reported-by: Chandrasekar Kannan <ckannan@console.to>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull kselftest updates from Shuah Khan:
"This update consists of:
- Fixes and improvements to existing tests
- Moving code from Documentation to selftests, samples, and tools:
* Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and
networking tests from Documentation to selftests.
* Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay,
and blackfin examples from Documentation to samples.
* Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
Documentation to tools.
* Deletes BUILD_DOCSRC and its dependencies"
* tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits)
selftests/futex: Check ANSI terminal color support
Doc: update 00-INDEX files to reflect the runnable code move
samples: move blackfin gptimers-example from Documentation
tools: move pcmcia crc32hash tool from Documentation
tools: move laptops dslm tool from Documentation
tools: move accounting tool from Documentation
samples: move auxdisplay example code from Documentation
samples: move watchdog example code from Documentation
samples: move timers example code from Documentation
samples: move misc-devices/mei example code from Documentation
samples: move mic/mpssd example code from Documentation
selftests: Move networking/timestamping from Documentation
selftests: move watchdog tests from Documentation/watchdog
selftests: move ia64 tests from Documentation/ia64
selftests: move vDSO tests from Documentation/vDSO
selftests: move ptp tests from Documentation/ptp
selftests: move prctl tests from Documentation/prctl
selftests: move dnotify_test from Documentation/filesystems
selftests/timers: Add missing error code assignment before test
selftests/zram: replace ZRAM_LZ4_COMPRESS
...
Move blackfin gptimers-example to samples and remove it from Documentation
Makefile. Update samples Kconfig and Makefile to build gptimers-example.
blackfin is the last CONFIG_BUILD_DOCSRC target in Documentation/Makefile.
Hence this patch also includes changes to remove CONFIG_BUILD_DOCSRC from
Makefile and lib/Kconfig.debug and updates VIDEO_PCI_SKELETON dependency
on BUILD_DOCSRC.
Documentation/Makefile is not deleted to avoid breaking make htmldocs and
make distclean.
Acked-by: Michal Marek <mmarek@suse.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Pull rpmsg updates from Bjorn Andersson:
"The bulk of these patches involve splitting the rpmsg implementation
into a framework/API part and a virtio specific backend part. It then
adds the Qualcomm Shared Memory Device (SMD) as an additional
supported wire format.
Also included is a set of code style cleanups that have been lingering
for a while"
* tag 'rpmsg-v4.9' of git://github.com/andersson/remoteproc: (26 commits)
rpmsg: smd: fix dependency on QCOM_SMD=n
rpmsg: Introduce Qualcomm SMD backend
rpmsg: Allow callback to return errors
rpmsg: Move virtio specifics from public header
rpmsg: virtio: Hide vrp pointer from the public API
rpmsg: Hide rpmsg indirection tables
rpmsg: Split rpmsg core and virtio backend
rpmsg: Split off generic tail of create_channel()
rpmsg: Move helper for finding rpmsg devices to core
rpmsg: Move endpoint related interface to rpmsg core
rpmsg: Indirection table for rpmsg_endpoint operations
rpmsg: Move rpmsg_device API to new file
rpmsg: Introduce indirection table for rpmsg_device operations
rpmsg: Clean up rpmsg device vs channel naming
rpmsg: Make rpmsg_create_ept() take channel_info struct
rpmsg: rpmsg_send() operations takes rpmsg_endpoint
rpmsg: Name rpmsg devices based on channel id
rpmsg: Enable matching devices with drivers based on DT
rpmsg: Drop prototypes for non-existing functions
samples/rpmsg: add support for multiple instances
...
Suppose you have a map array value that is something like this:
	struct foo {
		unsigned iter;
		int array[SOME_CONSTANT];
	};
You can easily insert this into an array, but you cannot modify the contents of
foo->array[] after the fact. This is because we have no way to verify we won't
go off the end of the array at verification time. This patch provides a start
for this work. We accomplish this by keeping track of a minimum and maximum
value a register could be while we're checking the code. Then at the time we
try to do an access into a MAP_VALUE we verify that the maximum offset into that
region is a valid access into that memory region. So in practice, code such as
this
	unsigned index = 0;
	if (foo->iter >= SOME_CONSTANT)
		foo->iter = index;
	else
		index = foo->iter++;
	foo->array[index] = bar;
would be allowed, as we can verify that index will always be between 0 and
SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra
check to make sure the index isn't less than 0, or do something like index %=
SOME_CONSTANT.
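The pattern above can be tried in plain userspace C. This is only a
sketch of the bounded-index idiom, not verifier code: the value chosen
for SOME_CONSTANT and the helper names foo_store() and foo_store_signed()
are assumptions made for illustration.

```c
#define SOME_CONSTANT 8	/* assumed value, for illustration only */

struct foo {
	unsigned iter;
	int array[SOME_CONSTANT];
};

/* Unsigned case: index is either 0 or the old foo->iter, and the old
 * foo->iter was checked to be < SOME_CONSTANT, so the store is always
 * in bounds. Returns the index that was written.
 */
static unsigned foo_store(struct foo *foo, int bar)
{
	unsigned index = 0;

	if (foo->iter >= SOME_CONSTANT)
		foo->iter = index;
	else
		index = foo->iter++;
	foo->array[index] = bar;
	return index;
}

/* Signed case: an extra lower-bound check is needed before the store.
 * Returns 0 on success and -1 when index is out of range.
 */
static int foo_store_signed(struct foo *foo, int index, int bar)
{
	if (index < 0 || index >= SOME_CONSTANT)
		return -1;
	foo->array[index] = bar;
	return 0;
}
```

In both helpers every path reaching the array store carries a known
minimum and maximum for index, which is the property the min/max
tracking relies on.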
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
seccomp_phase1() does not exist anymore. Instead, update sample to use
__seccomp_filter(). While at it, set max locked memory to unlimited.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These samples fail to compile as 'struct flow_keys' conflicts with
definition in net/flow_dissector.h. Fix the same by renaming the
structure used in the sample.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move auxdisplay examples to samples and remove them from Documentation
Makefile. Create a new Makefile to build auxdisplay. It can be built
from top level directory or from auxdisplay directory:
Run make -C samples/auxdisplay or cd samples/auxdisplay; make
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Move watchdog examples to samples and remove them from Documentation
Makefile. Create a new Makefile to build watchdog. It can be built
from top level directory or from watchdog directory:
Run make -C samples/watchdog or cd samples/watchdog; make
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Move timers examples to samples and remove them from Documentation
Makefile. Create a new Makefile to build timers. It can be built
from top level directory or from timers directory:
Run make -C samples/timers or cd samples/timers; make
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Move misc-devices/mei examples to samples/mei and remove them from
Documentation Makefile. Delete misc-devices/Makefile.
Create a new Makefile to build samples/mei. It can be built from top
level directory or from mei directory:
Run make -C samples/mei or cd samples/mei; make
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Add a couple of test cases for direct write and the negative size issue, and
also adjust the direct packet access test4 since it asserts that writes are
not possible, but since we've just added support for writes, we need to
invert the verdict to ACCEPT, of course. Summary: 133 PASSED, 0 FAILED.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move mic/mpssd examples to samples and remove them from Documentation
Makefile. Create a new Makefile to build mic/mpssd. It can be built
from top level directory or from mic/mpssd directory:
Run make -C samples/mic/mpssd or cd samples/mic/mpssd; make
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
The test creates 3 namespaces with veths connected via a bridge.
The first two namespaces simulate two different hosts with the same
IPv4 and IPv6 addresses configured on the tunnel interface, and they
communicate with the outside world via standard tunnels.
The third namespace creates a collect_md tunnel that is driven by a BPF
program which selects a different remote host (either the first or
second namespace) based on the TCP destination port number while the
destination IP is the same.
This scenario is a rough approximation of a load balancer use case.
The tests check both traditional tunnel configuration and collect_md mode.
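The remote selection described above can be sketched in userspace C. The
port numbers, the addresses and the name pick_remote() are all
hypothetical; the values used by the actual test may differ.

```c
#include <stdint.h>

/* Hypothetical values chosen for illustration. */
#define PORT_A		5200u
#define PORT_B		5201u
#define REMOTE_A	0xac100164u	/* 172.16.1.100, host byte order */
#define REMOTE_B	0xac1001c8u	/* 172.16.1.200, host byte order */

/* Pick the tunnel remote from the TCP destination port only; the
 * destination IP of the inner packet plays no role in the decision.
 * Returns 0 when no remote is configured for the port.
 */
static uint32_t pick_remote(uint16_t tcp_dport)
{
	switch (tcp_dport) {
	case PORT_A:
		return REMOTE_A;
	case PORT_B:
		return REMOTE_B;
	default:
		return 0;
	}
}
```

In a BPF program, the chosen address would be written into the tunnel
key before the packet is sent through the collect_md device.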
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the existing tests for vxlan, geneve, gre to include the IPIP
tunnel. It tests both traditional tunnel configuration and
dynamic configuration via bpf helpers.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some rpmsg backends support holding on to and redelivering messages upon
failed handling of them, so provide a way for the callback to report an
error and allow the backends to handle this.
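A rough userspace model of the idea, assuming a backend that retries
delivery while the callback reports an error. The msg_cb_t type and the
deliver() and flaky_cb() functions are hypothetical and do not mirror
the actual rpmsg signatures.

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 on success, negative errno
 * when the message could not be handled.
 */
typedef int (*msg_cb_t)(void *data, int len, void *priv);

/* Example callback that fails as many times as the counter in priv
 * says, then succeeds.
 */
static int flaky_cb(void *data, int len, void *priv)
{
	int *failures_left = priv;

	(void)data;
	(void)len;
	if (*failures_left > 0) {
		(*failures_left)--;
		return -EAGAIN;
	}
	return 0;
}

/* Hypothetical backend loop: redeliver until the callback accepts the
 * message or max_tries is exhausted. Returns the number of attempts
 * used on success, -1 if every attempt failed.
 */
static int deliver(msg_cb_t cb, void *data, int len, void *priv,
		   int max_tries)
{
	int attempt;

	for (attempt = 1; attempt <= max_tries; attempt++) {
		if (cb(data, len, priv) == 0)
			return attempt;
	}
	return -1;
}
```

A backend without redelivery support could simply ignore the return
value, which keeps the change compatible with existing behaviour.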
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The struct representing an rpmsg device is called rpmsg_channel and the
variable name used throughout is rpdev. With the communication happening
on endpoints, it's clearer to just call this a "device" in a public API.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The rpmsg_send() operations have been taking a rpmsg_device, but this
forces users of secondary rpmsg_endpoints to use the rpmsg_sendto()
interface - by extracting source and destination from the given data
structures. If we instead pass the rpmsg_endpoint to these functions a
service can use rpmsg_sendto() to respond to messages, even on secondary
endpoints.
In addition this would allow us to support operations on multiple
channels in future backends that do not support off-channel
operations.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
LLVM can generate code that tests for direct packet access via
skb->data/data_end in a way that currently gets rejected by the
verifier, example:
[...]
7: (61) r3 = *(u32 *)(r6 +80)
8: (61) r9 = *(u32 *)(r6 +76)
9: (bf) r2 = r9
10: (07) r2 += 54
11: (3d) if r3 >= r2 goto pc+12
R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
R9=pkt(id=0,off=0,r=0) R10=fp
12: (18) r4 = 0xffffff7a
14: (05) goto pc+430
[...]
from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
24: (7b) *(u64 *)(r10 -40) = r1
25: (b7) r1 = 0
26: (63) *(u32 *)(r6 +56) = r1
27: (b7) r2 = 40
28: (71) r8 = *(u8 *)(r9 +20)
invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)
The reason why this gets rejected despite a proper test is that we
currently call find_good_pkt_pointers() only in case where we detect
tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
derived, for example, from a register of type pkt(id=Y,off=0,r=0)
pointing to skb->data. find_good_pkt_pointers() then fills the range
in the current branch to pkt(id=Y,off=0,r=Z) on success.
For above case, we need to extend that to recognize pkt_end >= rX
pattern and mark the other branch that is taken on success with the
appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
only two practical options to test for from what LLVM could have
generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
that we would need to take into account as well.
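The two test forms can be compared side by side in a userspace sketch.
This is ordinary C rather than BPF, and the helper names read_byte_gt()
and read_byte_ge() are hypothetical; it only illustrates that both
shapes guard the same access.

```c
#include <stddef.h>
#include <stdint.h>

/* "rX > pkt_end" form: bail out in the taken branch, so the access in
 * the fall-through branch is covered by the check.
 */
static int read_byte_gt(const uint8_t *data, const uint8_t *data_end,
			size_t hdr_len, size_t off)
{
	if (off >= hdr_len)
		return -1;
	if (data + hdr_len > data_end)
		return -1;
	return data[off];
}

/* "pkt_end >= rX" form: the access sits in the branch taken when the
 * test succeeds, which is the case the fix teaches the verifier.
 */
static int read_byte_ge(const uint8_t *data, const uint8_t *data_end,
			size_t hdr_len, size_t off)
{
	if (off >= hdr_len)
		return -1;
	if (data_end >= data + hdr_len)
		return data[off];
	return -1;
}
```

Both helpers return the byte at off when hdr_len bytes are available
and -1 otherwise, so a compiler is free to emit either comparison.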
After the fix:
[...]
7: (61) r3 = *(u32 *)(r6 +80)
8: (61) r9 = *(u32 *)(r6 +76)
9: (bf) r2 = r9
10: (07) r2 += 54
11: (3d) if r3 >= r2 goto pc+12
R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
R9=pkt(id=0,off=0,r=0) R10=fp
12: (18) r4 = 0xffffff7a
14: (05) goto pc+430
[...]
from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
24: (7b) *(u64 *)(r10 -40) = r1
25: (b7) r1 = 0
26: (63) *(u32 *)(r6 +56) = r1
27: (b7) r2 = 40
28: (71) r8 = *(u8 *)(r9 +20)
29: (bf) r1 = r8
30: (25) if r8 > 0x3c goto pc+47
R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
R9=pkt(id=0,off=0,r=54) R10=fp
31: (b7) r1 = 1
[...]
Verifier test cases are also added in this work, one that demonstrates
the mentioned example here and one that tries a bad packet access for
the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
with both test variants (>, >=).
Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample the instruction pointer and its frequency count in a BPF map.
Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++
Its primary purpose is to check that key bpf helpers like map lookup, update,
get_stackid, trace_printk and ctx access are all working.
It checks:
- PERF_COUNT_HW_CPU_CYCLES on all cpus
- PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
- PERF_COUNT_SW_CPU_CLOCK on all cpus
- PERF_COUNT_SW_CPU_CLOCK for current process
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE. A native
tunnel device is created in a namespace to interact with a lwtunnel
device out of the namespace, with metadata enabled. The bpf_skb_set_*
program is attached to tc egress and bpf_skb_get_* is attached to ingress
qdisc. A ping between two tunnels is used to verify correctness and
the result of bpf_skb_get_* printed by bpf_trace_printk.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor overlapping changes for both merge conflicts.
Resolution work done by Stephen Rothwell was used
as a reference.
Signed-off-by: David S. Miller <davem@davemloft.net>