Other concurrent running programs, like perf or the XDP program what
needed to be monitored, might take up part of the max locked memory
limit. Thus, the xdp_monitor tool have to set the RLIMIT_MEMLOCK to
RLIM_INFINITY, as it cannot determine a more sane limit.
Using the man exit(3) specified EXIT_FAILURE return exit code, and
correct other users too.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also monitor the tracepoint xdp_exception. This tracepoint is usually
invoked by the drivers. Programs themselves can activate this by
returning XDP_ABORTED, which will drop the packet but also trigger the
tracepoint. This is useful for distinguishing intentional (XDP_DROP)
vs. ebpf-program error cases that cased a drop (XDP_ABORTED).
Drivers also use this tracepoint for reporting on XDP actions that are
unknown to the specific driver. This can help the user to detect if a
driver e.g. doesn't implement XDP_REDIRECT yet.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first 8 bytes of the tracepoint context struct are not accessible
by the bpf code. This is a choice that dates back to the original
inclusion of this code.
See explaination in:
commit 98b5c2c65c ("perf, bpf: allow bpf programs attach to tracepoints")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman says:
====================
nfp: extend match and action for flower offload
Pieter says:
This series extends flower offload match and action capabilities. It
specifically adds offload capabilities for matching on MPLS, TTL, TOS
and flow label. Furthermore offload capabilities for action have been
expanded to include set ethernet, ipv4, ipv6, tcp and udp headers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
commit cc71b7b071 ("net/ipv6: remove unused err variable on
icmpv6_push_pending_frames") exposed icmpv6_push_pending_frames
return value not being used.
Remove now unnecessary int err declarations and uses.
Miscellanea:
o Remove unnecessary goto and out: labels
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
int err is unused by icmpv6_push_pending_frames(), this patch returns removes the variable and returns the function with 0.
git bisect shows this variable has been around since linux has been in git in commit 1da177e4c3.
This was found by running make coccicheck M=net/ipv6/ on linus' tree on commit 77ede3a014 (current HEAD as of this patch).
Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Storing the left length of skb into 'len' actually has no effect
so we can remove it.
Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Gallek says:
====================
libbpf: support more map options
The functional change to this series is the ability to use flags when
creating maps from object files loaded by libbpf. In order to do this,
the first patch updates the library to handle map definitions that
differ in size from libbpf's struct bpf_map_def.
For object files with a larger map definition, libbpf will continue to load
if the unknown fields are all zero, otherwise the map is rejected. If the
map definition in the object file is smaller than expected, libbpf will use
zero as a default value in the missing fields.
====================
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type
which requires flags.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This library previously assumed a fixed-size map options structure.
Any new options were ignored. In order to allow the options structure
to grow and to support parsing older programs, this patch updates
the maps section parsing to handle varying sizes.
Object files with maps sections smaller than expected will have the new
fields initialized to zero. Object files which have larger than expected
maps sections will be rejected unless all of the unrecognized data is zero.
This change still assumes that each map definition in the maps section
is the same size.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function emac_isr is local to the source and does not need to
be in global scope, so make it static.
Cleans up sparse warnings:
symbol 'emac_isr' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng says:
====================
tcp: improving RACK cpu performance
This patch set improves the CPU consumption of the RACK TCP loss
recovery algorithm, in particular for high-speed networks. Currently,
for every ACK in recovery RACK can potentially iterate over all sent
packets in the write queue. On large BDP networks with non-trivial
losses the RACK write queue walk CPU usage becomes unreasonably high.
This patch introduces a new queue in TCP that keeps only skbs sent and
not yet (s)acked or marked lost, in time order instead of sequence
order. With that, RACK can examine this time-sorted list and only
check packets that were sent recently, within the reordering window,
per ACK. This is the fastest way without any write queue walks. The
number of skbs examined per ACK is reduced by orders of magnitude.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new time-ordered list to speed up RACK. The detection
logic is identical. But since the list is chronologically ordered
by skb_mstamp and contains only skbs not yet acked or sacked,
RACK can abort the loop upon hitting skbs that were sent more
recently. On YouTube servers this patch reduces the iterations on
write queue by 40x. The improvement is even bigger with large
BDP networks.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new queue (list) that tracks the sent but not yet
acked or SACKed skbs for a TCP connection. The list is chronologically
ordered by skb->skb_mstamp (the head is the oldest sent skb).
This list will be used to optimize TCP Rack recovery, which checks
an skb's timestamp to judge if it has been lost and needs to be
retransmitted. Since TCP write queue is ordered by sequence instead
of sent time, RACK has to scan over the write queue to catch all
eligible packets to detect lost retransmission, and iterates through
SACKed skbs repeatedly.
Special cares for rare events:
1. TCP repair fakes skb transmission so the send queue needs adjusted
2. SACK reneging would require re-inserting SACKed skbs into the
send queue. For now I believe it's not worth the complexity to
make RACK work perfectly on SACK reneging, so we do nothing here.
3. Fast Open: currently for non-TFO, send-queue correctly queues
the pure SYN packet. For TFO which queues a pure SYN and
then a data packet, send-queue only queues the data packet but
not the pure SYN due to the structure of TFO code. This is okay
because the SYN receiver would never respond with a SACK on a
missing SYN (i.e. SYN is never fast-retransmitted by SACK/RACK).
In order to not grow sk_buff, we use an union for the new list and
_skb_refdst/destructor fields. This is a bit complicated because
we need to make sure _skb_refdst and destructor are properly zeroed
before skb is cloned/copied at transmit, and before being freed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use max_1m_mrs/max_8k_mrs while setting max_items, as the former
variables are set based on the underlying device attributes.
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
int rc is unmodified after initalization in net/ipv4/route.c, this patch simply cleans up that variable and returns 0.
This was found with coccicheck M=net/ipv4/ on linus' tree.
Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>