In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
This patch introduces a list of pending module requests. This new module
list is composed of nft_module_request objects that contain the module
name and one status field that tells if the module has been already
loaded (the 'done' field).
In the first pass, from the preparation phase, the netlink command finds
that a module is missing on this list. Then, a module request is
allocated and added to this list and nft_request_module() returns
-EAGAIN. This triggers the abort path with the autoload parameter set on
from nfnetlink, request_module() is called and the module request enters
the 'done' state. Since the mutex is released when loading modules from
the abort phase, the module list is zapped so this is iteration occurs
over a local list. Therefore, the request_module() calls happen when
object lists are in consistent state (after fulling aborting the
transaction) and the commit list is empty.
On the second pass, the netlink command will find that it already tried
to load the module, so it does not request it again and
nft_request_module() returns 0. Then, there is a look up to find the
object that the command was missing. If the module was successfully
loaded, the command proceeds normally since it finds the missing object
in place, otherwise -ENOENT is reported to userspace.
This patch also updates nfnetlink to include the reason to enter the
abort phase, which is required for this new autoload module rationale.
Fixes: ec7470b834 ("netfilter: nf_tables: store transaction list locally while requesting module")
Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ip_set_get_ip_port() is only used in ip_set_bitmap_port.c. Move it
there and make it static.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
One inline function in ip_set_bitmap.h is only called in
ip_set_bitmap_ip.c: move it and remove inline function specifier.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ip_set_put_flags is rather large for a static inline function in a
header-file. Move it to ip_set_core.c and export it.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Several inline functions in ip_set.h are only called in ip_set_core.c:
move them and remove inline function specifier.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Most of the functions are only called from within ip_set_core.c.
The exception is ip_set_init_comment. However, this is too complex to
be a good candidate for a static inline function. Move it to
ip_set_core.c, change its linkage to extern and export it, leaving a
declaration in ip_set.h.
ip_set_comment_free is only used as an extension destructor, so change
its prototype to match and drop cast.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use a local variable to hold comment in order to align the arguments of
ip_set_comment_free properly.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
`struct nf_hook_ops`, `struct nf_hook_state` and the `nf_hookfn`
function typedef appear in function and struct declarations and
definitions in a number of netfilter headers. The structs and typedef
themselves are defined by linux/netfilter.h but only when
CONFIG_NETFILTER is enabled. Define them unconditionally and add
forward declarations in order to remove CONFIG_NETFILTER conditionals
from the other headers.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move some `struct nf_conntrack` code from linux/skbuff.h to
linux/nf_conntrack_common.h. Together with a couple of helpers for
getting and setting skb->_nfct, it allows us to remove
CONFIG_NF_CONNTRACK checks from net/netfilter/nf_conntrack.h.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A few headers contain instances of:
#if defined(CONFIG_XXX) or defined(CONFIG_XXX_MODULE)
Replace them with:
#if IS_ENABLED(CONFIG_XXX)
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Three netfilter headers are only included once. Inline their contents
at those sites and remove them.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Several header-files, Kconfig files and Makefiles have trailing
white-space. Remove it.
In netfilter/Kconfig, indent the type of CONFIG_NETFILTER_NETLINK_ACCT
correctly.
There are semicolons at the end of two function definitions in
include/net/netfilter/nf_conntrack_acct.h and
include/net/netfilter/nf_conntrack_ecache.h. Remove them.
Fix indentation in nf_conntrack_l4proto.h.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A number of non-UAPI Netfilter header-files contained superfluous
"#ifdef __KERNEL__" guards. Removed them.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
linux/netfilter.h defines a number of struct and inline function
definitions which are only available is CONFIG_NETFILTER is enabled.
These structs and functions are used in declarations and definitions in
other header-files. Added preprocessor checks to make sure these
headers will compile if CONFIG_NETFILTER is disabled.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A number of netfilter header-files used declarations and definitions
from other headers without including them. Added include directives to
make those declarations and definitions available.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
linux/netfilter/ipset/ip_set.h included four other header files:
include/linux/netfilter/ipset/ip_set_comment.h
include/linux/netfilter/ipset/ip_set_counter.h
include/linux/netfilter/ipset/ip_set_skbinfo.h
include/linux/netfilter/ipset/ip_set_timeout.h
Of these the first three were not included anywhere else. The last,
ip_set_timeout.h, was included in a couple of other places, but defined
inline functions which call other inline functions defined in ip_set.h,
so ip_set.h had to be included before it.
Inlined all four into ip_set.h, and updated the other files that
included ip_set_timeout.h.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.") the new
generic nf_conntrack was introduced, and it came to supersede the old
ip_conntrack.
This change updates (some) of the obsolete comments referring to old
file/function names of the ip_conntrack mechanism, as well as removes a
few self-referencing comments that we shouldn't maintain anymore.
I did not update any comments referring to historical actions (e.g,
comments like "this file was derived from ..." were left untouched, even
if the referenced file is no longer here).
Signed-off-by: Yonatan Goldschmidt <yon.goldschmidt@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Resolve conflict between d2912cb15b ("treewide: Replace GPLv2
boilerplate/reference with SPDX - rule 500") removing the GPL disclaimer
and fe03d47456 ("Update my email address") which updates Jozsef
Kadlecsik's email.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>