For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
then remove it from the nat_bysource_table via nat_extend->destroy.
But now, the nat extension is attached on demand, so if the nat extension
is not attached, we will not be notified when the ct is destroyed, i.e.
we may fail to remove ct from the nat_bysource_table.
So just keep it simple, even if the extension is not attached, we will
still invoke the related ext->destroy. And this will also preserve the
flexibility for the future extension.
Fixes: 9a08ecfe74 ("netfilter: don't attach a nat extension by default")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
Third Round of IPVS Updates for v4.12
please consider these enhancements to IPVS for v4.12.
If it is too late for v4.12 then please consider them for v4.13.
* Remove unused function
* Correct comparison of unsigned value
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
of 1160 bytes is larger than 1024 bytes
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
provided there is no nfqueue active in that net namespace (which is
the common case).
This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
this gets called during netns cleanup so no packets should be queued.
For the rare case of base chain being unregistered or module removal
while nfqueue is in use the extra hiccup due to the packet drops isn't
a big deal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_log_unregister() (which is what gets called in the logger backends
module exit paths) does a (required, module is removed) synchronize_rcu().
But nf_log_unset() is only called from pernet exit handlers. It doesn't
free any memory so there appears to be no need to call synchronize_rcu.
v2: Liping Zhang points out that nf_log_unregister() needs to be called
after pernet unregister, else rmmod would become unsafe.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
synchronize_net is expensive and slows down netns cleanup a lot.
We have two APIs to unregister a hook:
nf_unregister_net_hook (which calls synchronize_net())
and
nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)
Make nf_unregister_net_hook a wapper around new helper
__nf_unregister_net_hook, which unlinks the hook but does not free it.
Then, we can call that helper in nf_unregister_net_hooks and then
call synchronize_net() only once.
Andrey Konovalov reports this change improves syzkaller fuzzing speed at
least twice.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The sync_refresh_period variable is unsigned, so it can never be < 0.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
There are no in-tree callers of this function and it isn't exported.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
nowadays the NAT extension only stores the interface index
(used to purge connections that got masqueraded when interface goes down)
and pptp nat information.
Previous patches moved nf_ct_nat_ext_add to those places that need it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
make sure nat extension gets added if the master conntrack is subject to
NAT. This will be required once the nat core stops adding it by default.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently the nat extension is always attached as soon as nat module is
loaded. However, most NAT uses do not need the nat extension anymore.
Prepare to remove the add-nat-by-default by making those places that need
it attach it if its not present yet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
krealloc(NULL, ..) is same as kmalloc(), so we can avoid special-casing
the initial allocation after the prealloc removal (we had to use
->alloc_len as the initial allocation size).
This also means we do not zero the preallocated memory anymore; only
offsets[]. Existing code makes sure the new (used) extension space gets
zeroed out.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It was used by the nat extension, but since commit
7c96643519 ("netfilter: move nat hlist_head to nf_conn") its only needed
for connections that use MASQUERADE target or a nat helper.
Also it seems a lot easier to preallocate a fixed size instead.
With default settings, conntrack first adds ecache extension (sysctl
defaults to 1), so we get 40(ct extension header) + 24 (ecache) == 64 byte
on x86_64 for initial allocation.
Followup patches can constify the extension structs and avoid
the initial zeroing of the entire extension area.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Current SYNPROXY codes return NF_DROP during normal TCP handshaking,
it is not friendly to caller. Because the nf_hook_slow would treat
the NF_DROP as an error, and return -EPERM.
As a result, it may cause the top caller think it meets one error.
For example, the following codes are from cfv_rx_poll()
err = netif_receive_skb(skb);
if (unlikely(err)) {
++cfv->ndev->stats.rx_dropped;
} else {
++cfv->ndev->stats.rx_packets;
cfv->ndev->stats.rx_bytes += skb_len;
}
When SYNPROXY returns NF_DROP, then netif_receive_skb returns -EPERM.
As a result, the cfv driver would treat it as an error, and increase
the rx_dropped counter.
So use NF_STOLEN instead of NF_DROP now because there is no error
happened indeed, and free the skb directly.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Similar to ip_register_table, pass nf_hook_ops to ebt_register_table().
This allows to handle hook registration also via pernet_ops and allows
us to avoid use of legacy register_hook api.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
looks like decnet isn't namespacified in first place, so restrict hook
registration to the initial namespace.
Prepares for eventual removal of legacy nf_register_hook() api.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_(un)register_hooks has to maintain an internal hook list to add/remove
those hooks from net namespaces as they are added/deleted.
ipvs already uses pernet_ops, so we can switch to the (more recent)
pernet hook api instead.
Compile tested only.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Defer registration of the synproxy hooks until the first SYNPROXY rule is
added. Also means we only register hooks in namespaces that need it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The commit ab8bc7ed86
("netfilter: remove nf_ct_is_untracked")
changed the line
if (ct && !nf_ct_is_untracked(ct) && nfct_nat(ct)) {
to
if (ct && nfct_nat(ct)) {
meanwhile, the commit 41390895e5
("netfilter: ipvs: don't check for presence of nat extension")
from ipvs-next had changed the same line to
if (ct && !nf_ct_is_untracked(ct) && (ct->status & IPS_NAT_MASK)) {
When ipvs-next got merged into nf-next, the merge resolution took
the first version, dropping the conversion of nfct_nat().
While this doesn't cause a problem at the moment, it will once we stop
adding the nat extension by default.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Only "cache" needs to use ulong (its used with set_bit()), missed can use
u16. Also add build-time assertion to ensure event bits fit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If insertion of a new conntrack fails because the table is full, the kernel
searches the next buckets of the hash slot where the new connection
was supposed to be inserted at for an entry that hasn't seen traffic
in reply direction (non-assured), if it finds one, that entry is
is dropped and the new connection entry is allocated.
Allow the conntrack gc worker to also remove *assured* conntracks if
resources are low.
Do this by querying the l4 tracker, e.g. tcp connections are now dropped
if they are no longer established (e.g. in finwait).
This could be refined further, e.g. by adding 'soft' established timeout
(i.e., a timeout that is only used once we get close to resource
exhaustion).
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
commit 223b02d923
("netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len")
had to increase size of the extension offsets because total size of the
extensions had increased to a point where u8 did overflow.
3 years later we've managed to diet extensions a bit and we no longer
need u16. Furthermore we can now add a compile-time assertion for this
problem.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
get rid of the (now unused) nf_ct_ext_add_length define and also
rename the function to plain nf_ct_ext_add().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>