Commit Graph

4806 Commits

Author SHA1 Message Date
Taehee Yoo
b25a31bf0c netfilter: nf_tables: add missing ->release_ops() in error path of newrule()
->release_ops() callback releases resources and this is used in error path.
If nf_tables_newrule() fails after ->select_ops(), it should release
resources. but it can not call ->destroy() because that should be called
after ->init().
At this point, ->release_ops() should be used for releasing resources.

Test commands:
   modprobe -rv xt_tcpudp
   iptables-nft -I INPUT -m tcp   <-- error command
   lsmod

Result:
   Module                  Size  Used by
   xt_tcpudp              20480  2      <-- it should be 0

Fixes: b8e2040063 ("netfilter: nft_compat: use .release_ops and remove list of extension")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-20 08:32:58 +01:00
Pablo Neira Ayuso
74710e0590 netfilter: nft_redir: fix module autoload with ip4
AF_INET4 does not exist.

Fixes: c78efc99c7 ("netfilter: nf_tables: nat: merge nft_redir protocol specific modules)"
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18 16:22:49 +01:00
Pablo Neira Ayuso
8ffcd32f64 netfilter: nf_tables: bogus EBUSY in helper removal from transaction
Proper use counter updates when activating and deactivating the object,
otherwise, this hits bogus EBUSY error.

Fixes: cd5125d8f5 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Laura Garcia <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18 16:22:49 +01:00
Arnd Bergmann
d1fa381033 netfilter: fix NETFILTER_XT_TARGET_TEE dependencies
With NETFILTER_XT_TARGET_TEE=y and IP6_NF_IPTABLES=m, we get a link
error when referencing the NF_DUP_IPV6 module:

net/netfilter/xt_TEE.o: In function `tee_tg6':
xt_TEE.c:(.text+0x14): undefined reference to `nf_dup_ipv6'

The problem here is the 'select NF_DUP_IPV6 if IP6_NF_IPTABLES'
that forces NF_DUP_IPV6 to be =m as well rather than setting it
to =y as was intended here. Adding a soft dependency on
IP6_NF_IPTABLES avoids that broken configuration.

Fixes: 5d400a4933 ("netfilter: Kconfig: Change select IPv6 dependencies")
Cc: Máté Eckl <ecklm94@gmail.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Link: https://patchwork.ozlabs.org/patch/999498/
Link: https://lore.kernel.org/patchwork/patch/960062/
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18 16:21:54 +01:00
Pablo Neira Ayuso
05b7639da5 netfilter: nft_set_rbtree: check for inactive element after flag mismatch
Otherwise, we hit bogus ENOENT when removing elements.

Fixes: e701001e7c ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates")
Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18 16:21:09 +01:00
Alin Nastac
29b0b5d565 netfilter: nf_conntrack_sip: remove direct dependency on IPv6
Previous implementation was not usable with CONFIG_IPV6=m.

Fixes: a3419ce335 ("netfilter: nf_conntrack_sip: add sip_external_media logic")
Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-18 16:11:54 +01:00
Florian Westphal
b8b2749865 netfilter: nf_tables: return immediately on empty commit
When running 'nft flush ruleset' while no rules exist, we will increment
the generation counter and announce a new genid to userspace, yet
nothing had changed in the first place.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11 20:01:20 +01:00
Pablo Neira Ayuso
3f3a390dbd netfilter: nf_tables: use-after-free in dynamic operations
Smatch reports:

       net/netfilter/nf_tables_api.c:2167 nf_tables_expr_destroy()
        error: dereferencing freed memory 'expr->ops'

net/netfilter/nf_tables_api.c
    2162 static void nf_tables_expr_destroy(const struct nft_ctx *ctx,
    2163                                   struct nft_expr *expr)
    2164 {
    2165        if (expr->ops->destroy)
    2166                expr->ops->destroy(ctx, expr);
                                                ^^^^
--> 2167        module_put(expr->ops->type->owner);
                           ^^^^^^^^^
    2168 }

Smatch says there are three functions which free expr->ops.

Fixes: b8e2040063 ("netfilter: nft_compat: use .release_ops and remove list of extension")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11 13:19:49 +01:00
Pablo Neira Ayuso
273fe3f100 netfilter: nf_tables: bogus EBUSY when deleting set after flush
Set deletion after flush coming in the same batch results in EBUSY. Add
set use counter to track the number of references to this set from
rules. We cannot rely on the list of bindings for this since such list
is still populated from the preparation phase.

Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11 13:19:24 +01:00
Pablo Neira Ayuso
40ba1d9b4d netfilter: nf_tables: fix set double-free in abort path
The abort path can cause a double-free of an anonymous set.
Added-and-to-be-aborted rule looks like this:

udp dport { 137, 138 } drop

The to-be-aborted transaction list looks like this:

newset
newsetelem
newsetelem
rule

This gets walked in reverse order, so first pass disables the rule, the
set elements, then the set.

After synchronize_rcu(), we then destroy those in same order: rule, set
element, set element, newset.

Problem is that the anonymous set has already been bound to the rule, so
the rule (lookup expression destructor) already frees the set, when then
cause use-after-free when trying to delete the elements from this set,
then try to free the set again when handling the newset expression.

Rule releases the bound set in first place from the abort path, this
causes the use-after-free on set element removal when undoing the new
element transactions. To handle this, skip new element transaction if
set is bound from the abort path.

This is still causes the use-after-free on set element removal.  To
handle this, remove transaction from the list when the set is already
bound.

Joint work with Florian Westphal.

Fixes: f6ac858589 ("netfilter: nf_tables: unbind set in rule from commit path")
Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-08 16:41:18 +01:00
Florian Westphal
46f7487e16 netfilter: nat: don't register device notifier twice
Otherwise, we get notifier list corruption.

This is the most simple fix: remove the device notifier call chain
from the ipv6 masquerade register function and handle it only
in the ipv4 version.

The better fix is merge
nf_nat_masquerade_ipv4/6_(un)register_notifier
  into a single
nf_nat_masquerade_(un)register_notifiers

but to do this its needed to first merge the two masquerade modules
into a single xt_MASQUERADE.

Furthermore, we need to use different refcounts for ipv4/ipv6
until we can merge MASQUERADE.

Fixes: d1aca8ab31 ("netfilter: nat: merge ipv4 and ipv6 masquerade functionality")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-08 16:41:09 +01:00
Florian Westphal
db8ab38880 netfilter: nf_tables: merge ipv4 and ipv6 nat chain types
Merge the ipv4 and ipv6 nat chain type. This is the last
missing piece which allows to provide inet family support
for nat in a follow patch.

The kconfig knobs for ipv4/ipv6 nat chain are removed, the
nat chain type will be built unconditionally if NFT_NAT
expression is enabled.

Before:
   text	   data	    bss	    dec	    hex	filename
   1576     896       0    2472     9a8 nft_chain_nat_ipv4.ko
   1697     896       0    2593     a21 nft_chain_nat_ipv6.ko

After:
   text	   data	    bss	    dec	    hex	filename
   1832     896       0    2728     aa8 nft_chain_nat.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:59 +01:00
Florian Westphal
a9ce849e78 netfilter: nf_tables: nat: merge nft_masq protocol specific modules
The family specific masq modules are way too small to warrant
an extra module, just place all of them in nft_masq.

before:
  text	   data	    bss	    dec	    hex	filename
   1001	    832	      0	   1833	    729	nft_masq.ko
    766	    896	      0	   1662	    67e	nft_masq_ipv4.ko
    764	    896	      0	   1660	    67c	nft_masq_ipv6.ko

after:
   2010	    960	      0	   2970	    b9a	nft_masq.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:59 +01:00
Florian Westphal
c78efc99c7 netfilter: nf_tables: nat: merge nft_redir protocol specific modules
before:
 text	   data	    bss	    dec	    hex	filename
 990	    832	      0	   1822	    71e nft_redir.ko
 697	    896	      0	   1593	    639 nft_redir_ipv4.ko
 713	    896	      0	   1609	    649	nft_redir_ipv6.ko

after:
 text	   data	    bss	    dec	    hex	filename
 1910	    960	      0	   2870	    b36	nft_redir.ko

size is reduced, all helpers from nft_redir.ko can be made static.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:58 +01:00
Sami Tolvanen
20fdaf6e1e netfilter: xt_IDLETIMER: fix sysfs callback function type
Use struct device_attribute instead of struct idletimer_tg_attr, and
the correct callback function type to avoid indirect call mismatches
with Control Flow Integrity checking.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:57 +01:00
Li RongQing
2e7b162c5e netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2
CONNTRACK_LOCKS is divisor when computer array index, if it is power of
2, compiler will optimize modulo operation as bitwise AND, or else
modulo will lower performance.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:46 +01:00
Li RongQing
a9f5e78c40 netfilter: nf_tables: check the result of dereferencing base_chain->stats
Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.

base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.

And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.

And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.

Thanks for Eric's help to finish this patch.

Fixes: 009240940e ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:34:24 +01:00
Xin Long
f52a40fb41 ipvs: get sctphdr by sctphoff in sctp_csum_check
sctp_csum_check() is called by sctp_s/dnat_handler() where it calls
skb_make_writable() to ensure sctphdr to be linearized.

So there's no need to get sctphdr by calling skb_header_pointer()
in sctp_csum_check().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:28:44 +01:00
Li RongQing
11d4dd0b20 netfilter: convert the proto argument from u8 to u16
The proto in struct xt_match and struct xt_target is u16, when
calling xt_check_target/match, their proto argument is u8,
and will cause truncation, it is harmless to ip packet, since
ip proto is u8

if a etable's match/target has proto that is u16, will cause
the check failure.

and convert be16 to short in bridge/netfilter/ebtables.c

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:28:43 +01:00
wenxu
3e511d5652 netfilter: nft_tunnel: Add dst_cache support
The metadata_dst does not initialize the dst_cache field, this causes
problems to ip_md_tunnel_xmit() since it cannot use this cache, hence,
Triggering a route lookup for every packet.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:25:06 +01:00
Florian Westphal
be0502a3f2 netfilter: conntrack: tcp: only close if RST matches exact sequence
TCP resets cause instant transition from established to closed state
provided the reset is in-window.  Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.

Main problem for conntrack is that its a middlebox, i.e.  whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).

Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing is
   changed, RST is subject to normal in-window check.

2. If the RSTs sequence number either matches exactly RCV.NXT,
   connection state moves to CLOSE.

3. The same applies if the RST sequence number aligns with a previous
   packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.

If the peer sends a challenge ack, connection timeout will be reset.

If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.

If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.

Packetdrill test case:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001

// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001

// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R  2:2(0) ack 1001 win 260

// 2nd reset, but too far ahead sequence.  Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260

// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260

// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501

// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260

// ACK
0.300 < . 1001:1001(0) ack 1001 win 257

// more data could be exchanged here, connection
// is still established

// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002

// Close the connection without reading outstanding data
0.700 close(4) = 0

// so one more reset.  Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.

With patch, this generates following conntrack events:
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:19:31 +01:00
Andrea Claudi
f25a9b8515 ipvs: change some data types from int to bool
Change the data type of the following variables from int to bool
across ipvs code:

  - found
  - loop
  - need_full_dest
  - need_full_svc
  - payload_csum

Also change the following functions to use bool full_entry param
instead of int:

  - ip_vs_genl_parse_dest()
  - ip_vs_genl_parse_service()

This patch does not change any functionality but makes the source
code slightly easier to read.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:19:04 +01:00
Pablo Neira Ayuso
123f89c8aa netfilter: nft_set_hash: remove nft_hash_key()
hashtable is never used for 2-byte keys, remove nft_hash_key().

Fixes: e240cd0df4 ("netfilter: nf_tables: place all set backends in one single module")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 11:08:32 +01:00
Pablo Neira Ayuso
a01cbae57e netfilter: nft_set_hash: bogus element self comparison from deactivation path
Use the element from the loop iteration, not the same element we want to
deactivate otherwise this branch always evaluates true.

Fixes: 6c03ae210c ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 11:08:31 +01:00
Pablo Neira Ayuso
3b02b0adc2 netfilter: nft_set_hash: fix lookups with fixed size hash on big endian
Call jhash_1word() for the 4-bytes key case from the insertion and
deactivation path, otherwise big endian arch set lookups fail.

Fixes: 446a8268b7 ("netfilter: nft_set_hash: add lookup variant for fixed size hashtable")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 11:08:31 +01:00