Commit Graph

5979 Commits

Author SHA1 Message Date
Florian Westphal 743b83f15d netfilter: nft_socket: only do sk lookups when indev is available
Check if the incoming interface is available and NFT_BREAK
in case neither skb->sk nor input device are set.

Because nf_sk_lookup_slow*() assume packet headers are in the
'in' direction, use in postrouting is not going to yield a meaningful
result.  Same is true for the forward chain, so restrict the use
to prerouting, input and output.

Use in output work if a socket is already attached to the skb.

Fixes: 554ced0a6e ("netfilter: nf_tables: add support for native socket matching")
Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-28 16:15:23 +02:00
Volodymyr Mytnyk 626873c446 netfilter: conntrack: fix udp offload timeout sysctl
`nf_flowtable_udp_timeout` sysctl option is available only
if CONFIG_NFT_FLOW_OFFLOAD enabled. But infra for this flow
offload UDP timeout was added under CONFIG_NF_FLOW_TABLE
config option. So, if you have CONFIG_NFT_FLOW_OFFLOAD
disabled and CONFIG_NF_FLOW_TABLE enabled, the
`nf_flowtable_udp_timeout` is not present in sysfs.
Please note, that TCP flow offload timeout sysctl option
is present even CONFIG_NFT_FLOW_OFFLOAD is disabled.

I suppose it was a typo in commit that adds UDP flow offload
timeout and CONFIG_NF_FLOW_TABLE should be used instead.

Fixes: 975c57504d ("netfilter: conntrack: Introduce udp offload timeout configuration")
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-27 15:48:49 +02:00
Florian Westphal c7aab4f170 netfilter: nf_conntrack_tcp: re-init for syn packets only
Jaco Kroon reported tcp problems that Eric Dumazet and Neal Cardwell
pinpointed to nf_conntrack tcp_in_window() bug.

tcp trace shows following sequence:

I > R Flags [S], seq 3451342529, win 62580, options [.. tfo [|tcp]>
R > I Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [..]
R > I Flags [P.], seq 1:89, ack 1, [..]

Note 3rd ACK is from responder to initiator so following branch is taken:
    } else if (((state->state == TCP_CONNTRACK_SYN_SENT
               && dir == IP_CT_DIR_ORIGINAL)
               || (state->state == TCP_CONNTRACK_SYN_RECV
               && dir == IP_CT_DIR_REPLY))
               && after(end, sender->td_end)) {

... because state == TCP_CONNTRACK_SYN_RECV and dir is REPLY.
This causes the scaling factor to be reset to 0: window scale option
is only present in syn(ack) packets.  This in turn makes nf_conntrack
mark valid packets as out-of-window.

This was always broken, it exists even in original commit where
window tracking was added to ip_conntrack (nf_conntrack predecessor)
in 2.6.9-rc1 kernel.

Restrict to 'tcph->syn', just like the 3rd condtional added in
commit 82b72cb946 ("netfilter: conntrack: re-init state for retransmitted syn-ack").

Upon closer look, those conditionals/branches can be merged:

Because earlier checks prevent syn-ack from showing up in
original direction, the 'dir' checks in the conditional quoted above are
redundant, remove them. Return early for pure syn retransmitted in reply
direction (simultaneous open).

Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Jaco Kroon <jaco@uls.co.za>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-27 15:48:45 +02:00
Pablo Neira Ayuso babc3dc952 netfilter: nft_set_rbtree: overlap detection with element re-addition after deletion
This patch fixes spurious EEXIST errors.

Extend d2df92e98a ("netfilter: nft_set_rbtree: handle element
re-addition after deletion") to deal with elements with same end flags
in the same transation.

Reset the overlap flag as described by 7c84d41416 ("netfilter:
nft_set_rbtree: Detect partial overlaps on insertion").

Fixes: 7c84d41416 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Fixes: d2df92e98a ("netfilter: nft_set_rbtree: handle element re-addition after deletion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-22 15:49:15 +02:00
Pengcheng Yang eba1a872cb ipvs: correctly print the memory size of ip_vs_conn_tab
The memory size of ip_vs_conn_tab changed after we use hlist
instead of list.

Fixes: 731109e784 ("ipvs: use hlist instead of list")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-19 11:04:06 +02:00
Antoine Tenart 6c6f9f31ec netfilter: nf_tables: nft_parse_register can return a negative value
Since commit 6e1acfa387 ("netfilter: nf_tables: validate registers
coming from userspace.") nft_parse_register can return a negative value,
but the function prototype is still returning an unsigned int.

Fixes: 6e1acfa387 ("netfilter: nf_tables: validate registers coming from userspace.")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-12 11:36:37 +02:00
Florian Westphal 05ae2fba82 netfilter: nft_socket: make cgroup match work in input too
cgroupv2 helper function ignores the already-looked up sk
and uses skb->sk instead.

Just pass sk from the calling function instead; this will
make cgroup matching work for udp and tcp in input even when
edemux did not set skb->sk already.

Fixes: e0bb96db96 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-11 12:12:46 +02:00
Vasily Averin 42193ffd79 netfilter: nf_tables: memcg accounting for dynamically allocated objects
nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to
use __GFP_ACCOUNT flag for objects that are dynamically
allocated from the packet path.

Such objects are allocated inside nft_expr_ops->init() callbacks
executed in task context while processing netlink messages.

In addition, this patch adds accounting to nft_set_elem_expr_clone()
used for the same purposes.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-05 11:55:46 +02:00
Jeremy Sowden 3181821317 netfilter: bitwise: fix reduce comparisons
The `nft_bitwise_reduce` and `nft_bitwise_fast_reduce` functions should
compare the bitwise operation in `expr` with the tracked operation
associated with the destination register of `expr`.  However, instead of
being called on `expr` and `track->regs[priv->dreg].selector`,
`nft_expr_priv` is called on `expr` twice, so both reduce functions
return true even when the operations differ.

Fixes: be5650f8f4 ("netfilter: nft_bitwise: track register operations")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-29 23:33:46 +02:00
David S. Miller 2aa2f88c97 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Incorrect output device in nf_egress hook, from Phill Sutter.

2) Preserve liberal flag in TCP conntrack state, reported by Sven Auhagen.

3) Use GFP_KERNEL_ACCOUNT flag for nf_tables objects, from Vasily Averin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-28 08:57:10 -07:00
Vasily Averin 33758c8914 memcg: enable accounting for nft objects
nftables replaces iptables, but it lacks memcg accounting.

This patch account most of the memory allocation associated with nft
and should protect the host from misusing nft inside a memcg restricted
container.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-28 10:11:23 +02:00
Pablo Neira Ayuso f2dd495a8d netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
Do not reset IP_CT_TCP_FLAG_BE_LIBERAL flag in out-of-sync scenarios
coming before the TCP window tracking, otherwise such connections will
fail in the window check.

Update tcp_options() to leave this flag in place and add a new helper
function to reset the tcp window state.

Based on patch from Sven Auhagen.

Fixes: c4832c7bbc ("netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking")
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-24 15:17:18 +01:00
Jakub Kicinski 89695196f0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in overtime fixes, no conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23 10:53:49 -07:00
Jakub Kicinski 0db8640df5 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2022-03-21 v2

We've added 137 non-merge commits during the last 17 day(s) which contain
a total of 143 files changed, 7123 insertions(+), 1092 deletions(-).

The main changes are:

1) Custom SEC() handling in libbpf, from Andrii.

2) subskeleton support, from Delyan.

3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao.

4) Fix net.core.bpf_jit_harden race, from Hou.

5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub.

6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami.
The arch specific bits will come later.

7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri.

8) Enable non-atomic allocations in local storage, from Joanne.

9) Various var_off ptr_to_btf_id fixed, from Kumar.

10) bpf_ima_file_hash helper, from Roberto.

11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits)
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  bpftool: Fix a bug in subskeleton code generation
  bpf: Fix bpf_prog_pack when PMU_SIZE is not defined
  bpf: Fix bpf_prog_pack for multi-node setup
  bpf: Fix warning for cast from restricted gfp_t in verifier
  bpf, arm: Fix various typos in comments
  libbpf: Close fd in bpf_object__reuse_map
  bpftool: Fix print error when show bpf map
  bpf: Fix kprobe_multi return probe backtrace
  Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"
  bpf: Simplify check in btf_parse_hdr()
  selftests/bpf/test_lirc_mode2.sh: Exit with proper code
  bpf: Check for NULL return from bpf_get_btf_vmlinux
  selftests/bpf: Test skipping stacktrace
  bpf: Adjust BPF stack helper functions to accommodate skip > 0
  bpf: Select proper size for bpf_prog_pack
  ...
====================

Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22 11:18:49 -07:00
Pablo Neira Ayuso 217cff36e8 netfilter: flowtable: pass flowtable to nf_flow_table_iterate()
The flowtable object is already passed as argument to
nf_flow_table_iterate(), do use not data pointer to pass flowtable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:48 +01:00
Pablo Neira Ayuso bb321ed6bb netfilter: flowtable: remove redundant field in flow_offload_work struct
Already available through the flowtable object, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Florian Westphal e86dbdb9d4 netfilter: nft_exthdr: add reduce support
Check if we can elide the load. Cancel if the new candidate
isn't identical to previous store.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Florian Westphal 3c1eb413a4 netfilter: nft_fib: add reduce support
The fib expression stores to a register, so we can't add empty stub.
Check that the register that is being written is in fact redundant.

In most cases, this is expected to cancel tracking as re-use is
unlikely.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Pablo Neira Ayuso 611580d2df netfilter: nft_tunnel: track register operations
Check if the destination register already contains the data that this
tunnel expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information. This patch does not perform bitwise tracking.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Pablo Neira Ayuso 48f1910326 netfilter: nft_xfrm: track register operations
Check if the destination register already contains the data that this
xfrm expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Pablo Neira Ayuso d77a721d21 netfilter: nft_socket: track register operations
Check if the destination register already contains the data that this
socket expression performs. This allows to skip this redundant
operation.  If the destination contains a different selector, update the
register tracking information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Pablo Neira Ayuso 71ef842d73 netfilter: nft_immediate: cancel register tracking for data destination register
The immediate expression might clobber existing data on the registers,
cancel register tracking for the destination register.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Pablo Neira Ayuso 5da03b5666 netfilter: nft_hash: track register operations
Check if the destination register already contains the data that this
osf expression performs. Always cancel register tracking for jhash since
this requires tracking multiple source registers in case of
concatenations. Perform register tracking (without bitwise) for symhash
since input does not come from source register.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Pablo Neira Ayuso ffe6488e62 netfilter: nft_osf: track register operations
Allow to recycle the previous output of the OS fingerprint expression
if flags and ttl are the same.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00
Pablo Neira Ayuso 4e2b29d881 netfilter: nft_numgen: cancel register tracking
Random and increment are stateful, each invocation results in fresh output.
Cancel register tracking for these two expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:47 +01:00