linux

mirror of https://github.com/Dasharo/linux.git synced 2026-03-06 15:25:10 -08:00

Author	SHA1	Message	Date
Pablo Neira Ayuso	269fc69533	netfilter: nfnetlink_hook: translate inet ingress to netdev The NFPROTO_INET pseudofamily is not exposed through this new netlink interface. The netlink dump either shows NFPROTO_IPV4 or NFPROTO_IPV6 for NFPROTO_INET prerouting/input/forward/output/postrouting hooks. The NFNLA_CHAIN_FAMILY attribute provides the family chain, which specifies if this hook applies to inet traffic only (either IPv4 or IPv6). Translate the inet/ingress hook to netdev/ingress to fully hide the NFPROTO_INET implementation details. Fixes: `e2cf17d377` ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-06 17:07:41 +02:00
Florian Westphal	4592ee7f52	netfilter: conntrack: remove offload_pickup sysctl again These two sysctls were added because the hardcoded defaults (2 minutes, tcp, 30 seconds, udp) turned out to be too low for some setups. They appeared in 5.14-rc1 so it should be fine to remove it again. Marcelo convinced me that there should be no difference between a flow that was offloaded vs. a flow that was not wrt. timeout handling. Thus the default is changed to those for TCP established and UDP stream, 5 days and 120 seconds, respectively. Marcelo also suggested to account for the timeout value used for the offloading, this avoids increase beyond the value in the conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls. Example: nf_conntrack_udp_timeout_stream=60 nf_flowtable_udp_timeout=60 This will remove offloaded udp flows after one minute, rather than two. An earlier version of this patch also cleared the ASSURED bit to allow nf_conntrack to evict the entry via early_drop (i.e., table full). However, it looks like we can safely assume that connection timed out via HW is still in established state, so this isn't needed. Quoting Oz: [..] the hardware sends all packets with a set FIN flags to sw. [..] Connections that are aged in hardware are expected to be in the established state. In case it turns out that back-to-sw-path transition can occur for 'dodgy' connections too (e.g., one side disappeared while software-path would have been in RETRANS timeout), we can adjust this later. Cc: Oz Shlomo <ozsh@nvidia.com> Cc: Paul Blakey <paulb@nvidia.com> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-06 17:07:41 +02:00
Pablo Neira Ayuso	69311e7c99	netfilter: nfnetlink_hook: Use same family as request message Use the same family as the request message, for consistency. The netlink payload provides sufficient information to describe the hook object, including the family. This makes it easier to userspace to correlate the hooks are that visited by the packets for a certain family. Fixes: `e2cf17d377` ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-06 17:07:41 +02:00
Pablo Neira Ayuso	3d9bbaf6c5	netfilter: nfnetlink_hook: use the sequence number of the request message The sequence number allows to correlate the netlink reply message (as part of the dump) with the original request message. The cb->seq field is internally used to detect an interference (update) of the hook list during the netlink dump, do not use it as sequence number in the netlink dump header. Fixes: `e2cf17d377` ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-06 17:07:40 +02:00
Pablo Neira Ayuso	a6e57c4af1	netfilter: nfnetlink_hook: missing chain family The family is relevant for pseudo-families like NFPROTO_INET otherwise the user needs to rely on the hook function name to differentiate it from NFPROTO_IPV4 and NFPROTO_IPV6 names. Add nfnl_hook_chain_desc_attributes instead of using the existing NFTA_CHAIN_* attributes, since these do not provide a family number. Fixes: `e2cf17d377` ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-06 17:07:40 +02:00
Pablo Neira Ayuso	61e0c2bc55	netfilter: nfnetlink_hook: strip off module name from hookfn NFNLA_HOOK_FUNCTION_NAME should include the hook function name only, the module name is already provided by NFNLA_HOOK_MODULE_NAME. Fixes: `e2cf17d377` ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-06 17:07:40 +02:00
Florian Westphal	4608fdfc07	netfilter: conntrack: collect all entries in one cycle Michal Kubecek reports that conntrack gc is responsible for frequent wakeups (every 125ms) on idle systems. On busy systems, timed out entries are evicted during lookup. The gc worker is only needed to remove entries after system becomes idle after a busy period. To resolve this, always scan the entire table. If the scan is taking too long, reschedule so other work_structs can run and resume from next bucket. After a completed scan, wait for 2 minutes before the next cycle. Heuristics for faster re-schedule are removed. GC_SCAN_INTERVAL could be exposed as a sysctl in the future to allow tuning this as-needed or even turn the gc worker off. Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-06 17:07:35 +02:00
Jozsef Kadlecsik	5f7b51bf09	netfilter: ipset: Limit the maximal range of consecutive elements to add/delete The range size of consecutive elements were not limited. Thus one could define a huge range which may result soft lockup errors due to the long execution time. Now the range size is limited to 2^20 entries. Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-08-04 10:41:03 +02:00
Arnd Bergmann	217e26bd87	netfilter: nfnl_hook: fix unused variable warning The only user of this variable is in an #ifdef: net/netfilter/nfnetlink_hook.c: In function 'nfnl_hook_entries_head': net/netfilter/nfnetlink_hook.c:177:28: error: unused variable 'netdev' [-Werror=unused-variable] Fixes: `e2cf17d377` ("netfilter: add new hook nfnl subsystem") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-23 14:45:03 +02:00
Pablo Neira Ayuso	a33f387ecd	netfilter: nft_nat: allow to specify layer 4 protocol NAT only nft_nat reports a bogus EAFNOSUPPORT if no layer 3 information is specified. Fixes: `d07db9884a` ("netfilter: nf_tables: introduce nft_validate_register_load()") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-23 14:18:03 +02:00
Florian Westphal	30a56a2b88	netfilter: conntrack: adjust stop timestamp to real expiry value In case the entry is evicted via garbage collection there is delay between the timeout value and the eviction event. This adjusts the stop value based on how much time has passed. Fixes: `b87a2f9199` ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-23 14:18:03 +02:00
Pablo Neira Ayuso	32953df7a6	netfilter: nft_last: avoid possible false sharing Use the idiom described in: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance Moreover, prevent a compiler optimization. Fixes: `836382dc24` ("netfilter: nf_tables: add last expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-23 14:18:02 +02:00
Pablo Neira Ayuso	32c3973d80	netfilter: flowtable: avoid possible false sharing The flowtable follows the same timeout approach as conntrack, use the same idiom as in `cc16921351` ("netfilter: conntrack: avoid same-timeout update") but also include the fix provided by `e37542ba11` ("netfilter: conntrack: avoid possible false sharing"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-23 14:18:01 +02:00
Dongliang Mu	cfbe3650dd	netfilter: nf_tables: fix audit memory leak in nf_tables_commit In nf_tables_commit, if nf_tables_commit_audit_alloc fails, it does not free the adp variable. Fix this by adding nf_tables_commit_audit_free which frees the linked list with the head node adl. backtrace: kmalloc include/linux/slab.h:591 [inline] kzalloc include/linux/slab.h:721 [inline] nf_tables_commit_audit_alloc net/netfilter/nf_tables_api.c:8439 [inline] nf_tables_commit+0x16e/0x1760 net/netfilter/nf_tables_api.c:8508 nfnetlink_rcv_batch+0x512/0xa80 net/netfilter/nfnetlink.c:562 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0x1fa/0x220 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x2c7/0x3e0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x36b/0x6b0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:702 [inline] sock_sendmsg+0x56/0x80 net/socket.c:722 Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: kernel test robot <lkp@intel.com> Fixes: `c520292f29` ("audit: log nftables configuration change events once per table") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-17 02:25:18 +02:00
Pablo Neira Ayuso	d1b5b80da7	netfilter: nft_last: incorrect arithmetics when restoring last used Subtract the jiffies that have passed by to current jiffies to fix last used restoration. Fixes: `836382dc24` ("netfilter: nf_tables: add last expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-06 14:15:13 +02:00
Pablo Neira Ayuso	6ac4bac4ce	netfilter: nft_last: honor NFTA_LAST_SET on restoration NFTA_LAST_SET tells us if this expression has ever seen a packet, do not ignore this attribute when restoring the ruleset. Fixes: `836382dc24` ("netfilter: nf_tables: add last expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-06 14:15:13 +02:00
Manfred Spraul	cf4466ea47	netfilter: conntrack: Mark access for KCSAN KCSAN detected an data race with ipc/sem.c that is intentional. As nf_conntrack_lock() uses the same algorithm: Update nf_conntrack_core as well: nf_conntrack_lock() contains a1) spin_lock() a2) smp_load_acquire(nf_conntrack_locks_all). a1) actually accesses one lock from an array of locks. nf_conntrack_locks_all() contains b1) nf_conntrack_locks_all=true (normal write) b2) spin_lock() b3) spin_unlock() b2 and b3 are done for every lock. This guarantees that nf_conntrack_locks_all() prevents any concurrent nf_conntrack_lock() owners: If a thread past a1), then b2) will block until that thread releases the lock. If the threat is before a1, then b3)+a1) ensure the write b1) is visible, thus a2) is guaranteed to see the updated value. But: This is only the latest time when b1) becomes visible. It may also happen that b1) is visible an undefined amount of time before the b3). And thus KCSAN will notice a data race. In addition, the compiler might be too clever. Solution: Use WRITE_ONCE(). Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-06 14:15:13 +02:00
Ali Abdallah	1da4cd82dd	netfilter: conntrack: add new sysctl to disable RST check This patch adds a new sysctl tcp_ignore_invalid_rst to disable marking out of segments RSTs as INVALID. Signed-off-by: Ali Abdallah <aabdallah@suse.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-06 14:15:12 +02:00
Ali Abdallah	c4edc3ccbc	netfilter: conntrack: improve RST handling when tuple is re-used If we receive a SYN packet in original direction on an existing connection tracking entry, we let this SYN through because conntrack might be out-of-sync. Conntrack gets back in sync when server responds with SYN/ACK and state gets updated accordingly. However, if server replies with RST, this packet might be marked as INVALID because td_maxack value reflects the old conntrack state and not the state of the originator of the RST. Avoid td_maxack-based checks if previous packet was a SYN. Unfortunately that is not be enough: an out of order ACK in original direction updates last_index, so we still end up marking valid RST. Thus disable the sequence check when we are not in established state and the received RST has a sequence of 0. Because marking RSTs as invalid usually leads to unwanted timeouts, also skip RST sequence checks if a conntrack entry is already closing. Such entries can already be evicted via GC in case the table is full. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Ali Abdallah <aabdallah@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-06 14:15:12 +02:00
Vasily Averin	c23a9fd209	netfilter: ctnetlink: suspicious RCU usage in ctnetlink_dump_helpinfo Two patches listed below removed ctnetlink_dump_helpinfo call from under rcu_read_lock. Now its rcu_dereference generates following warning: ============================= WARNING: suspicious RCU usage 5.13.0+ #5 Not tainted ----------------------------- net/netfilter/nf_conntrack_netlink.c:221 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 stack backtrace: CPU: 1 PID: 2251 Comm: conntrack Not tainted 5.13.0+ #5 Call Trace: dump_stack+0x7f/0xa1 ctnetlink_dump_helpinfo+0x134/0x150 [nf_conntrack_netlink] ctnetlink_fill_info+0x2c2/0x390 [nf_conntrack_netlink] ctnetlink_dump_table+0x13f/0x370 [nf_conntrack_netlink] netlink_dump+0x10c/0x370 __netlink_dump_start+0x1a7/0x260 ctnetlink_get_conntrack+0x1e5/0x250 [nf_conntrack_netlink] nfnetlink_rcv_msg+0x613/0x993 [nfnetlink] netlink_rcv_skb+0x50/0x100 nfnetlink_rcv+0x55/0x120 [nfnetlink] netlink_unicast+0x181/0x260 netlink_sendmsg+0x23f/0x460 sock_sendmsg+0x5b/0x60 __sys_sendto+0xf1/0x160 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `49ca022bcc` ("netfilter: ctnetlink: don't dump ct extensions of unconfirmed conntracks") Fixes: `0b35f6031a` ("netfilter: Remove duplicated rcu_read_lock.") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-02 02:29:20 +02:00
Vasily Averin	a23f89a999	netfilter: conntrack: nf_ct_gre_keymap_flush() removal nf_ct_gre_keymap_flush() is useless. It is called from nf_conntrack_cleanup_net_list() only and tries to remove nf_ct_gre_keymap entries from pernet gre keymap list. Though: a) at this point the list should already be empty, all its entries were deleted during the conntracks cleanup, because nf_conntrack_cleanup_net_list() executes nf_ct_iterate_cleanup(kill_all) before nf_conntrack_proto_pernet_fini(): nf_conntrack_cleanup_net_list +- nf_ct_iterate_cleanup \| nf_ct_put \| nf_conntrack_put \| nf_conntrack_destroy \| destroy_conntrack \| destroy_gre_conntrack \| nf_ct_gre_keymap_destroy `- nf_conntrack_proto_pernet_fini nf_ct_gre_keymap_flush b) Let's say we find that the keymap list is not empty. This means netns still has a conntrack associated with gre, in which case we should not free its memory, because this will lead to a double free and related crashes. However I doubt it could have gone unnoticed for years, obviously this does not happen in real life. So I think we can remove both nf_ct_gre_keymap_flush() and nf_conntrack_proto_pernet_fini(). Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-02 02:07:01 +02:00
Colin Ian King	4ca041f919	netfilter: nf_tables: Fix dereference of null pointer flow In the case where chain->flags & NFT_CHAIN_HW_OFFLOAD is false then nft_flow_rule_create is not called and flow is NULL. The subsequent error handling execution via label err_destroy_flow_rule will lead to a null pointer dereference on flow when calling nft_flow_rule_destroy. Since the error path to err_destroy_flow_rule has to cater for null and non-null flows, only call nft_flow_rule_destroy if flow is non-null to fix this issue. Addresses-Coverity: ("Explicity null dereference") Fixes: `3c5e446220` ("netfilter: nf_tables: memleak in hw offload abort path") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-02 02:05:59 +02:00
Florian Westphal	e15d4cdf27	netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state Consider: client -----> conntrack ---> Host client sends a SYN, but $Host is unreachable/silent. Client eventually gives up and the conntrack entry will time out. However, if the client is restarted with same addr/port pair, it may prevent the conntrack entry from timing out. This is noticeable when the existing conntrack entry has no NAT transformation or an outdated one and port reuse happens either on client or due to a NAT middlebox. This change prevents refresh of the timeout for SYN retransmits, so entry is going away after nf_conntrack_tcp_timeout_syn_sent seconds (default: 60). Entry will be re-created on next connection attempt, but then nat rules will be evaluated again. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-07-02 02:05:59 +02:00
Jakub Kicinski	b6df00789e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-06-29 15:45:27 -07:00
David S. Miller	a7b62112f0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Skip non-SCTP packets in the new SCTP chunk support for nft_exthdr, from Phil Sutter. 2) Simplify TCP option sanity check for TCP packets, also from Phil. 3) Add a new expression to store when the rule has been used last time. 4) Pass the hook state object to log function, from Florian Westphal. 5) Document the new sysctl knobs to tune the flowtable timeouts, from Oz Shlomo. 6) Fix snprintf error check in the new nfnetlink_hook infrastructure, from Dan Carpenter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-23 12:31:28 -07:00

1 2 3 4 5 ...

5748 Commits