On 32bit archs gcc complains due to cast from void* to u64.
Add intermediate casts to long to silence these warnings.
include/linux/netfilter/x_tables.h:376:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
include/linux/netfilter/x_tables.h:384:15: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:391:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:400:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Fixes: 71ae0dff02 ("netfilter: xtables: use percpu rule counters")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let's force a 16 bytes alignment on xt_counter percpu allocations,
so that bytes and packets sit in same cache line.
xt_counter being exported to user space, we cannot add __align(16) on
the structure itself.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After Florian patches, there is no need for XT_TABLE_INFO_SZ anymore :
Only one copy of table is kept, instead of one copy per cpu.
We also can avoid a dereference if we put table data right after
xt_table_info. It reduces register pressure and helps compiler.
Then, we attempt a kmalloc() if total size is under order-3 allocation,
to reduce TLB pressure, as in many cases, rules fit in 32 KB.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Replace rwlock_t with spinlock_t in "struct ip_set" and change the locking
accordingly. Convert the comment extension into an rcu-avare object. Also,
simplify the timeout routines.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
When elements added to a hash:* type of set and resizing triggered,
parallel listing could start to list the original set (before resizing)
and "continue" with listing the new set. Fix it by references and
using the original hash table for listing. Therefore the destroying of
the original hash table may happen from the resizing or listing functions.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Commit "Simplify cidr handling for hash:*net* types" broke the cidr
handling for the hash:*net* types when the sets were used by the SET
target: entries with invalid cidr values were added to the sets.
Reported by Jonathan Johnson.
Testsuite entry is added to verify the fix.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
We store the rule blob per (possible) cpu. Unfortunately this means we can
waste lot of memory on big smp machines. ipt_entry structure ('rule head')
is 112 byte, so e.g. with maxcpu=64 one single rule eats
close to 8k RAM.
Since previous patch made counters percpu it appears there is nothing
left in the rule blob that needs to be percpu.
On my test system (144 possible cpus, 400k dummy rules) this
change saves close to 9 Gigabyte of RAM.
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The binary arp/ip/ip6tables ruleset is stored per cpu.
The only reason left as to why we need percpu duplication are the rule
counters embedded into ipt_entry et al -- since each cpu has its own copy
of the rules, all counters can be lockless.
The downside is that the more cpus are supported, the more memory is
required. Rules are not just duplicated per online cpu but for each
possible cpu, i.e. if maxcpu is 144, then rule is duplicated 144 times,
not for the e.g. 64 cores present.
To save some memory and also improve utilization of shared caches it
would be preferable to only store the rule blob once.
So we first need to separate counters and the rule blob.
Instead of using entry->counters, allocate this percpu and store the
percpu address in entry->counters.pcnt on CONFIG_SMP.
This change makes no sense as-is; it is merely an intermediate step to
remove the percpu duplication of the rule set in a followup patch.
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently, we have four xtables extensions that cannot be used from the
xt over nft compat layer. The problem is that they need real access to
the full blown xt_entry to validate that the rule comes with the right
dependencies. This check was introduced to overcome the lack of
sufficient userspace dependency validation in iptables.
To resolve this problem, this patch introduces a new field to the
xt_tgchk_param structure that tell us if the extension is run from
nft_compat context.
The three affected extensions are:
1) CLUSTERIP, this target has been superseded by xt_cluster. So just
bail out by returning -EINVAL.
2) TCPMSS. Relax the checking when used from nft_compat. If used with
the wrong configuration, it will corrupt !syn packets by adding TCP
MSS option.
3) ebt_stp. Relax the check to make sure it uses the reserved
destination MAC address for STP.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
IP addresses are often stored in netlink attributes. Add generic functions
to do that.
For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skbinfo extension provides mapping of metainformation with lookup in the ipset tables.
This patch defines the flags, the constants, the functions and the structures
for the data type independent support of the extension.
Note the firewall mark stores in the kernel structures as two 32bit values,
but transfered through netlink as one 64bit value.
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Dan Carpenter reported that the static checker emits the warning
net/netfilter/ipset/ip_set_list_set.c:600 init_list_set()
warn: integer overflows 'sizeof(*map) + size * set->dsize'
Limit the maximal number of elements in list type of sets.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
nfacct objects already support accounting at the byte and packet
level. As such it is a natural extension to add the possiblity to
define a ceiling limit for both metrics.
All the support for quotas itself is added to nfnetlink acctounting
framework to stay coherent with current accounting object management.
Quota limit checks are implemented in xt_nfacct filter where
statistic collection is already done.
Pablo Neira Ayuso has also contributed to this feature.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Adds a new property for hash set types, where if a set is created
with the 'forceadd' option and the set becomes full the next addition
to the set may succeed and evict a random entry from the set.
To keep overhead low eviction is done very simply. It checks to see
which bucket the new entry would be added. If the bucket's pos value
is non-zero (meaning there's at least one entry in the bucket) it
replaces the first entry in the bucket. If pos is zero, then it continues
down the normal add process.
This property is useful if you have a set for 'ban' lists where it may
not matter if you release some entries from the set early.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Introduce packet mark support with new ip,mark hash set. This includes
userspace and kernelspace code, hash:ip,mark set tests and man page
updates.
The intended use of ip,mark set is similar to the ip:port type, but for
protocols which don't use a predictable port number. Instead of port
number it matches a firewall mark determined by a layer 7 filtering
program like opendpi.
As well as allowing or blocking traffic it will also be used for
accounting packets and bytes sent for each protocol.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Add a lockdep_nfnl_is_held() function and a nfnl_dereference() macro for
RCU dereferences protected by a NFNL subsystem mutex.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
This is another batch containing Netfilter/IPVS updates for your net-next
tree, they are:
* Six patches to make the ipt_CLUSTERIP target support netnamespace,
from Gao feng.
* Two cleanups for the nf_conntrack_acct infrastructure, introducing
a new structure to encapsulate conntrack counters, from Holger
Eitzenberger.
* Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann.
* Skip checksum recalculation in SCTP support for IPVS, also from
Daniel Borkmann.
* Fix behavioural change in xt_socket after IP early demux, from
Florian Westphal.
* Fix bogus large memory allocation in the bitmap port set type in ipset,
from Jozsef Kadlecsik.
* Fix possible compilation issues in the hash netnet set type in ipset,
also from Jozsef Kadlecsik.
* Define constants to identify netlink callback data in ipset dumps,
again from Jozsef Kadlecsik.
* Use sock_gen_put() in xt_socket to replace xt_socket_put_sk,
from Eric Dumazet.
* Improvements for the SH scheduler in IPVS, from Alexander Frolkin.
* Remove extra delay due to unneeded rcu barrier in IPVS net namespace
cleanup path, from Julian Anastasov.
* Save some cycles in ip6t_REJECT by skipping checksum validation in
packets leaving from our stack, from Stanislav Fomichev.
* Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from
Julian Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>