Add a new 'devgroup' match to match on the device group of the
incoming and outgoing network device of a packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add a dummy ip_set_get_ip6_port function that unconditionally
returns false for CONFIG_IPV6=n and convert the real function
to ipv6_skip_exthdr() to avoid pulling in the ip6_tables module
when loading ipset.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The patch adds the combined module of the "SET" target and "set" match
to netfilter. Both the previous and the current revisions are supported.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The module implements the list:set type support in two flavours:
without and with timeout. The sets has two sides: for the userspace,
they store the names of other (non list:set type of) sets: one can add,
delete and test set names. For the kernel, it forms an ordered union of
the member sets: the members sets are tried in order when elements are
added, deleted and tested and the process stops at the first success.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The module implements the hash:ip type support in four flavours:
for IPv4 or IPv6, both without and with timeout support.
All the hash types are based on the "array hash" or ahash structure
and functions as a good compromise between minimal memory footprint
and speed. The hashing uses arrays to resolve clashes. The hash table
is resized (doubled) when searching becomes too long. Resizing can be
triggered by userspace add commands only and those are serialized by
the nfnl mutex. During resizing the set is read-locked, so the only
possible concurrent operations are the kernel side readers. Those are
protected by RCU locking.
Because of the four flavours and the other hash types, the functions
are implemented in general forms in the ip_set_ahash.h header file
and the real functions are generated before compiling by macro expansion.
Thus the dereferencing of low-level functions and void pointer arguments
could be avoided: the low-level functions are inlined, the function
arguments are pointers of type-specific structures.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The module implements the bitmap:ip set type in two flavours, without
and with timeout support. In this kind of set one can store IPv4
addresses (or network addresses) from a given range.
In order not to waste memory, the timeout version does not rely on
the kernel timer for every element to be timed out but on garbage
collection. All set types use this mechanism.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The patch adds the IP set core support to the kernel.
The IP set core implements a netlink (nfnetlink) based protocol by which
one can create, destroy, flush, rename, swap, list, save, restore sets,
and add, delete, test elements from userspace. For simplicity (and backward
compatibilty and for not to force ip(6)tables to be linked with a netlink
library) reasons a small getsockopt-based protocol is also kept in order
to communicate with the ip(6)tables match and target.
The netlink protocol passes all u16, etc values in network order with
NLA_F_NET_BYTEORDER flag. The protocol enforces the proper use of the
NLA_F_NESTED and NLA_F_NET_BYTEORDER flags.
For other kernel subsystems (netfilter match and target) the API contains
the functions to add, delete and test elements in sets and the required calls
to get/put refereces to the sets before those operations can be performed.
The set types (which are implemented in independent modules) are stored
in a simple RCU protected list. A set type may have variants: for example
without timeout or with timeout support, for IPv4 or for IPv6. The sets
(i.e. the pointers to the sets) are stored in an array. The sets are
identified by their index in the array, which makes possible easy and
fast swapping of sets. The array is protected indirectly by the nfnl
mutex from nfnetlink. The content of the sets are protected by the rwlock
of the set.
There are functional differences between the add/del/test functions
for the kernel and userspace:
- kernel add/del/test: works on the current packet (i.e. one element)
- kernel test: may trigger an "add" operation in order to fill
out unspecified parts of the element from the packet (like MAC address)
- userspace add/del: works on the netlink message and thus possibly
on multiple elements from the IPSET_ATTR_ADT container attribute.
- userspace add: may trigger resizing of a set
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The patch adds the NFNL_SUBSYS_IPSET id and NLA_PUT_NET* macros to the
vanilla kernel.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Resolve these warnings on `make headers_check`:
usr/include/linux/netfilter/xt_CT.h:7: found __[us]{8,16,32,64} type
without #include <linux/types.h>
...
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Accidentally missed removing the old out-of-union "inverse" member,
which caused the struct size to change which then gives size mismatch
warnings when using an old iptables.
It is interesting to see that gcc did not warn about this before.
(Filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47376 )
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
This adds destination address-based selection. The old "inverse"
member is overloaded (memory-wise) with a new "flags" variable,
similar to how J.Park did it with xt_string rev 1. Since revision 0
userspace only sets flag 0x1, no great changes are made to explicitly
test for different revisions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
This patch adds flow-based timestamping for conntracks. This
conntrack extension is disabled by default. Basically, we use
two 64-bits variables to store the creation timestamp once the
conntrack has been confirmed and the other to store the deletion
time. This extension is disabled by default, to enable it, you
have to:
echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp
This patch allows to save memory for user-space flow-based
loogers such as ulogd2. In short, ulogd2 does not need to
keep a hashtable with the conntrack in user-space to know
when they were created and destroyed, instead we use the
kernel timestamp. If we want to have a sane IPFIX implementation
in user-space, this nanosecs resolution timestamps are also
useful. Other custom user-space applications can benefit from
this via libnetfilter_conntrack.
This patch modifies the /proc output to display the delta time
in seconds since the flow start. You can also obtain the
flow-start date by means of the conntrack-tools.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Adding support for SNMP broadcast connection tracking. The SNMP
broadcast requests are now paired with the SNMP responses.
Thus allowing using SNMP broadcasts with firewall enabled.
Please refer to the following conversation:
http://marc.info/?l=netfilter-devel&m=125992205006600&w=2
Patrick McHardy wrote:
> > The best solution would be to add generic broadcast tracking, the
> > use of expectations for this is a bit of abuse.
> > The second best choice I guess would be to move the help() function
> > to a shared module and generalize it so it can be used for both.
This patch implements the "second best choice".
Since the netbios-ns conntrack module uses the same helper
functionality as the snmp, only one helper function is added
for both snmp and netbios-ns modules into the new object -
nf_conntrack_broadcast.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
If an skb is to be NF_QUEUE'd, but no program has opened the queue, the
packet is dropped.
This adds a v2 target revision of xt_NFQUEUE that allows packets to
continue through the ruleset instead.
Because the actual queueing happens outside of the target context, the
'bypass' flag has to be communicated back to the netfilter core.
Unfortunately the only choice to do this without adding a new function
argument is to use the target function return value (i.e. the verdict).
In the NF_QUEUE case, the upper 16bit already contain the queue number
to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e.
we now have extra room for a new flag.
If a hook issued a NF_QUEUE verdict, then the netfilter core will
continue packet processing if the queueing hook
returns -ESRCH (== "this queue does not exist") and the new
NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value.
Note: If the queue exists, but userspace does not consume packets fast
enough, the skb will still be dropped.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds a new netfilter target which creates audit records
for packets traversing a certain chain.
It can be used to record packets which are rejected administraively
as follows:
-N AUDIT_DROP
-A AUDIT_DROP -j AUDIT --type DROP
-A AUDIT_DROP -j DROP
a rule which would typically drop or reject a packet would then
invoke the new chain to record packets before dropping them.
-j AUDIT_DROP
The module is protocol independant and works for iptables, ip6tables
and ebtables.
The following information is logged:
- netfilter hook
- packet length
- incomming/outgoing interface
- MAC src/dst/proto for ethernet packets
- src/dst/protocol address for IPv4/IPv6
- src/dst port for TCP/UDP/UDPLITE
- icmp type/code
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
One iptables invocation with 135000 rules takes 35 seconds of cpu time
on a recent server, using a 32bit distro and a 64bit kernel.
We eventually trigger NMI/RCU watchdog.
INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)
COMPAT mode has quadratic behavior and consume 16 bytes of memory per
rule.
Switch the xt_compat algos to use an array instead of list, and use a
binary search to locate an offset in the sorted array.
This halves memory need (8 bytes per rule), and removes quadratic
behavior [ O(N*N) -> O(N*log2(N)) ]
Time of iptables goes from 35 s to 150 ms.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new revision 3 that contains port ranges for all of origsrc,
origdst, replsrc and repldst. The high ports are appended to the
original v2 data structure to allow sharing most of the code with
v1 and v2. Use of the revision specific port matching function is
made dependant on par->match->revision.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since a string is stored, and not something like a MAC address that
would rely on (un)signedness, drop the qualifier.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>