Add the corresponding trace if we have a full match in a non-terminal
rule. Note that the traces will look slightly different than in
x_tables since the log message after all expressions have been
evaluated (contrary to x_tables, that emits it before the target
action). This manifests in two differences in nf_tables wrt. x_tables:
1) The rule that enables the tracing is included in the trace.
2) If the rule emits some log message, that is shown before the
trace log message.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Display "return" for implicit rule at the end of a non-base chain,
instead of when popping chain from the stack.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After returning from the chain that we just went to with no matchings,
we get a bogus rule number in the trace. To fix this, we would need
to iterate over the list of remaining rules in the chain to update the
rule number counter.
Patrick suggested to set this to the maximum value since the default
base chain policy is the very last action when the processing the base
chain is over.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch fixes a crash when trying to access the counters and the
default chain policy from the non-base chain that we have reached
via the goto chain. Fix this by falling back on the original base
chain after returning from the custom chain.
While fixing this, kill the inline function to account chain statistics
to improve source code readability.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
commit 0eba801b64 tried to fix a race
where nat initialisation can happen after ctnetlink-created conntrack
has been created.
However, it causes the nat module(s) to be loaded needlessly on
systems that are not using NAT.
Fortunately, we do not have to create null bindings in that case.
conntracks injected via ctnetlink always have the CONFIRMED bit set,
which prevents addition of the nat extension in nf_nat_ipv4/6_fn().
We only need to make sure that either no nat extension is added
or that we've created both src and dst manips.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’:
net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable]
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.
To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nft_cmp_fast is used for equality comparisions of size <= 4. For
comparisions of size < 4 byte a mask is calculated that is applied to
both the data from userspace (during initialization) and the register
value (during runtime). Both values are stored using (in effect) memcpy
to a memory area that is then interpreted as u32 by nft_cmp_fast.
This works fine on little endian since smaller types have the same base
address, however on big endian this is not true and the smaller types
are interpreted as a big number with trailing zero bytes.
The mask therefore must not include the lower bytes, but the higher bytes
on big endian. Add a helper function that does a cpu_to_le32 to switch
the bytes on big endian. Since we're dealing with a mask of just consequitive
bits, this works out fine.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The intended format in request_module is %.*s instead of %*.s.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently, nf_tables trims off the set name if it exceeeds 15
bytes, so explicitly reject set names that are too large.
Reported-by: Giuseppe Longo <giuseppelng@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are no these aliases, so kernel can not request appropriate
match table:
$ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP
iptables: No chain/target/match by that name.
setsockopt() requests ipt_osf module, which is not present. Add
the aliases.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This simple modification allows iptables to work with INPUT chain
in combination with cgroup module. It could be useful for counting
ingress traffic per cgroup with nfacct netfilter module. There
were no problems to count the egress traffic that way formerly.
It's possible to get classified sk_buff after PREROUTING, due to
socket lookup being done in early_demux (tcp_v4_early_demux). Also
it works for udp as well.
Trivial usage example, assuming we're in the same shell every step
and we have enough permissions:
1) Classic net_cls cgroup initialization:
mkdir /sys/fs/cgroup/net_cls
mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
2) Set up cgroup for interesting application:
mkdir /sys/fs/cgroup/net_cls/wget
echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid
echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs
3) Create kernel counters:
nfacct add wget-cgroup-in
iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in
nfacct add wget-cgroup-out
iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out
4) Network usage:
wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz
5) Check results:
nfacct list
Cgroup approach is being used for the DataUsage (counting & blocking
traffic) feature for Samsung's modification of the Tizen OS.
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric points out that the locks can be global.
Moreover, both Jesper and Eric note that using only 32 locks increases
false sharing as only two cache lines are used.
This increases locks to 256 (16 cache lines assuming 64byte cacheline and
4 bytes per spinlock).
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
cannot use ARRAY_SIZE() if spinlock_t is empty struct.
Fixes: 1442e7507d ("netfilter: connlimit: use keyed locks")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
drivers/net/ethernet/marvell/mvneta.c
The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARRAY_SIZE(nf_conntrack_locks) is undefined if spinlock_t is an
empty structure. Replace it by CONNTRACK_LOCKS
Fixes: 93bb0ceb75 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>