Commit Graph

2603 Commits

Author SHA1 Message Date
David S. Miller 6934e79ed1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/nf_tables fixes for net-next

This patchset contains fixes for recent updates available in your
net-next, they are:

1) Fix double memory allocation for accounting objects that results
   in a leak, this slipped through with the new quota extension,
   patch from Mathieu Poirier.

2) Fix broken ordering when adding set element transactions.

3) Make sure that objects are released in reverse order in the abort
   path, to avoid possible use-after-free when accessing dependencies.

4) Allow to delete several objects (as long as dependencies are
   fulfilled) by using one batch. This includes changes in the use
   counter semantics of the nf_tables objects.

5) Fix illegal sleeping allocation from rcu callback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 15:35:04 -07:00
WANG Cong 4cb28970a2 net: use the new API kvfree()
It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 00:49:51 -07:00
David S. Miller c99f7abf0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/inetpeer.h
	net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 23:32:12 -07:00
Eric Dumazet 73f156a6e8 inetpeer: get rid of ip_id_count
Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:00:41 -07:00
Pablo Neira Ayuso 31f8441c32 netfilter: nf_tables: atomic allocation in set notifications from rcu callback
Use GFP_ATOMIC allocations when sending removal notifications of
anonymous sets from rcu callback context. Sleeping in that context
is illegal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:38 +02:00
Pablo Neira Ayuso 4fefee570d netfilter: nf_tables: allow to delete several objects from a batch
Three changes to allow the deletion of several objects with dependencies
in one transaction, they are:

1) Introduce speculative counter increment/decrement that is undone in
   the abort path if required, thus we avoid hitting -EBUSY when deleting
   the chain. The counter updates are reverted in the abort path.

2) Increment/decrement table/chain use counter for each set/rule. We need
   this to fully rely on the use counters instead of the list content,
   eg. !list_empty(&chain->rules) which evaluate true in the middle of the
   transaction.

3) Decrement table use counter when an anonymous set is bound to the
   rule in the commit path. This avoids hitting -EBUSY when deleting
   the table that contains anonymous sets. The anonymous sets are released
   in the nf_tables_rule_destroy path. This should not be a problem since
   the rule already bumped the use counter of the chain, so the bound
   anonymous set reflects dependencies through the rule object, which
   already increases the chain use counter.

So the general assumption after this patch is that the use counters are
bumped by direct object dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:35 +02:00
Pablo Neira Ayuso 7632667d26 netfilter: nft_rbtree: introduce locking
There's no rbtree rcu version yet, so let's fall back on the spinlock
to protect the concurrent access of this structure both from user
(to update the set content) and kernel-space (in the packet path).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:31 +02:00
Pablo Neira Ayuso a1cee076f4 netfilter: nf_tables: release objects in reverse order in the abort path
The patch c7c32e7 ("netfilter: nf_tables: defer all object release via
rcu") indicates that we always release deleted objects in the reverse
order, but that is only needed in the abort path. These are the two
possible scenarios when releasing objects:

1) Deletion scenario in the commit path: no need to release objects in
the reverse order since userspace already ensures that dependencies are
fulfilled), ie. userspace tells us to delete rule -> ... -> rule ->
chain -> table. In this case, we have to release the objects in the
*same order* as userspace provided.

2) Deletion scenario in the abort path: we have to iterate in the reverse
order to undo what it cannot be added, ie. userspace sent us a batch
that includes: table -> chain -> rule -> ... -> rule, and that needs to
be partially undone. In this case, we have to release objects in the
reverse order to ensure that the set and chain objects point to valid
rule and table objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:28 +02:00
Pablo Neira Ayuso 46bbafceb2 netfilter: nf_tables: fix wrong transaction ordering in set elements
The transaction needs to be placed at the end of the commit list,
otherwise event notifications are reordered and we may crash when
releasing object via call_rcu.

This problem was introduced in 60319eb ("netfilter: nf_tables: use new
transaction infrastructure to handle elements").

Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:25 +02:00
Mathieu Poirier 4c552a64df netfilter: nfnetlink_acct: Fix memory leak
Allocation of memory need only to happen once, that is
after the proper checks on the NFACCT_FLAGS have been
done.  Otherwise the code can return without freeing
already allocated memory.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:46:52 +02:00
David S. Miller 90d0e08e57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

This small patchset contains three accumulated Netfilter/IPVS updates,
they are:

1) Refactorize common NAT code by encapsulating it into a helper
   function, similarly to what we do in other conntrack extensions,
   from Florian Westphal.

2) A minor format string mismatch fix for IPVS, from Masanari Iida.

3) Add quota support to the netfilter accounting infrastructure, now
   you can add quotas to accounting objects via the nfnetlink interface
   and use them from iptables. You can also listen to quota
   notifications from userspace. This enhancement from Mathieu Poirier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:54:47 -07:00
Peter Christensen f44a5f45f5 ipvs: Fix panic due to non-linear skb
Receiving a ICMP response to an IPIP packet in a non-linear skb could
cause a kernel panic in __skb_pull.

The problem was introduced in
commit f2edb9f770 ("ipvs: implement
passive PMTUD for IPIP packets").

Signed-off-by: Peter Christensen <pch@ordbogen.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-05-26 10:22:46 +09:00
David S. Miller 54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Daniel Borkmann b1fcd35cf5 net: filter: let unattached filters use sock_fprog_kern
The sk_unattached_filter_create() API is used by BPF filters that
are not directly attached or related to sockets, and are used in
team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own
internal managment of obtaining filter blocks and thus already
have them in kernel memory and set up before calling into
sk_unattached_filter_create(). As a result, due to __user annotation
in sock_fprog, sparse triggers false positives (incorrect type in
assignment [different address space]) when filters are set up before
passing them to sk_unattached_filter_create(). Therefore, let
sk_unattached_filter_create() API use sock_fprog_kern to overcome
this issue.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:48:05 -04:00
David S. Miller 8af750d739 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
Netfilter/nftables updates for net-next

The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:

1) Add set element update notification via netlink, from Arturo Borrero.

2) Put all object updates in one single message batch that is sent to
   kernel-space. Before this patch only rules where included in the batch.
   This series also introduces the generic transaction infrastructure so
   updates to all objects (tables, chains, rules and sets) are applied in
   an all-or-nothing fashion, these series from me.

3) Defer release of objects via call_rcu to reduce the time required to
   commit changes. The assumption is that all objects are destroyed in
   reverse order to ensure that dependencies betweem them are fulfilled
   (ie. rules and sets are destroyed first, then chains, and finally
   tables).

4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
   include two patches to prepare this new feature.

5) Implement the proper set selection based on the characteristics of the
   data. The new infrastructure also allows you to specify your preferences
   in terms of memory and computational complexity so the underlying set
   type is also selected according to your needs, from Patrick McHardy.

6) Several cleanup patches for nft expressions, including one minor possible
   compilation breakage due to missing mark support, also from Patrick.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 12:06:23 -04:00
Pablo Neira Ayuso c7c32e72cb netfilter: nf_tables: defer all object release via rcu
Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso 128ad3322b netfilter: nf_tables: remove skb and nlh from context structure
Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso 35151d840c netfilter: nf_tables: simplify nf_tables_*_notify
Now that all these function are called from the commit path, we can
pass the context structure to reduce the amount of parameters in all
of the nf_tables_*_notify functions. This patch also removes unneeded
branches to check for skb, nlh and net that should be always set in
the context structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso 60319eb1ca netfilter: nf_tables: use new transaction infrastructure to handle elements
Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso 55dd6f9307 netfilter: nf_tables: use new transaction infrastructure to handle table
This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso e1aaca93ee netfilter: nf_tables: pass context to nf_tables_updtable()
So nf_tables_uptable() only takes one single parameter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso f75edf5e9c netfilter: nf_tables: disabling table hooks always succeeds
nf_tables_table_disable() always succeeds, make this function void.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso 91c7b38dc9 netfilter: nf_tables: use new transaction infrastructure to handle chain
This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso ff3cd7b3c9 netfilter: nf_tables: refactor chain statistic routines
Add new routines to encapsulate chain statistics allocation and
replacement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso 958bee14d0 netfilter: nf_tables: use new transaction infrastructure to handle sets
This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:

 1) create the set and send netlink message to the kernel
 2) process the response from the kernel that contains the allocated name.
 3) add the set elements and send netlink message to the kernel.
 4) process the response from the kernel (to check for errors).

To:

 1) add the set to the batch.
 2) add the set elements to the batch.
 3) add the rule that points to the set.
 4) send batch to the kernel.

This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.

Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00