Commit Graph

1078 Commits

Author SHA1 Message Date
Sven Wegener
fb97898fa8 ipvs: Add missing locking during connection table hashing and unhashing
commit aea9d711f3 upstream.

The code that hashes and unhashes connections from the connection table
is missing locking of the connection being modified, which opens up a
race condition and results in memory corruption when this race condition
is hit.

Here is what happens in pretty verbose form:

CPU 0					CPU 1
------------				------------
An active connection is terminated and
we schedule ip_vs_conn_expire() on this
CPU to expire this connection.

					IRQ assignment is changed to this CPU,
					but the expire timer stays scheduled on
					the other CPU.

					New connection from same ip:port comes
					in right before the timer expires, we
					find the inactive connection in our
					connection table and get a reference to
					it. We proper lock the connection in
					tcp_state_transition() and read the
					connection flags in set_tcp_state().

ip_vs_conn_expire() gets called, we
unhash the connection from our
connection table and remove the hashed
flag in ip_vs_conn_unhash(), without
proper locking!

					While still holding proper locks we
					write the connection flags in
					set_tcp_state() and this sets the hashed
					flag again.

ip_vs_conn_expire() fails to expire the
connection, because the other CPU has
incremented the reference count. We try
to re-insert the connection into our
connection table, but this fails in
ip_vs_conn_hash(), because the hashed
flag has been set by the other CPU. We
re-schedule execution of
ip_vs_conn_expire(). Now this connection
has the hashed flag set, but isn't
actually hashed in our connection table
and has a dangling list_head.

					We drop the reference we held on the
					connection and schedule the expire timer
					for timeouting the connection on this
					CPU. Further packets won't be able to
					find this connection in our connection
					table.

					ip_vs_conn_expire() gets called again,
					we think it's already hashed, but the
					list_head is dangling and while removing
					the connection from our connection table
					we write to the memory location where
					this list_head points to.

The result will probably be a kernel oops at some other point in time.

This race condition is pretty subtle, but it can be triggered remotely.
It needs the IRQ assignment change or another circumstance where packets
coming from the same ip:port for the same service are being processed on
different CPUs. And it involves hitting the exact time at which
ip_vs_conn_expire() gets called. It can be avoided by making sure that
all packets from one connection are always processed on the same CPU and
can be made harder to exploit by changing the connection timeouts to
some custom values.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02 10:20:50 -07:00
Patrick McHardy
5a32ab6f23 netfilter: xt_recent: fix regression in rules using a zero hit_count
commit ef1691504c upstream.

Commit 8ccb92ad (netfilter: xt_recent: fix false match) fixed supposedly
false matches in rules using a zero hit_count. As it turns out there is
nothing false about these matches and people are actually using entries
with a hit_count of zero to make rules dependant on addresses inserted
manually through /proc.

Since this slipped past the eyes of three reviewers, instead of
reverting the commit in question, this patch explicitly checks
for a hit_count of zero to make the intentions more clear.

Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01 15:58:47 -07:00
Tim Gardner
58368fc832 netfilter: xt_recent: fix false match
commit 8ccb92ad41 upstream.

A rule with a zero hit_count will always match.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:50:01 -07:00
Tim Gardner
6f2deb6aad netfilter: xt_recent: fix buffer overflow
commit 2c08522e5d upstream.

e->index overflows e->stamps[] every ip_pkt_list_tot packets.

Consider the case when ip_pkt_list_tot==1; the first packet received is stored
in e->stamps[0] and e->index is initialized to 1. The next received packet
timestamp is then stored at e->stamps[1] in recent_entry_update(),
a buffer overflow because the maximum e->stamps[] index is 0.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:50:00 -07:00
Patrick McHardy
242a71829e netfilter: nf_conntrack: fix hash resizing with namespaces
commit d696c7bdaa upstream.

As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.

Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:53 -08:00
Alexey Dobriyan
51d3a34794 netfilter: nf_conntrack: restrict runtime expect hashsize modifications
commit 13ccdfc2af upstream.

Expectation hashtable size was simply glued to a variable with no code
to rehash expectations, so it was a bug to allow writing to it.
Make "expect_hashsize" readonly.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:53 -08:00
Eric Dumazet
747edef00c netfilter: nf_conntrack: per netns nf_conntrack_cachep
commit 5b3501faa8 upstream.

nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.

If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.

We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).

If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:53 -08:00
Patrick McHardy
de2545859d netfilter: nf_conntrack: fix memory corruption with multiple namespaces
commit 9edd7ca0a3 upstream.

As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
conntrack, which is located in the data section, might be accidentally
freed when a new namespace is instantiated while the untracked conntrack
is attached to a skb because the reference count it re-initialized.

The best fix would be to use a seperate untracked conntrack per
namespace since it includes a namespace pointer. Unfortunately this is
not possible without larger changes since the namespace is not easily
available everywhere we need it. For now move the untracked conntrack
initialization to the init_net setup function to make sure the reference
count is not re-initialized and handle cleanup in the init_net cleanup
function to make sure namespaces can exit properly while the untracked
conntrack is in use in other namespaces.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:53 -08:00
Florian Westphal
8f2fefcddf netfilter: xtables: fix conntrack match v1 ipt-save output
commit 3a0429292d upstream.

commit d6d3f08b0f
(netfilter: xtables: conntrack match revision 2) does break the
v1 conntrack match iptables-save output in a subtle way.

Problem is as follows:

    up = kmalloc(sizeof(*up), GFP_KERNEL);
[..]
   /*
    * The strategy here is to minimize the overhead of v1 matching,
    * by prebuilding a v2 struct and putting the pointer into the
    * v1 dataspace.
    */
    memcpy(up, info, offsetof(typeof(*info), state_mask));
[..]
    *(void **)info  = up;

As the v2 struct pointer is saved in the match data space,
it clobbers the first structure member (->origsrc_addr).

Because the _v1 match function grabs this pointer and does not actually
look at the v1 origsrc, run time functionality does not break.
But iptables -nvL (or iptables-save) cannot know that v1 origsrc_addr
has been overloaded in this way:

$ iptables -p tcp -A OUTPUT -m conntrack --ctorigsrc 10.0.0.1 -j ACCEPT
$ iptables-save
-A OUTPUT -p tcp -m conntrack --ctorigsrc 128.173.134.206 -j ACCEPT

(128.173... is the address to the v2 match structure).

To fix this, we take advantage of the fact that the v1 and v2 structures
are identical with exception of the last two structure members (u8 in v1,
u16 in v2).

We extract them as early as possible and prevent the v2 matching function
from looking at those two members directly.

Previously reported by Michel Messerschmidt via Ben Hutchings, also
see Debian Bug tracker #556587.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:01:04 -08:00
Patrick McHardy
545b02070b netfilter: nf_ct_ftp: fix out of bounds read in update_nl_seq()
commit aaff23a95a upstream.

As noticed by Dan Carpenter <error27@gmail.com>, update_nl_seq()
currently contains an out of bounds read of the seq_aft_nl array
when looking for the oldest sequence number position.

Fix it to only compare valid positions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:41 -08:00
Simon Horman
75bc75ed15 ipvs: zero usvc and udest
commit 258c889362 upstream.

Make sure that any otherwise uninitialised fields of usvc are zero.

This has been obvserved to cause a problem whereby the port of
fwmark services may end up as a non-zero value which causes
scheduling of a destination server to fail for persisitent services.

As observed by Deon van der Merwe <dvdm@truteq.co.za>.
This fix suggested by Julian Anastasov <ja@ssi.bg>.

For good measure also zero udest.

Cc: Deon van der Merwe <dvdm@truteq.co.za>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:51 -08:00
David S. Miller
73570314e4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-11-23 09:52:51 -08:00
Patrick McHardy
8fa539bd91 netfilter: xt_limit: fix invalid return code in limit_mt_check()
Commit acc738fe (netfilter: xtables: avoid pointer to self) introduced
an invalid return value in limit_mt_check().

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-23 13:37:23 +01:00
Patrick McHardy
6440fe059e netfilter: nf_log: fix sleeping function called from invalid context in seq_show()
[  171.925285] BUG: sleeping function called from invalid context at kernel/mutex.c:280
[  171.925296] in_atomic(): 1, irqs_disabled(): 0, pid: 671, name: grep
[  171.925306] 2 locks held by grep/671:
[  171.925312]  #0:  (&p->lock){+.+.+.}, at: [<c10b8acd>] seq_read+0x25/0x36c
[  171.925340]  #1:  (rcu_read_lock){.+.+..}, at: [<c1391dac>] seq_start+0x0/0x44
[  171.925372] Pid: 671, comm: grep Not tainted 2.6.31.6-4-netbook #3
[  171.925380] Call Trace:
[  171.925398]  [<c105104e>] ? __debug_show_held_locks+0x1e/0x20
[  171.925414]  [<c10264ac>] __might_sleep+0xfb/0x102
[  171.925430]  [<c1461521>] mutex_lock_nested+0x1c/0x2ad
[  171.925444]  [<c1391c9e>] seq_show+0x74/0x127
[  171.925456]  [<c10b8c5c>] seq_read+0x1b4/0x36c
[  171.925469]  [<c10b8aa8>] ? seq_read+0x0/0x36c
[  171.925483]  [<c10d5c8e>] proc_reg_read+0x60/0x74
[  171.925496]  [<c10d5c2e>] ? proc_reg_read+0x0/0x74
[  171.925510]  [<c10a4468>] vfs_read+0x87/0x110
[  171.925523]  [<c10a458a>] sys_read+0x3b/0x60
[  171.925538]  [<c1002a49>] syscall_call+0x7/0xb

Fix it by replacing RCU with nf_log_mutex.

Reported-by: "Yin, Kangkai" <kangkai.yin@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-19 13:16:31 -08:00
Patrick McHardy
d667b9cfd0 netfilter: xt_osf: fix xt_osf_remove_callback() return value
Return a negative error value.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-19 13:16:26 -08:00
Wu Fengguang
7378396cd1 netfilter: nf_log: fix sleeping function called from invalid context in seq_show()
[  171.925285] BUG: sleeping function called from invalid context at kernel/mutex.c:280
[  171.925296] in_atomic(): 1, irqs_disabled(): 0, pid: 671, name: grep
[  171.925306] 2 locks held by grep/671:
[  171.925312]  #0:  (&p->lock){+.+.+.}, at: [<c10b8acd>] seq_read+0x25/0x36c
[  171.925340]  #1:  (rcu_read_lock){.+.+..}, at: [<c1391dac>] seq_start+0x0/0x44
[  171.925372] Pid: 671, comm: grep Not tainted 2.6.31.6-4-netbook #3
[  171.925380] Call Trace:
[  171.925398]  [<c105104e>] ? __debug_show_held_locks+0x1e/0x20
[  171.925414]  [<c10264ac>] __might_sleep+0xfb/0x102
[  171.925430]  [<c1461521>] mutex_lock_nested+0x1c/0x2ad
[  171.925444]  [<c1391c9e>] seq_show+0x74/0x127
[  171.925456]  [<c10b8c5c>] seq_read+0x1b4/0x36c
[  171.925469]  [<c10b8aa8>] ? seq_read+0x0/0x36c
[  171.925483]  [<c10d5c8e>] proc_reg_read+0x60/0x74
[  171.925496]  [<c10d5c2e>] ? proc_reg_read+0x0/0x74
[  171.925510]  [<c10a4468>] vfs_read+0x87/0x110
[  171.925523]  [<c10a458a>] sys_read+0x3b/0x60
[  171.925538]  [<c1002a49>] syscall_call+0x7/0xb

Fix it by replacing RCU with nf_log_mutex.

Reported-by: "Yin, Kangkai" <kangkai.yin@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-13 09:34:44 +01:00
Roel Kluin
1c622ae67b netfilter: xt_osf: fix xt_osf_remove_callback() return value
Return a negative error value.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-13 09:31:35 +01:00
Linus Torvalds
1ce55238e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net/fsl_pq_mdio: add module license GPL
  can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
  can: should not use __dev_get_by_index() without locks
  hisax: remove bad udelay call to fix build error on ARM
  ipip: Fix handling of DF packets when pmtudisc is OFF
  qlge: Set PCIe reset type for EEH to fundamental.
  qlge: Fix early exit from mbox cmd complete wait.
  ixgbe: fix traffic hangs on Tx with ioatdma loaded
  ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
  ixgbe: Fix gso_max_size for 82599 when DCB is enabled
  macsonic: fix crash on PowerBook 520
  NET: cassini, fix lock imbalance
  ems_usb: Fix byte order issues on big endian machines
  be2net: Bug fix to send config commands to hardware after netdev_register
  be2net: fix to set proper flow control on resume
  netfilter: xt_connlimit: fix regression caused by zero family value
  rt2x00: Don't queue ieee80211 work after USB removal
  Revert "ipw2200: fix oops on missing firmware"
  decnet: netdevice refcount leak
  netfilter: nf_nat: fix NAT issue in 2.6.30.4+
  ...
2009-11-09 09:51:42 -08:00
Jan Engelhardt
539054a8fa netfilter: xt_connlimit: fix regression caused by zero family value
Commit v2.6.28-rc1~717^2~109^2~2 was slightly incomplete; not all
instances of par->match->family were changed to par->family.

References: http://bugzilla.netfilter.org/show_bug.cgi?id=610
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 18:08:32 -08:00
Jozsef Kadlecsik
f9dd09c7f7 netfilter: nf_nat: fix NAT issue in 2.6.30.4+
Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work
over NAT. The "cause" of the problem was a fix of unacknowledged data
detection with NAT (commit a3a9f79e36).
However, actually, that fix uncovered a long standing bug in TCP conntrack:
when NAT was enabled, we simply updated the max of the right edge of
the segments we have seen (td_end), by the offset NAT produced with
changing IP/port in the data. However, we did not update the other parameter
(td_maxend) which is affected by the NAT offset. Thus that could drift
away from the correct value and thus resulted breaking active FTP.

The patch below fixes the issue by *not* updating the conntrack parameters
from NAT, but instead taking into account the NAT offsets in conntrack in a
consistent way. (Updating from NAT would be more harder and expensive because
it'd need to re-calculate parameters we already calculated in conntrack.)

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 00:43:42 -08:00
Alexey Dobriyan
d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
David S. Miller
b7058842c9 net: Make setsockopt() optlen be unsigned.
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:12:20 -07:00
Alexey Dobriyan
8d65af789f sysctl: remove "struct file *" argument of ->proc_handler
It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Jan Beulich
4481374ce8 mm: replace various uses of num_physpages by totalram_pages
Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages.  The amount of what actually is usable as storage
should instead be used as a basis here.

Some of the calculations (i.e.  those not intending to use high memory)
should likely even use (totalram_pages - totalhigh_pages).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:38 -07:00
David S. Miller
9a0da0d19c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-09-10 18:17:09 -07:00