Commit Graph

3032 Commits

Author SHA1 Message Date
Stephen Rothwell
61d7b09773 rhashtable: using ERR_PTR requires linux/err.h
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 21:52:24 -08:00
Thomas Graf
020219a69d rhashtable: Fix remove logic to avoid cross references between buckets
The remove logic properly searched the remaining chain for a matching
entry with an identical hash but it did this while searching from both
the old and new table. Instead in order to not leave stale references
behind we need to:

 1. When growing and searching from the new table:
    Search remaining chain for entry with same hash to avoid having
    the new table directly point to a entry with a different hash.

 2. When shrinking and searching from the old table:
    Check if the element after the removed would create a cross
    reference and avoid it if so.

These bugs were present from the beginning in nft_hash.

Also, both insert functions calculated the hash based on the mask of
the new table. This worked while growing. Wwhile shrinking, the mask
of the inew table is smaller than the mask of the old table. This lead
to a bit not being taken into account when selecting the bucket lock
and thus caused the wrong bucket to be locked eventually.

Fixes: 7e1e77636e ("lib: Resizable, Scalable, Concurrent Hash Table")
Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:19:17 -08:00
Thomas Graf
cf52d52f9c rhashtable: Avoid bucket cross reference after removal
During a resize, when two buckets in the larger table map to
a single bucket in the smaller table and the new table has already
been (partially) linked to the old table. Removal of an element
may result the bucket in the larger table to point to entries
which all hash to a different value than the bucket index. Thus
causing two buckets to point to the same sub chain after unzipping.
This is not illegal *during* the resize phase but after it has
completed.

Keep the old table around until all of the unzipping is done to
allow the removal code to only search for matching hashed entries
during this special period.

Reported-by: Ying Xue <ying.xue@windriver.com>
Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:35 -08:00
Thomas Graf
7cd10db8de rhashtable: Add more lock verification
Catch hash miscalculations which result in hard to track down race
conditions.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
a03eaec0df rhashtable: Dump bucket tables on locking violation under PROVE_LOCKING
This simplifies debugging of locking violations if compiled with
CONFIG_PROVE_LOCKING.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
2af4b52988 rhashtable: Wait for RCU readers after final unzip work
We need to wait for all RCU readers to complete after the last bit of
unzipping has been completed. Otherwise the old table is freed up
prematurely.

Fixes: 7e1e77636e ("lib: Resizable, Scalable, Concurrent Hash Table")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
a5ec68e3b8 rhashtable: Use a single bucket lock for sibling buckets
rhashtable currently allows to use a bucket lock per bucket. This
requires multiple levels of complicated nested locking because when
resizing, a single bucket of the smaller table will map to two
buckets in the larger table. So far rhashtable has explicitly locked
both buckets in the larger table.

By excluding the highest bit of the hash from the bucket lock map and
thus only allowing locks to buckets in a ratio of 1:2, the locking
can be simplified a lot without losing the benefits of multiple locks.
Larger tables which benefit from multiple locks will not have a single
lock per bucket anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
c88455ce50 rhashtable: key_hashfn() must return full hash value
The value computed by key_hashfn() is used by rhashtable_lookup_compare()
to traverse both tables during a resize. key_hashfn() must therefore
return the hash value without the buckets mask applied so it can be
masked to the size of each individual table.

Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
David S. Miller
6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
David S. Miller
f2683b743f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More iov_iter work from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:46:55 -08:00
Herbert Xu
f2dba9c6ff rhashtable: Introduce rhashtable_walk_*
Some existing rhashtable users get too intimate with it by walking
the buckets directly.  This prevents us from easily changing the
internals of rhashtable.

This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
which will replace these custom walkers.

They are meant to be usable for both procfs seq_file walks as well
as walking by a netlink dump.  The iterator structure should fit
inside a netlink dump cb structure, with at least one element to
spare.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:34:52 -08:00
Herbert Xu
28134a53d6 rhashtable: Fix potential crash on destroy in rhashtable_shrink
The current being_destroyed check in rhashtable_expand is not
enough since if we start a shrinking process after freeing all
elements in the table that's also going to crash.

This patch adds a being_destroyed check to the deferred worker
thread so that we bail out as soon as we take the lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:34:52 -08:00
Al Viro
57dd8a0735 vhost: vhost_scsi_handle_vq() should just use copy_from_user()
it has just verified that it asks no more than the length of the
first segment of iovec.

And with that the last user of stuff in lib/iovec.c is gone.
RIP.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:16 -05:00
Al Viro
ba7438aed9 vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:16 -05:00
Al Viro
aad9a1cec7 vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:15 -05:00
Geert Uytterhoeven
9d6dbe1bba rhashtable: Make selftest modular
Allow the selftest on the resizable hash table to be built modular, just
like all other tests that do not depend on DEBUG_KERNEL.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 18:06:33 -08:00
karl beldan
9ce357795e lib/checksum.c: fix build for generic csum_tcpudp_nofold
Fixed commit added from64to32 under _#ifndef do_csum_ but used it
under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
robot reported TILEGX's). Move from64to32 under the latter.

Fixes: 150ae0e946 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 11:57:38 -08:00
karl beldan
150ae0e946 lib/checksum.c: fix carry in csum_tcpudp_nofold
The carry from the 64->32bits folding was dropped, e.g with:
saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
csum_tcpudp_nofold returned 0 instead of 1.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:32:33 -08:00
Thomas Graf
fe6a043c53 rhashtable: rhashtable_remove() must unlink in both tbl and future_tbl
As removals can occur during resizes, entries may be referred to from
both tbl and future_tbl when the removal is requested. Therefore
rhashtable_remove() must unlink the entry in both tables if this is
the case. The existing code did search both tables but stopped when it
hit the first match.

Failing to unlink in both tables resulted in use after free.

Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 11:56:34 -08:00
Ying Xue
57699a40b4 rhashtable: Fix race in rhashtable_destroy() and use regular work_struct
When we put our declared work task in the global workqueue with
schedule_delayed_work(), its delay parameter is always zero.
Therefore, we should define a regular work in rhashtable structure
instead of a delayed work.

By the way, we add a condition to check whether resizing functions
are NULL before cancelling the work, avoiding to cancel an
uninitialized work.

Lastly, while we wait for all work items we submitted before to run
to completion with cancel_delayed_work(), ht->mutex has been taken in
rhashtable_destroy(). Moreover, cancel_delayed_work() doesn't return
until all work items are accomplished, and when work items are
scheduled, the work's function - rht_deferred_worker() will be called.
However, as rht_deferred_worker() also needs to acquire the lock,
deadlock might happen at the moment as the lock is already held before.
So if the cancel work function is moved out of the lock covered scope,
this will avoid the deadlock.

Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 01:18:51 -05:00
David S. Miller
3f3558bb51 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netfront.c

Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 00:53:17 -05:00
Thomas Graf
80ca8c3a84 rhashtable: Lower/upper bucket may map to same lock while shrinking
Each per bucket lock covers a configurable number of buckets. While
shrinking, two buckets in the old table contain entries for a single
bucket in the new table. We need to lock down both while linking.
Check if they are protected by different locks to avoid a recursive
lock.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:21:44 -05:00
Ying Xue
7a868d1e9a rhashtable: involve rhashtable_lookup_compare_insert routine
Introduce a new function called rhashtable_lookup_compare_insert()
which is very similar to rhashtable_lookup_insert(). But the former
makes use of users' given compare function to look for an object,
and then inserts it into hash table if found. As the entire process
of search and insertion is under protection of per bucket lock, this
can help users to avoid the involvement of extra lock.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 14:01:00 -05:00
Linus Torvalds
aa9291355e Merge tag 'for_linus-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull kgdb/kdb fixes from Jason Wessel:
 "These have been around since 3.17 and in kgdb-next for the last 9
  weeks and some will go back to -stable.

  Summary of changes:

  Cleanups
   - kdb: Remove unused command flags, repeat flags and KDB_REPEAT_NONE

  Fixes
   - kgdb/kdb: Allow access on a single core, if a CPU round up is
     deemed impossible, which will allow inspection of the now "trashed"
     kernel
   - kdb: Add enable mask for the command groups
   - kdb: access controls to restrict sensitive commands"

* tag 'for_linus-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
  kernel/debug/debug_core.c: Logging clean-up
  kgdb: timeout if secondary CPUs ignore the roundup
  kdb: Allow access to sensitive commands to be restricted by default
  kdb: Add enable mask for groups of commands
  kdb: Categorize kdb commands (similar to SysRq categorization)
  kdb: Remove KDB_REPEAT_NONE flag
  kdb: Use KDB_REPEAT_* values as flags
  kdb: Rename kdb_register_repeat() to kdb_register_flags()
  kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags
  kdb: Remove currently unused kdbtab_t->cmd_flags
2015-01-09 20:51:10 -08:00
Ying Xue
545a148e43 rhashtable: initialize atomic nelems variable
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-08 19:47:13 -08:00