Commit Graph

19710 Commits

Author SHA1 Message Date
Paul E. McKenney
bbad937983 rcu: slim down rcutiny by removing rcu_scheduler_active and friends
TINY_RCU does not need rcu_scheduler_active unless CONFIG_DEBUG_LOCK_ALLOC.
So conditionally compile rcu_scheduler_active in order to slim down
rcutiny a bit more.  Also gets rid of an EXPORT_SYMBOL_GPL, which is
responsible for most of the slimming.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 11:08:34 -07:00
Paul E. McKenney
25502a6c13 rcu: refactor RCU's context-switch handling
The addition of preemptible RCU to treercu resulted in a bit of
confusion and inefficiency surrounding the handling of context switches
for RCU-sched and for RCU-preempt.  For RCU-sched, a context switch
is a quiescent state, pure and simple, just like it always has been.
For RCU-preempt, a context switch is in no way a quiescent state, but
special handling is required when a task blocks in an RCU read-side
critical section.

However, the callout from the scheduler and the outer loop in ksoftirqd
still calls something named rcu_sched_qs(), whose name is no longer
accurate.  Furthermore, when rcu_check_callbacks() notes an RCU-sched
quiescent state, it ends up unnecessarily (though harmlessly, aside
from the performance hit) enqueuing the current task if it happens to
be running in an RCU-preempt read-side critical section.  This not only
increases the maximum latency of scheduler_tick(), it also needlessly
increases the overhead of the next outermost rcu_read_unlock() invocation.

This patch addresses this situation by separating the notion of RCU's
context-switch handling from that of RCU-sched's quiescent states.
The context-switch handling is covered by rcu_note_context_switch() in
general and by rcu_preempt_note_context_switch() for preemptible RCU.
This permits rcu_sched_qs() to handle quiescent states and only quiescent
states.  It also reduces the maximum latency of scheduler_tick(), though
probably by much less than a microsecond.  Finally, it means that tasks
within preemptible-RCU read-side critical sections avoid incurring the
overhead of queuing unless there really is a context switch.

Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
2010-05-10 11:08:33 -07:00
Paul E. McKenney
da848c47bc rcu: shrink rcutiny by making synchronize_rcu_bh() be inline
Because synchronize_rcu_bh() is identical to synchronize_sched(),
make the former a static inline invoking the latter, saving the
overhead of an EXPORT_SYMBOL_GPL() and the duplicate code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 11:08:32 -07:00
Paul E. McKenney
32c141a0a1 rcu: fix now-bogus rcu_scheduler_active comments.
The rcu_scheduler_active check has been wrapped into the new
debug_lockdep_rcu_enabled() function, so update the comments to
reflect this new reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 11:08:32 -07:00
Paul E. McKenney
d20200b591 rcu: Fix bogus CONFIG_PROVE_LOCKING in comments to reflect reality.
It is CONFIG_DEBUG_LOCK_ALLOC rather than CONFIG_PROVE_LOCKING, so fix it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 11:08:32 -07:00
Lai Jiangshan
2b3fc35f69 rcu: optionally leave lockdep enabled after RCU lockdep splat
There is no need to disable lockdep after an RCU lockdep splat,
so remove the debug_lockdeps_off() from lockdep_rcu_dereference().
To avoid repeated lockdep splats, use a static variable in the inlined
rcu_dereference_check() and rcu_dereference_protected() macros so that
a given instance splats only once, but so that multiple instances can
be detected per boot.

This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
which is disabled by default.  This provides the normal lockdep behavior
by default, but permits people who want to find multiple RCU-lockdep
splats per boot to easily do so.

Requested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 11:08:31 -07:00
Linus Torvalds
91bc482ec5 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: create rcu_my_thread_group_empty() wrapper
  memcg: css_id() must be called under rcu_read_lock()
  cgroup: Check task_lock in task_subsys_state()
  sched: Fix an RCU warning in print_task()
  cgroup: Fix an RCU warning in alloc_css_id()
  cgroup: Fix an RCU warning in cgroup_path()
  KEYS: Fix an RCU warning in the reading of user keys
  KEYS: Fix an RCU warning
2010-05-07 13:58:21 -07:00
Paul E. McKenney
ee84b8243b rcu: create rcu_my_thread_group_empty() wrapper
Some RCU-lockdep splat repairs need to know whether they are running
in a single-threaded process.  Unfortunately, the thread_group_empty()
primitive is defined in sched.h, and can induce #include hell.  This
commit therefore introduces a rcu_my_thread_group_empty() wrapper that
is defined in rcupdate.c, thus avoiding the need to include sched.h
everywhere.

Signed-off-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
2010-05-06 09:28:41 -07:00
Li Zefan
1ce7e4ff24 cgroup: Check task_lock in task_subsys_state()
Expand task_subsys_state()'s rcu_dereference_check() to include the full
locking rule as documented in Documentation/cgroups/cgroups.txt by adding
a check for task->alloc_lock being held.

This fixes an RCU false positive when resuming from suspend. The warning
comes from freezer cgroup in cgroup_freezing_or_frozen().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-04 09:25:02 -07:00
Jean Delvare
6629dcff19 i2c-core: Use per-adapter userspace device lists
Using a single list for all userspace devices leads to a dead lock
on multiplexed buses in some circumstances (mux chip instantiated
from userspace). This is solved by using a separate list for each
bus segment.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Michael Lawnick <ml.lawnick@gmx.de>
2010-05-04 11:09:28 +02:00
Daniel Mack
073900a28d USB: rename usb_buffer_alloc() and usb_buffer_free()
For more clearance what the functions actually do,

  usb_buffer_alloc() is renamed to usb_alloc_coherent()
  usb_buffer_free()  is renamed to usb_free_coherent()

They should only be used in code which really needs DMA coherency.

[added compatibility macros so we can convert things easier - gregkh]

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Pedro Ribeiro <pedrib@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-30 09:25:12 -07:00
Linus Torvalds
27fb8d7b1f Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
  nfs: fix some issues in nfs41_proc_reclaim_complete()
  NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
  NFS: Fix an unstable write data integrity race
  nfs: testing for null instead of ERR_PTR()
  NFS: rsize and wsize settings ignored on v4 mounts
  NFSv4: Don't attempt an atomic open if the file is a mountpoint
  SUNRPC: Fix a bug in rpcauth_prune_expired
2010-04-29 10:23:44 -07:00
Linus Torvalds
970b06485f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  coda: move backing-dev.h kernel include inside __KERNEL__
  mtd: ensure that bdi entries are properly initialized and registered
  Move mtd_bdi_*mappable to mtdcore.c
  btrfs: convert to using bdi_setup_and_register()
  Catch filesystems lacking s_bdi
  drbd: Terminate a connection early if sending the protocol fails
  drbd: fix memory leak
  Fix JFFS2 sync silent failure
  smbfs: add bdi backing to mount session
  ncpfs: add bdi backing to mount session
  exofs: add bdi backing to mount session
  ecryptfs: add bdi backing to mount session
  coda: add bdi backing to mount session
  cifs: add bdi backing to mount session
  afs: add bdi backing to mount session.
  9p: add bdi backing to mount session
  bdi: add helper function for doing init and register of a bdi for a file system
  block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer
2010-04-28 07:56:05 -07:00
Jens Axboe
33f60e9640 coda: move backing-dev.h kernel include inside __KERNEL__
Otherwise we must export backing-dev.h as well, which doesn't make
any sense.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-28 09:20:33 +02:00
Jörn Engel
5129a469a9 Catch filesystems lacking s_bdi
noop_backing_dev_info is used only as a flag to mark filesystems that
don't have any backing store, like tmpfs, procfs, spufs, etc.

Signed-off-by: Joern Engel <joern@logfs.org>

Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
to the noop_backing_dev_info is not legal and will not result in
them being flushed, but we already catch this condition in
__mark_inode_dirty() when checking for a registered bdi.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-25 08:54:42 +02:00
Mel Gorman
23be7468e8 hugetlb: fix infinite loop in get_futex_key() when backed by huge pages
If a futex key happens to be located within a huge page mapped
MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
page->mapping that will never exist.

See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
about the problem.

This patch makes page->mapping a poisoned value that includes
PAGE_MAPPING_ANON mapped MAP_PRIVATE.  This is enough for futex to
continue but because of PAGE_MAPPING_ANON, the poisoned value is not
dereferenced or used by futex.  No other part of the VM should be
dereferencing the page->mapping of a hugetlbfs page as its page cache is
not on the LRU.

This patch fixes the problem with the test case described in the bugzilla.

[akpm@linux-foundation.org: mel cant spel]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Darren Hart <darren@dvhart.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:25 -07:00
Josef Bacik
3a3076f4d6 Cleanup generic block based fiemap
This cleans up a few of the complaints of __generic_block_fiemap.  I've
fixed all the typing stuff, used inline functions instead of macros,
gotten rid of a couple of variables, and made sure the size and block
requests are all block aligned.  It also fixes a problem where sometimes
FIEMAP_EXTENT_LAST wasn't being set properly.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-23 10:39:48 -07:00
Linus Torvalds
cfc94b2c9a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: ohci: wait for local CSR lock access to finish
  firewire: ohci: prevent aliasing of locally handled register addresses
  firewire: core: fw_iso_resource_manage: return -EBUSY when out of resources
  firewire: core: fix retries calculation in iso manage_channel()
  firewire: cdev: fix cut+paste mistake in disclaimer
2010-04-22 12:54:54 -07:00
Trond Myklebust
71d0a6112a NFS: Fix an unstable write data integrity race
Commit 2c61be0a94 (NFS: Ensure that the WRITE
and COMMIT RPC calls are always uninterruptible) exposed a race on file
close. In order to ensure correct close-to-open behaviour, we want to wait
for all outstanding background commit operations to complete.

This patch adds an inode flag that indicates if a commit operation is under
way, and provides a mechanism to allow ->write_inode() to wait for its
completion if this is a data integrity flush.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-22 15:35:57 -04:00
Jens Axboe
424264b7b2 smbfs: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:37:07 +02:00
Jens Axboe
f1970c73cb ncpfs: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:31:11 +02:00
Jens Axboe
5163d90076 coda: add bdi backing to mount session
This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 12:12:40 +02:00
Jens Axboe
c3c532061e bdi: add helper function for doing init and register of a bdi for a file system
Pretty trivial helper, just sets up the bdi and registers it. An atomic
sequence count is used to ensure that the registered sysfs names are
unique.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 11:39:36 +02:00
Linus Torvalds
458f8c895b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  mc13783-regulator: fix a memory leak in mc13783_regulator_remove
  regulator: Let drivers know when they use the stub API
2010-04-21 12:31:52 -07:00
Sridhar Samudrala
e80e2a60ff KVM: Increase NR_IOBUS_DEVS limit to 200
This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 13:08:30 +03:00