Commit Graph

88 Commits

Joe Perches 75a3294ec5 fscache: convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:16 -07:00
Fabian Frederick 3185a88ce3 fs/fscache: replace seq_printf by seq_puts
Replace seq_printf with seq_puts where possible and coalesce formats from
two existing seq_puts calls.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:52 -07:00
Fabian Frederick 36dfd116ed fs/fscache: convert printk to pr_foo()
All printk calls converted to pr_foo(), except the printk(KERN_DEBUG ...) in internal.h.

Coalesce formats.

Add pr_fmt
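
As an illustration, the conversion pattern is roughly this (a sketch; the
"FS-Cache: " prefix is assumed rather than quoted from the patch):

	/* must be defined before the first include that uses it */
	#define pr_fmt(fmt) "FS-Cache: " fmt

	pr_warn("cookie is dead\n");	/* emits "FS-Cache: cookie is dead" */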

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:51 -07:00
David Howells 7026f1929e FS-Cache: Handle removal of an unadded object from the fscache_object_list rb tree
When FS-Cache allocates an object, the following sequence of events can
occur:

 -->fscache_alloc_object()
    -->cachefiles_alloc_object() [via cache->ops->alloc_object]
    <--[returns new object]
    -->fscache_attach_object()
    <--[failed]
    -->cachefiles_put_object() [via cache->ops->put_object]
       -->fscache_object_destroy()
          -->fscache_objlist_remove()
             -->rb_erase() to remove the object from fscache_object_list.

resulting in a crash in the rbtree code.

The problem is that the object is only added to fscache_object_list on
the success path of fscache_attach_object() where it calls
fscache_objlist_add().

So if fscache_attach_object() fails, the object won't have been added to
the objlist rbtree.  We do, however, unconditionally try to remove the
object from the tree.
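
The fix this description implies (a sketch; the objlist_link field name is
an assumption) is to mark the rb node empty at init time and make removal a
no-op for nodes that were never inserted:

	/* at object initialisation time */
	RB_CLEAR_NODE(&object->objlist_link);

	/* in fscache_objlist_remove() */
	if (RB_EMPTY_NODE(&object->objlist_link))
		return;		/* never added on the failure path */

	rb_erase(&object->objlist_link, &fscache_object_list);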

Thanks to NeilBrown for finding this and suggesting this solution.

Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: (a customer of) NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-17 13:47:35 -08:00
Linus Torvalds 0910c0bdf7 Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-block
Pull block IO core updates from Jens Axboe:
 "This is the pull request for the core changes in the block layer for
  3.13.  It contains:

   - The new blk-mq request interface.

     This is a new and more scalable queueing model that marries the
     best part of the request based interface we currently have (which
     is fully featured, but scales poorly) and the bio based "interface"
     which the new drivers for high IOPS devices end up using because
     it's much faster than the request based one.

     The bio interface has no block layer support, since it taps into
     the stack much earlier.  This means that drivers end up having to
     implement a lot of functionality on their own, like tagging,
     timeout handling, requeue, etc.  The blk-mq interface provides all
     these.  Some drivers even provide a switch to select bio or rq and
     have code to handle both, since things like merging only work in
     the rq model, which is hence faster for some workloads.  This is a
     huge mess.  Conversion of these drivers nets us a substantial code
     reduction.  Initial results on converting SCSI to this model even
     show an 8x improvement on single queue devices.  So while the
     model was intended to work on the newer multiqueue devices, it has
     substantial improvements for "classic" hardware as well.  This code
     has gone through extensive testing and development; it's now ready
     to go.  A pull request to convert virtio-blk to this model will be
     coming as well, with more drivers scheduled for 3.14 conversion.

   - Two blktrace fixes from Jan and Chen Gang.

   - A plug merge fix from Alireza Haghdoost.

   - Conversion of __get_cpu_var() from Christoph Lameter.

   - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.

   - A fix for a race between request completion and the timeout
     handling from Jeff Moyer.  This is what caused the merge conflict
     with blk-mq/core, in case you are looking at that.

   - A dm stacking fix from Mike Snitzer.

   - A code consolidation fix and duplicated code removal from Kent
     Overstreet.

   - A handful of block bug fixes from Mikulas Patocka, fixing a loop
     crash and memory corruption on blk cg.

   - Elevator switch bug fix from Tomoki Sekiyama.

  A heads-up that I had to rebase this branch.  Initially the immutable
  bio_vecs had been queued up for inclusion, but a week later, it became
  clear that it wasn't fully cooked yet.  So the decision was made to
  pull this out and postpone it until 3.14.  It was a straightforward
  rebase, just pruning out the immutable series and the later fixes of
  problems with it.  The rest of the patches applied directly and no
  further changes were made"

* 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
  block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  block: Do not call sector_div() with a 64-bit divisor
  kernel: trace: blktrace: remove redundant memcpy() in compat_blk_trace_setup()
  block: Consolidate duplicated bio_trim() implementations
  block: Use rw_copy_check_uvector()
  block: Enable sysfs nomerge control for I/O requests in the plug list
  block: properly stack underlying max_segment_size to DM device
  elevator: acquire q->sysfs_lock in elevator_change()
  elevator: Fix a race in elevator switching and md device initialization
  block: Replace __get_cpu_var uses
  bdi: test bdi_init failure
  block: fix a probe argument to blk_register_region
  loop: fix crash if blk_alloc_queue fails
  blk-core: Fix memory corruption if blkcg_init_queue fails
  block: fix race between request completion and timeout handling
  blktrace: Send BLK_TN_PROCESS events to all running traces
  blk-mq: don't disallow request merges for req->special being set
  blk-mq: mq plug list breakage
  blk-mq: fix for flush deadlock
  ...
2013-11-14 12:08:14 +09:00
Christoph Lameter 170d800af8 block: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address of the current processor's instance of the percpu variable
based on an offset.

Other use cases are storing and retrieving data from the current
processor's percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as:

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() only ever performs an address calculation.  However, store
and retrieve operations could use a segment prefix (or a global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and fewer
registers are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well.  Once these operations
are used throughout, specialized macros can be defined in non-x86 arches as
well in order to optimize per cpu access, e.g. by using a global register that
may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processor's instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y);

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y) = x;

   Converts to

	this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++;

   Converts to

	this_cpu_inc(y);

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:58 -07:00
David Howells 94d30ae90a FS-Cache: Provide the ability to enable/disable cookies
Provide the ability to enable and disable fscache cookies.  A disabled cookie
will reject or ignore further requests to:

	Acquire a child cookie
	Invalidate and update backing objects
	Check the consistency of a backing object
	Allocate storage for backing page
	Read backing pages
	Write to backing pages

but still allows:

	Checks/waits on the completion of already in-progress objects
	Uncaching of pages
	Relinquishment of cookies

Two new operations are provided:

 (1) Disable a cookie:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    bool invalidate);

     If the cookie is not already disabled, this locks the cookie against other
     dis/enablement ops, marks the cookie as being disabled, discards or
     invalidates any backing objects and waits for cessation of activity on any
     associated object.

     This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
     but it reinitialises the cookie such that it can be reenabled.

     All possible failures are handled internally.  The caller should consider
     calling fscache_uncache_all_inode_pages() afterwards to make sure all page
     markings are cleared up.

 (2) Enable a cookie:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   bool (*can_enable)(void *data),
				   void *data)

     If the cookie is not already enabled, this locks the cookie against other
     dis/enablement ops, invokes can_enable() and, if the cookie is not an
     index cookie, will begin the procedure of acquiring backing objects.

     The optional can_enable() function is passed the data argument and returns
     a ruling as to whether or not enablement should actually be permitted to
     begin.

     All possible failures are handled internally.  The cookie will only be
     marked as enabled if provisional backing objects are allocated.

A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
is then contingent on i_writecount <= 0.  can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.

One operation has its API modified:

 (3) Acquire a cookie.

	struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_cookie *parent,
		const struct fscache_cookie_def *def,
		void *netfs_data,
		bool enable);

     This now has an additional argument that indicates whether the requested
     cookie should be enabled by default.  It doesn't need the can_enable()
     function because the caller must prevent multiple calls for the same netfs
     object and it doesn't need to take the enablement lock because no one else
     can get at the cookie before this returns.
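
Taken together, a netfs might use the trio roughly like this (a minimal
sketch; my_cookie_def, my_can_cache() and the inode argument are
illustrative, not part of the API):

	cookie = fscache_acquire_cookie(parent, &my_cookie_def, inode, true);

	/* stop caching and discard the backing object's contents */
	fscache_disable_cookie(cookie, true);

	/* resume caching, if my_can_cache() permits it */
	fscache_enable_cookie(cookie, my_can_cache, inode);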

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-27 18:40:25 +01:00
David Howells 8fb883f3e3 FS-Cache: Add use/unuse/wake cookie wrappers
Add wrapper functions for dealing with cookie->n_active:

 (*) __fscache_use_cookie() to increment it.

 (*) __fscache_unuse_cookie() to decrement and test against zero.

 (*) __fscache_wake_unused_cookie() to wake up anyone waiting for it to reach
     zero.

The second and third are split so that the third can be done after cookie->lock
has been released in case the waiter wakes up whilst we're still holding it and
tries to get it.
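
The resulting calling pattern is roughly (a sketch, not the exact call sites):

	spin_lock(&cookie->lock);
	...
	unused = __fscache_unuse_cookie(cookie);	/* decrement and test */
	spin_unlock(&cookie->lock);
	if (unused)
		__fscache_wake_unused_cookie(cookie);	/* safe: lock dropped */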

We will need to wake-on-zero once the cookie disablement patch is applied
because it will then be possible to see n_active become zero without the cookie
being relinquished.

Also move the cookie usage out of fscache_attr_changed_op() and into
fscache_attr_changed() and the operation struct so that cookie disablement
will be able to track it.

Whilst we're at it, only increment n_active if we're about to do
fscache_submit_op() so that we don't have to deal with undoing it if anything
earlier fails.  Possibly this should be moved into fscache_submit_op() which
could look at FSCACHE_OP_UNUSE_COOKIE.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-27 18:40:25 +01:00
Linus Torvalds e9ff04dd94 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "These fix several bugs with RBD from 3.11 that didn't get tested in
  time for the merge window: some error handling, a use-after-free, and
  a sequencing issue when unmapping an image races with a notify
  operation.

  There is also a patch fixing a problem with the new ceph + fscache
  code that just went in"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  fscache: check consistency does not decrement refcount
  rbd: fix error handling from rbd_snap_name()
  rbd: ignore unmapped snapshots that no longer exist
  rbd: fix use-after-free of rbd_dev->disk
  rbd: make rbd_obj_notify_ack() synchronous
  rbd: complete notifies before cleaning up osd_client and rbd_dev
  libceph: add function to ensure notifies are complete
2013-09-19 12:50:37 -05:00
Jan Kara 5e4c0d9741 lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt
With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
one such possible user), the following race can happen:

radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];
<interrupt>
...
radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];

And we give out one radix tree node twice.  That clearly results in radix
tree corruption with different results (usually OOPS) depending on which
two users of radix tree race.

We fix the problem by making radix_tree_node_alloc() always allocate fresh
radix tree nodes when in interrupt.  Using preloading when in interrupt
doesn't make sense since all the allocations have to be atomic anyway and
we cannot steal nodes from process-context users because some users rely
on radix_tree_insert() succeeding after radix_tree_preload().
The in_interrupt() check is somewhat ugly, but we cannot simply key off the
passed gfp_mask as that is acquired from root_gfp_mask() and is thus the same
for all preload users.

Another part of the fix is to avoid node preallocation in
radix_tree_preload() when passed gfp_mask doesn't allow waiting.  Again,
preallocation in such case doesn't make sense and when preallocation would
happen in interrupt we could possibly leak some allocated nodes.  However,
some users of radix_tree_preload() require following radix_tree_insert()
to succeed.  To avoid unexpected effects for these users,
radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
and we provide a new function radix_tree_maybe_preload() for those users
which get different gfp mask from different call sites and which are
prepared to handle radix_tree_insert() failure.
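
For such users the calling pattern looks roughly like this (a sketch; the
lock name and error handling are illustrative):

	if (radix_tree_maybe_preload(gfp_mask) < 0)
		return -ENOMEM;

	spin_lock_irq(&tree_lock);
	err = radix_tree_insert(root, index, item);	/* may fail: -ENOMEM */
	spin_unlock_irq(&tree_lock);

	radix_tree_preload_end();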

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:36 -07:00
Milosz Tanski 9c89d62948 fscache: check consistency does not decrement refcount
__fscache_check_consistency() does not decrement the count of active
operations after it finishes in the success case.  This leads to hung tasks
on cookie de-registration (commonly in inode eviction).

INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
kworker/1:2     D ffff880443513fc0     0  4214      2 0x00000000
Workqueue: ceph-msgr con_work [libceph]
  ...
Call Trace:
 [<ffffffff81569fc6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [<ffffffffa0016570>] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
 [<ffffffff81568d09>] schedule+0x29/0x70
 [<ffffffffa001657e>] fscache_wait_atomic_t+0xe/0x20 [fscache]
 [<ffffffff815665cf>] out_of_line_wait_on_atomic_t+0x9f/0xe0
 [<ffffffff81083560>] ? autoremove_wake_function+0x40/0x40
 [<ffffffffa0015a9c>] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
 [<ffffffffa00a4fae>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
 [<ffffffffa007e373>] ceph_destroy_inode+0x33/0x200 [ceph]
 [<ffffffff811c13ae>] ? __fsnotify_inode_delete+0xe/0x10
 [<ffffffff8119ba1c>] destroy_inode+0x3c/0x70
 [<ffffffff8119bb69>] evict+0x119/0x1b0

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-10 09:04:46 -07:00
Milosz Tanski 5a6f282a20 fscache: Netfs function for cleanup post readpages
Currently the fscache code expects the netfs to call
fscache_read_or_alloc_pages inside the aops readpages callback.  It marks all
the pages in the list provided by readahead with PG_private_2.  In cases
where the netfs fails to read all the pages (which is legal), it ends up
returning to the readahead code and triggering a BUG.  This happens because
the page list still contains marked pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages.  It will revoke the pages from the
underlying cache backend and unmark them.
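
In a netfs's readpages implementation the call sits on the error path,
roughly like this (a sketch; my_readpages, my_read_done and the cookie
variable are illustrative):

	static int my_readpages(struct file *file, struct address_space *mapping,
				struct list_head *pages, unsigned nr_pages)
	{
		int ret;

		ret = fscache_read_or_alloc_pages(cookie, mapping, pages,
						  &nr_pages, my_read_done,
						  NULL, mapping_gfp_mask(mapping));
		if (ret < 0)
			/* revoke and unmark anything fscache left in the list */
			fscache_readpages_cancel(cookie, pages);
		return ret;
	}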

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
	(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
[12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
[12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
[12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
[12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
[12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
[12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
[12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
[12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
[12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
[12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-06 09:17:30 +01:00
David Howells da9803bc88 FS-Cache: Add interface to check consistency of a cached object
Extend the fscache netfs API so that the netfs can ask whether a cache
object is up to date with respect to its corresponding netfs object:

	int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

	int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call.  FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.
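
From the netfs side, usage is then a simple sketch along these lines (the
cookie variable and the invalidation response are illustrative):

	ret = fscache_check_consistency(cookie);
	if (ret == -ESTALE)
		fscache_invalidate(cookie);	/* one possible response */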

Original-author: Hongyi Jia <jiayisuse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
2013-09-06 09:17:30 +01:00
David Howells dcfae32f89 FS-Cache: Don't use spin_is_locked() in assertions
Under certain circumstances, spin_is_locked() is hardwired to 0 - even when the
code would normally be in a locked section where it should return 1.  This
means it cannot be used for an assertion that checks that a spinlock is locked.

Remove such usages from FS-Cache.
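
The kind of assertion being removed looks like this (a sketch in fscache's
ASSERT() style; on a UP build without spinlock debugging, spin_is_locked()
is constant 0 even inside the critical section):

	spin_lock(&object->lock);
	ASSERT(spin_is_locked(&object->lock));	/* can fail on UP builds */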

The following oops might otherwise be observed:

FS-Cache: Assertion failed
BUG: failure at fs/fscache/operation.c:270/fscache_start_operations()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 10 Comm: kworker/u2:1 Not tainted 3.10.0-rc1-00133-ge7ebb75 #2
Workqueue: fscache_operation fscache_op_work_func [fscache]
7f091c48 603c8947 7f090000 7f9b1361 7f25f080 00000001 7f26d440 7f091c90
60299eb8 7f091d90 602951c5 7f26d440 3000000008 7f091da0 7f091cc0 7f091cd0
00000007 00000007 00000006 7f091ae0 00000010 0000010e 7f9af330 7f091ae0
Call Trace:
7f091c88: [<60299eb8>] dump_stack+0x17/0x19
7f091c98: [<602951c5>] panic+0xf4/0x1e9
7f091d38: [<6002b10e>] set_signals+0x1e/0x40
7f091d58: [<6005b89e>] __wake_up+0x4e/0x70
7f091d98: [<7f9aa003>] fscache_start_operations+0x43/0x50 [fscache]
7f091da8: [<7f9aa1e3>] fscache_op_complete+0x1d3/0x220 [fscache]
7f091db8: [<60082985>] unlock_page+0x55/0x60
7f091de8: [<7fb25bb0>] cachefiles_read_copier+0x250/0x330 [cachefiles]
7f091e58: [<7f9ab03c>] fscache_op_work_func+0xac/0x120 [fscache]
7f091e88: [<6004d5b0>] process_one_work+0x250/0x3a0
7f091ef8: [<6004edc7>] worker_thread+0x177/0x2a0
7f091f38: [<6004ec50>] worker_thread+0x0/0x2a0
7f091f58: [<60054418>] kthread+0xd8/0xe0
7f091f68: [<6005bb27>] finish_task_switch.isra.64+0x37/0xa0
7f091fd8: [<600185cf>] new_thread_handler+0x8f/0xb0

Reported-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-By: Milosz Tanski <milosz@adfin.com>
2013-06-19 14:16:47 +01:00
David Howells 1bb4b7f98f FS-Cache: The retrieval remaining-pages counter needs to be atomic_t
struct fscache_retrieval contains a count of the number of pages that still
need some processing (n_pages).  This is decremented as the pages are
processed.

However, this needs to be atomic as fscache_retrieval_complete() may (I
think) just occasionally be called from cachefiles_read_backing_file() and
cachefiles_read_copier() simultaneously.

This happens when an fscache_read_or_alloc_pages() request containing a lot of
pages (say a couple of hundred) is being processed.  The read on each backing
page is dispatched individually because we need to insert a monitor into the
waitqueue to catch when the read completes.  However, under low-memory
conditions, we might be forced to wait in the allocator - and this gives the
I/O on the backing page a chance to complete first.

When the I/O completes, fscache_enqueue_retrieval() chucks the retrieval onto
the workqueue without waiting for the operation to finish the initial I/O
dispatch (we want to release any pages we can as soon as we can), thus both can
end up running simultaneously and potentially attempting to partially complete
the retrieval simultaneously (ENOMEM may occur, backing pages may already be in
the page cache).

This was demonstrated by parallelling the non-atomic counter with an atomic
counter and printing both of them when the assertion fails.  At this point, the
atomic counter has reached zero, but the non-atomic counter has not.

To fix this, make the counter an atomic_t.
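
The decrement-and-test then becomes something like the following (a sketch;
helper and field names follow the description and the exact code may differ):

	static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
						      int n_pages)
	{
		atomic_sub(n_pages, &op->n_pages);
		if (atomic_read(&op->n_pages) <= 0)
			fscache_op_complete(&op->op, true);
	}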

Without the fix, the race results in the following bug appearing

	FS-Cache: Assertion failed
	3 == 5 is false
	------------[ cut here ]------------
	kernel BUG at fs/fscache/operation.c:421!

or

	FS-Cache: Assertion failed
	3 == 5 is false
	------------[ cut here ]------------
	kernel BUG at fs/fscache/operation.c:414!

With a backtrace like the following:

RIP: 0010:[<ffffffffa0211b1d>] fscache_put_operation+0x1ad/0x240 [fscache]
Call Trace:
 [<ffffffffa0213185>] fscache_retrieval_work+0x55/0x270 [fscache]
 [<ffffffffa0213130>] ? fscache_retrieval_work+0x0/0x270 [fscache]
 [<ffffffff81090b10>] worker_thread+0x170/0x2a0
 [<ffffffff81096d10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810909a0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81096966>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810968d0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19 14:16:47 +01:00
David Howells 1362729b16 FS-Cache: Simplify cookie retention for fscache_objects, fixing oops
Simplify the way fscache cache objects retain their cookie.  The way I
implemented the cookie storage handling made synchronisation a pain (ie. the
object state machine can't rely on the cookie actually still being there).

Instead of the object being detached from the cookie and the cookie being
freed in __fscache_relinquish_cookie(), we defer both operations:

 (*) The detachment of the object from the list in the cookie now takes place
     in fscache_drop_object() and is thus governed by the object state machine
     (fscache_detach_from_cookie() has been removed).

 (*) The release of the cookie is now in fscache_object_destroy() - which is
     called by the cache backend just before it frees the object.

This means that the fscache_cookie struct is now available to the cache all the
way through from ->alloc_object() to ->drop_object() and ->put_object() -
meaning that it's no longer necessary to take object->lock to guarantee access.

However, __fscache_relinquish_cookie() doesn't wait for the object to go all
the way through to destruction before letting the netfs proceed.  That would
massively slow down the netfs.  Since __fscache_relinquish_cookie() leaves the
cookie around, it must therefore break all attachments to the netfs - which
includes ->def, ->netfs_data and any outstanding page read/writes.

To handle this, struct fscache_cookie now has an n_active counter:

 (1) This starts off initialised to 1.

 (2) Any time the cache needs to get at the netfs data, it calls
     fscache_use_cookie() to increment it - if it is not zero.  If it was zero,
     then access is not permitted.

 (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
     to decrement it.  This does a wake-up on it if it reaches 0.

 (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
     reach 0.  The initialisation to 1 in step (1) ensures that we only get
     wake ups when we're trying to get rid of the cookie.

This leaves __fscache_relinquish_cookie() a lot simpler.
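
In cache-backend terms, the protocol above boils down to a sketch like this
(helper names per the use/unuse cookie wrappers; error paths elided):

	if (!fscache_use_cookie(object))	/* step (2): inc unless zero */
		return;				/* cookie being relinquished */

	/* ... safely access cookie->def and cookie->netfs_data ... */

	fscache_unuse_cookie(object);		/* step (3): dec, wake at zero */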


***
This fixes a problem in the current code whereby if fscache_invalidate() is
followed sufficiently quickly by fscache_relinquish_cookie() then it is
possible for __fscache_relinquish_cookie() to have detached the cookie from the
object and cleared the pointer before a thread is dispatched to process the
invalidation state in the object state machine.

Since the pending write clearance was deferred to the invalidation state to
make it asynchronous, we need to either wait in relinquishment for the stores
tree to be cleared in the invalidation state or we need to handle the clearance
in relinquishment.

Further, if the relinquishment code does clear the tree, then the invalidation
state needs to make the clearance contingent on still having the cookie to hand
(since that's where the tree is rooted) and we have to prevent the cookie from
disappearing for the duration.

This can lead to an oops like the following:

BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
...
RIP: 0010:[<ffffffff8151023e>] _spin_lock+0xe/0x30
...
CR2: 000000000000000c ...
...
Process kslowd002 (...)
....
Call Trace:
 [<ffffffffa01c3278>] fscache_invalidate_writes+0x38/0xd0 [fscache]
 [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
 [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
 [<ffffffff8110ddd4>] ? slow_work_enqueue+0x104/0x180
 [<ffffffffa01c1303>] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
 [<ffffffff81096b67>] ? bit_waitqueue+0x17/0xd0
 [<ffffffff8110e233>] slow_work_execute+0x233/0x310
 [<ffffffff8110e515>] slow_work_thread+0x205/0x360
 [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8110e310>] ? slow_work_thread+0x0/0x360
 [<ffffffff81096936>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810968a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

The parameter to fscache_invalidate_writes() was object->cookie which is NULL.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19 14:16:47 +01:00
David Howells caaef6900b FS-Cache: Fix object state machine to have separate work and wait states
Fix object state machine to have separate work and wait states as that makes
it easier to envision.

There are now three kinds of state:

 (1) Work state.  This is an execution state.  No event processing is performed
     by a work state.  The function attached to a work state returns a pointer
     indicating the next state to which the OSM should transition.  Returning
     NO_TRANSIT repeats the current state, but goes back to the scheduler
     first.

 (2) Wait state.  This is an event processing state.  No execution is
     performed by a wait state.  Wait states are just tables of "if event X
     occurs, clear it and transition to state Y".  The dispatcher returns to
     the scheduler if none of the events in which the wait state has an
     interest are currently pending.

 (3) Out-of-band state.  This is a special work state.  Transitions to normal
     states can be overridden when an unexpected event occurs (eg. I/O error).
     Instead the dispatcher disables and clears the OOB event and transits to
     the specified work state.  This then acts as an ordinary work state,
     though object->state points to the overridden destination.  Returning
     NO_TRANSIT resumes the overridden transition.
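
A rough sketch of what the separation implies (illustrative types only, not
the actual fscache definitions):

	struct fscache_state {
		char	name[24];		/* states carry their own names */
		/* work states: executed, returns the next state */
		const struct fscache_state *(*work)(struct fscache_object *,
						    int event);
		/* wait states: work is NULL; an event -> next-state transit
		 * table is consulted instead */
	};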

In addition, the states have names in their definitions, so there's no need for
tables of state names.  Further, the EV_REQUEUE event is no longer necessary as
that is automatic for work states.

Since the states are now separate structs rather than values in an enum, it's
not possible to use comparisons other than (non-)equality between them, so use
some object->flags to indicate what phase an object is in.

The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
(EV_KILL).  An object flag now carries the information about retirement.

Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
into a KILL_OBJECT state and additional states have been added for handling
waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

A state has also been added for synchronising with parent object initialisation
(WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19 14:16:47 +01:00
David Howells 493f7bc114 FS-Cache: Wrap checks on object state
Wrap checks on object state (mostly outside of fs/fscache/object.c) with
inline functions so that the mechanism can be replaced.

Some of the state checks within object.c are left as-is as they will be
replaced.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19 14:16:47 +01:00
David Howells 610be24ee4 FS-Cache: Uninline fscache_object_init()
Uninline fscache_object_init() so as not to expose some of the FS-Cache
internals to the cache backend.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19 14:16:47 +01:00
David Howells 0c59a95d90 FS-Cache: Don't sleep in page release if __GFP_FS is not set
Don't sleep in __fscache_maybe_release_page() if __GFP_FS is not set.  This
goes some way towards mitigating fscache deadlocking against ext4 by way of
the allocator, eg:

INFO: task flush-8:0:24427 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0       D ffff88003e2b9fd8     0 24427      2 0x00000000
 ffff88003e2b9138 0000000000000046 ffff880012e3a040 ffff88003e2b9fd8
 0000000000011c80 ffff88003e2b9fd8 ffffffff81a10400 ffff880012e3a040
 0000000000000002 ffff880012e3a040 ffff88003e2b9098 ffffffff8106dcf5
Call Trace:
 [<ffffffff8106dcf5>] ? __lock_is_held+0x31/0x53
 [<ffffffff81219b61>] ? radix_tree_lookup_element+0xf4/0x12a
 [<ffffffff81454bed>] schedule+0x60/0x62
 [<ffffffffa01d349c>] __fscache_wait_on_page_write+0x8b/0xa5 [fscache]
 [<ffffffff810498a8>] ? __init_waitqueue_head+0x4d/0x4d
 [<ffffffffa01d393a>] __fscache_maybe_release_page+0x30c/0x324 [fscache]
 [<ffffffffa01d369a>] ? __fscache_maybe_release_page+0x6c/0x324 [fscache]
 [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
 [<ffffffffa01fd7b2>] nfs_fscache_release_page+0x68/0x94 [nfs]
 [<ffffffffa01ef73e>] nfs_release_page+0x7e/0x86 [nfs]
 [<ffffffff810aa553>] try_to_release_page+0x32/0x3b
 [<ffffffff810b6c70>] shrink_page_list+0x535/0x71a
 [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
 [<ffffffff810b7352>] shrink_inactive_list+0x20a/0x2dd
 [<ffffffff81071a13>] ? mark_held_locks+0xbe/0xea
 [<ffffffff810b7a65>] shrink_lruvec+0x34c/0x3eb
 [<ffffffff810b7bd3>] do_try_to_free_pages+0xcf/0x355
 [<ffffffff810b7fc8>] try_to_free_pages+0x9a/0xa1
 [<ffffffff810b08d2>] __alloc_pages_nodemask+0x494/0x6f7
 [<ffffffff810d9a07>] kmem_getpages+0x58/0x155
 [<ffffffff810dc002>] fallback_alloc+0x120/0x1f3
 [<ffffffff8106db23>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff810dbed3>] ____cache_alloc_node+0x177/0x186
 [<ffffffff81162a6c>] ? ext4_init_io_end+0x1c/0x37
 [<ffffffff810dc403>] kmem_cache_alloc+0xf1/0x176
 [<ffffffff810b17ac>] ? test_set_page_writeback+0x101/0x113
 [<ffffffff81162a6c>] ext4_init_io_end+0x1c/0x37
 [<ffffffff81162ce4>] ext4_bio_write_page+0x20f/0x3af
 [<ffffffff8115cc02>] mpage_da_submit_io+0x26e/0x2f6
 [<ffffffff811088e5>] ? __find_get_block_slow+0x38/0x133
 [<ffffffff81161348>] mpage_da_map_and_submit+0x3a7/0x3bd
 [<ffffffff81161a60>] ext4_da_writepages+0x30d/0x426
 [<ffffffff810b3359>] do_writepages+0x1c/0x2a
 [<ffffffff81102f4d>] __writeback_single_inode+0x3e/0xe5
 [<ffffffff81103995>] writeback_sb_inodes+0x1bd/0x2f4
 [<ffffffff81103b3b>] __writeback_inodes_wb+0x6f/0xb4
 [<ffffffff81103c81>] wb_writeback+0x101/0x195
 [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
 [<ffffffff811043aa>] ? wb_do_writeback+0xaa/0x173
 [<ffffffff8110434a>] wb_do_writeback+0x4a/0x173
 [<ffffffff81071bbc>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81038554>] ? del_timer+0x4b/0x5b
 [<ffffffff811044e0>] bdi_writeback_thread+0x6d/0x147
 [<ffffffff81104473>] ? wb_do_writeback+0x173/0x173
 [<ffffffff81048fbc>] kthread+0xd0/0xd8
 [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e
 [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff81456aac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
2 locks held by flush-8:0/24427:
 #0:  (&type->s_umount_key#41){.+.+..}, at: [<ffffffff810e3b73>] grab_super_passive+0x4c/0x76
 #1:  (jbd2_handle){+.+...}, at: [<ffffffff81190d81>] start_this_handle+0x475/0x4ea


The problem here is that another thread, which is attempting to write the
to-be-stored NFS page to the on-ext4 cache file, is waiting for the journal
lock, eg:

INFO: task kworker/u:2:24437 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u:2     D ffff880039589768     0 24437      2 0x00000000
 ffff8800395896d8 0000000000000046 ffff8800283bf040 ffff880039589fd8
 0000000000011c80 ffff880039589fd8 ffff880039f0b040 ffff8800283bf040
 0000000000000006 ffff8800283bf6b8 ffff880039589658 ffffffff81071a13
Call Trace:
 [<ffffffff81071a13>] ? mark_held_locks+0xbe/0xea
 [<ffffffff81455e73>] ? _raw_spin_unlock_irqrestore+0x3a/0x50
 [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170
 [<ffffffff81071bbc>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81454bed>] schedule+0x60/0x62
 [<ffffffff81190c23>] start_this_handle+0x317/0x4ea
 [<ffffffff810498a8>] ? __init_waitqueue_head+0x4d/0x4d
 [<ffffffff81190fcc>] jbd2__journal_start+0xb3/0x12e
 [<ffffffff81176606>] __ext4_journal_start_sb+0xb2/0xc6
 [<ffffffff8115f137>] ext4_da_write_begin+0x109/0x233
 [<ffffffff810a964d>] generic_file_buffered_write+0x11a/0x264
 [<ffffffff811032cf>] ? __mark_inode_dirty+0x2d/0x1ee
 [<ffffffff810ab1ab>] __generic_file_aio_write+0x2a5/0x2d5
 [<ffffffff810ab24a>] generic_file_aio_write+0x6f/0xd0
 [<ffffffff81159a2c>] ext4_file_write+0x38c/0x3c4
 [<ffffffff810e0915>] do_sync_write+0x91/0xd1
 [<ffffffffa00a17f0>] cachefiles_write_page+0x26f/0x310 [cachefiles]
 [<ffffffffa01d470b>] fscache_write_op+0x21e/0x37a [fscache]
 [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e
 [<ffffffffa01d2479>] fscache_op_work_func+0x78/0xd7 [fscache]
 [<ffffffff8104455a>] process_one_work+0x232/0x3a8
 [<ffffffff810444ff>] ? process_one_work+0x1d7/0x3a8
 [<ffffffff81044ee0>] worker_thread+0x214/0x303
 [<ffffffff81044ccc>] ? manage_workers+0x245/0x245
 [<ffffffff81048fbc>] kthread+0xd0/0xd8
 [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e
 [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff81456aac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55
4 locks held by kworker/u:2/24437:
 #0:  (fscache_operation){.+.+.+}, at: [<ffffffff810444ff>] process_one_work+0x1d7/0x3a8
 #1:  ((&op->work)){+.+.+.}, at: [<ffffffff810444ff>] process_one_work+0x1d7/0x3a8
 #2:  (sb_writers#14){.+.+.+}, at: [<ffffffff810ab22c>] generic_file_aio_write+0x51/0xd0
 #3:  (&sb->s_type->i_mutex_key#19){+.+.+.}, at: [<ffffffff810ab236>] generic_file_aio_write+0x5b/0x

fscache already tries to cancel pending stores, but it can't cancel a write
for which I/O is already in progress.

An alternative would be to accept writing garbage to the cache under extreme
circumstances and to kill the afflicted cache object if we have to do this.
However, we really need to know how strapped the allocator is before deciding
to do that.
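
The mitigation is therefore an early bail-out in
__fscache_maybe_release_page(), roughly (a sketch of the guard only):

	if (!(gfp & __GFP_FS))
		return false;	/* cannot wait: tell the VM the page stays */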

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19 14:16:47 +01:00
Sebastian Andrzej Siewior ee8be57bc3 fs/fscache: remove spin_lock() from the condition in while()
The spin_lock() within the condition of the while() loop will cause a
compile error if spin_lock() is not a function.  This is not a problem on
mainline, but it does not look pretty and there is no reason to do it that
way.
This patch writes it a little differently and avoids the double condition.
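
The shape of the change is roughly this (an illustrative sketch, not the
exact fscache code; DONE and do_work() are placeholders):

	/* before: lock acquired via the comma operator in the condition */
	while (spin_lock(&cookie->lock), !test_bit(DONE, &flags)) {
		do_work();
		spin_unlock(&cookie->lock);
	}

	/* after: lock taken as a statement, single condition to test */
	for (;;) {
		spin_lock(&cookie->lock);
		if (test_bit(DONE, &flags))
			break;
		do_work();
		spin_unlock(&cookie->lock);
	}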

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19 14:16:47 +01:00
Anurup m ec686c9239 fs/fscache/stats.c: fix memory leak
There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.

The reason is that in fscache_stats_open(), single_open() is called but the
matching release function is not called during release.  Fix this by using
the correct release function - single_release().
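
In file_operations terms the fix is a one-field change, roughly (a sketch;
single_open() allocates a seq_operations wrapper that only single_release()
frees):

	static const struct file_operations fscache_stats_fops = {
		.open		= fscache_stats_open,
		.read		= seq_read,
		.llseek		= seq_lseek,
		.release	= single_release,	/* was seq_release */
	};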

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101

Signed-off-by: Anurup m <anurup.m@huawei.com>
Cc: shyju pv <shyju.pv@huawei.com>
Cc: Sanil kumar <sanil.kumar@huawei.com>
Cc: Nataraj m <nataraj.m@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:27 -07:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
differently from the list ones.  While the list ones are nice and elegant:

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter?  I'm not quite sure.  Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
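
In caller terms the conversion looks like this (a sketch; my_obj and
do_something() are illustrative):

	struct my_obj *obj;
	struct hlist_node *n;		/* no longer needed after the change */

	/* before */
	hlist_for_each_entry(obj, n, head, member)
		do_something(obj);

	/* after */
	hlist_for_each_entry(obj, head, member)
		do_something(obj);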

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small number of places were using the 'node' parameter; these
   were modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
David Howells 91c7fbbf63 FS-Cache: Clear remaining page count on retrieval cancellation
Provide fscache_cancel_op() with a pointer to a function it should invoke under
lock if it cancels an operation.

Use this to clear the remaining page count upon cancellation of a pending
retrieval operation so that fscache_release_retrieval_op() doesn't get an
assertion failure (see below).  This can happen when a signal occurs, say from
CTRL-C being pressed during data retrieval.
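
Call sites pass the cleanup routine in, along these lines (a sketch; the
callback name and body follow the description and may differ from the patch):

	static void fscache_do_cancel_retrieval(struct fscache_operation *_op)
	{
		struct fscache_retrieval *op =
			container_of(_op, struct fscache_retrieval, op);

		op->n_pages = 0;	/* clear the remaining page count */
	}

	...
	ret = fscache_cancel_op(&op->op, fscache_do_cancel_retrieval);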

FS-Cache: Assertion failed
3 == 0 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:237!
invalid opcode: 0000 [#641] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 6075, comm: slurp-q Tainted: GF     D      3.7.0-rc8-fsdevel+ #411                  /DG965RY
RIP: 0010:[<ffffffffa007f328>]  [<ffffffffa007f328>] fscache_release_retrieval_op+0x75/0xff [fscache]
RSP: 0000:ffff88001c6d7988  EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff880014cdfe00 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88001c6d7998 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffe00
R13: ffff88001c6d7ab4 R14: ffff88001a8638a0 R15: ffff88001552b190
FS:  00007f877aaf0700(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff11378fd2 CR3: 000000001c6c6000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 6075, threadinfo ffff88001c6d6000, task ffff88001c6c4080)
Stack:
 ffffffffa007ec07 ffff880014cdfe00 ffff88001c6d79c8 ffffffffa007db4d
 ffffffffa007ec07 ffff880014cdfe00 00000000fffffe00 ffff88001c6d7ab4
 ffff88001c6d7a38 ffffffffa008116d 0000000000000000 ffff88001c6c4080
Call Trace:
 [<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
 [<ffffffffa007db4d>] fscache_put_operation+0x135/0x2ed [fscache]
 [<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
 [<ffffffffa008116d>] __fscache_read_or_alloc_pages+0x413/0x4bc [fscache]
 [<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
 [<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
 [<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
 [<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
 [<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
 [<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
 [<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
 [<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
 [<ffffffff810afe3e>] ra_submit+0x1c/0x20
 [<ffffffff810b019b>] ondemand_readahead+0x359/0x382
 [<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
 [<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
 [<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
 [<ffffffff810db5b3>] do_sync_read+0x91/0xd1
 [<ffffffff810dbb8b>] vfs_read+0x9b/0x144
 [<ffffffff810dbc78>] sys_read+0x44/0x75
 [<ffffffff81422892>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 22:35:15 +00:00
David Howells 1f372dff1d FS-Cache: Mark cancellation of in-progress operation
Mark as cancelled an operation that is in progress rather than pending at the
time it is cancelled, and call fscache_complete_op() to cancel an operation so
that blocked ops can be started.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 22:34:00 +00:00