Commit Graph

489 Commits

Author SHA1 Message Date
Lachlan McIlroy e5eb7f202b [XFS] use struct kvec in struct uio
SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:27701a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:21 +11:00
David Chinner 03135cf726 [XFS] Fix UP build breakage due to undefined m_icsb_mutex.
SGI-PV: 952227
SGI-Modid: xfs-linux-melb:xfs-kern:27692a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:15 +11:00
David Chinner 20b642858b [XFS] Reduction global superblock lock contention near ENOSPC.
The existing per-cpu superblock counter code uses the global superblock
spin lock when we approach ENOSPC for global synchronisation. On larger
machines than this code was originally tested on this can still get
catastrophic spinlock contention due increasing rebalance frequency near
ENOSPC.

By introducing a sleeping lock that is used to serialise balances and
modifications near ENOSPC we prevent contention from needlessly from
wasting the CPU time of potentially hundreds of CPUs.

To reduce the number of balances occuring, we separate the need rebalance
case from the slow allocate case. Now, a counter running dry will trigger
a rebalance during which counters are disabled. Any thread that sees a
disabled counter enters a different path where it waits on the new mutex.
When it gets the new mutex, it checks if the counter is disabled. If the
counter is disabled, then we _know_ that we have to use the global counter
and lock and it is safe to do so immediately. Otherwise, we drop the mutex
and go back to trying the per-cpu counters which we know were re-enabled.

SGI-PV: 952227
SGI-Modid: xfs-linux-melb:xfs-kern:27612a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:09 +11:00
Eric Sandeen 804195b63a [XFS] Get rid of old 5.3/6.1 v1 log items. Cleanup patch sent in by Eric
Sandeen.

SGI-PV: 958736
SGI-Modid: xfs-linux-melb:xfs-kern:27596a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:35:02 +11:00
David Chinner 7989cb8ef5 [XFS] Keep stack usage down for 4k stacks by using noinline.
gcc-4.1 and more recent aggressively inline static functions which
increases XFS stack usage by ~15% in critical paths. Prevent this from
occurring by adding noinline to the STATIC definition.

Also uninline some functions that are too large to be inlined and were
causing problems with CONFIG_FORCED_INLINING=y.

Finally, clean up all the different users of inline, __inline and
__inline__ and put them under one STATIC_INLINE macro. For debug kernels
the STATIC_INLINE macro uninlines those functions.

SGI-PV: 957159
SGI-Modid: xfs-linux-melb:xfs-kern:27585a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:34:56 +11:00
David Chinner 5e6a07dfe4 [XFS] Current usage of buftarg flags is incorrect.
The {test,set,clear}_bit() operations take a bit index for the bit to
operate on. The XBT_* flags are defined as bit fields which is incorrect,
not to mention the way the bit fields are enumerated is broken too. This
was only working by chance.

Fix the definitions of the flags and make the code using them use the
{test,set,clear}_bit() operations correctly.

SGI-PV: 958639
SGI-Modid: xfs-linux-melb:xfs-kern:27565a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:34:49 +11:00
Lachlan McIlroy dc74eaad8c [XFS] Prevent buffer overrun in cmn_err().
The message buffer used by cmn_err() is only 256 bytes and some CXFS
messages were exceeding this length. Since we were using vsprintf() and
not checking for buffer overruns we were clobbering memory beyond the
buffer. The size of the buffer has been increased to 1024 bytes so we can
capture these larger messages and we are now using vsnprintf() to prevent
overrunning the buffer size.

SGI-PV: 958599
SGI-Modid: xfs-linux-melb:xfs-kern:27561a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Geoffrey Wehrman <gwehrman@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:34:38 +11:00
David Chinner 585e6d8856 [XFS] Fix a synchronous buftarg flush deadlock when freezing.
At the last stage of a freeze, we flush the buftarg synchronously over and
over again until it succeeds twice without skipping any buffers.

The delwri list flush skips pinned buffers, but tries to flush all others.
It removes the buffers from the delwri list, then tries to lock them one
at a time as it traverses the list to issue the I/O. It holds them locked
until we issue all of the I/O and then unlocks them once we've waited for
it to complete.

The problem is that during a freeze, the filesystem may still be doing
stuff - like flushing delalloc data buffers - in the background and hence
we can be trying to lock buffers that were on the delwri list at the same
time. Hence we can get ABBA deadlocks between threads doing allocation and
the buftarg flush (freeze) thread.

Fix it by skipping locked (and pinned) buffers as we traverse the delwri
buffer list.

SGI-PV: 957195
SGI-Modid: xfs-linux-melb:xfs-kern:27535a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:32:29 +11:00
David Chinner dac61f521b [XFS] Make quiet mounts quiet
The XFS quiet mount logic was inverted making quiet mounts noisy and vice
versa. Fix it.

SGI-PV: 958469
SGI-Modid: xfs-linux-melb:xfs-kern:27520a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-02-10 18:27:56 +11:00
David Chinner 921320210b [PATCH] Fix XFS after clear_page_dirty() removal
XFS appears to call clear_page_dirty to get the mapping tree dirty tag
set correctly at the same time the page dirty flag is cleared.  I note
that this can be done by set_page_writeback() if we clear the dirty flag
on the page first when we are writing back the entire page.

Hence it seems to me that the XFS call to clear_page_dirty() could
easily be substituted by clear_page_dirty_for_io() followed by a call to
set_page_writeback() to get the mapping tree tags set correctly after
the page has been marked clean.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-21 10:01:08 -08:00
Zach Brown 8459d86aff [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED
The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
going to call aio_complete().  It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio().  direct_io_worker() would return < 0, it's callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain.  It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.

This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio.  Instead we return the return
code of the operation and let the aio core call aio_complete().  This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it.  XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
 We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:21 -08:00
Josef "Jeff" Sipek e678fb0d52 [PATCH] xfs: change uses of f_{dentry,vfsmnt} to use f_path
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the xfs
filesystem.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:43 -08:00
Rafael J. Wysocki 58e14b148d [PATCH] Use freezeable workqueues in XFS
Make the workqueues used by XFS freezeable, so their worker threads don't
submit any I/O after the suspend image has been created.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:29 -08:00
Nigel Cunningham 7dfb71030f [PATCH] Add include/linux/freezer.h and move definitions from sched.h
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
David Howells c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
David Chinner e5ffd2bb62 [XFS] Stale the correct inode when freeing clusters.
SGI-PV: 958376
SGI-Modid: xfs-linux-melb:xfs-kern:27503a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-21 18:55:33 +11:00
Lachlan McIlroy d2133717d5 [XFS] Fix uninitialized br_state and br_startoff in
xfs_bmap_add_extent_delay_real()

SGI-PV: 957008
SGI-Modid: xfs-linux-melb:xfs-kern:27457a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Shailendra Tripathi <stripathi@agami.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-21 18:55:16 +11:00
David Chinner 050e714eb2 [XFS] Remove KERNEL_VERSION macros from xfs_dmapi.h
SGI-PV: 957005
SGI-Modid: xfs-linux-melb:xfs-kern:27398a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-11 18:05:06 +11:00
David Chinner 4c60658e0f [XFS] Prevent a deadlock when xfslogd unpins inodes.
The previous fixes for the use after free in xfs_iunpin left a nasty log
deadlock when xfslogd unpinned the inode and dropped the last reference to
the inode. the ->clear_inode() method can issue transactions, and if the
log was full, the transaction could push on the log and get stuck trying
to push the inode it was currently unpinning.

To fix this, we provide xfs_iunpin a guarantee that it will always have a
valid xfs_inode <-> linux inode link or a particular flag will be set on
the inode. We then use log forces during lookup to ensure transactions are
completed before we recycle the inode. This ensures that xfs_iunpin will
never use the linux inode after it is being freed, and any lookup on an
inode on the reclaim list will wait until it is safe to attach a new linux
inode to the xfs inode.

SGI-PV: 956832
SGI-Modid: xfs-linux-melb:xfs-kern:27359a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Shailendra Tripathi <stripathi@agami.com>
Signed-off-by: Takenori Nagano <t-nagano@ah.jp.nec.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-11 18:05:00 +11:00
David Chinner 7a18c38607 [XFS] Clean up i_flags and i_flags_lock handling.
SGI-PV: 956832
SGI-Modid: xfs-linux-melb:xfs-kern:27358a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nscott@aconex.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-11 18:04:54 +11:00
Vlad Apostolov 2e2e7bb1fd [XFS] 956664: dm_read_invis() changes i_atime
SGI-PV: 956664
SGI-Modid: xfs-linux-melb:xfs-kern:27315a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Sam Vaughan <sjv@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-11 18:04:47 +11:00
Vlad Apostolov 70a505285f [XFS] rename uio_read() to xfs_uio_read()
SGI-PV: 957004
SGI-Modid: xfs-linux-melb:xfs-kern:27231a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-11 18:04:41 +11:00
Tim Shimmin 439b843479 [XFS] Keep lockdep happy.
SGI-PV: 956964
SGI-Modid: xfs-linux-melb:xfs-kern:27200a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
2006-11-11 18:04:34 +11:00
Vlad Apostolov 93c189c114 [XFS] 956618: Linux crashes on boot with XFS-DMAPI filesystem when
CONFIG_XFS_TRACE is on

SGI-PV: 956618
SGI-Modid: xfs-linux-melb:xfs-kern:27196a

Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-11-11 18:03:49 +11:00
Andrew Morton 3fcfab16c5 [PATCH] separate bdi congestion functions from queue congestion functions
Separate out the concept of "queue congestion" from "backing-dev congestion".
Congestion is a backing-dev concept, not a queue concept.

The blk_* congestion functions are retained, as wrappers around the core
backing-dev congestion functions.

This proper layering is needed so that NFS can cleanly use the congestion
functions, and so that CONFIG_BLOCK=n actually links.

Cc: "Thomas Maier" <balagi@justmail.de>
Cc: "Jens Axboe" <jens.axboe@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: David Howells <dhowells@redhat.com>
Cc: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:35 -07:00