Nothing in the tree uses nopage any more. Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch streamlines the quota checking in the "no quota" case by
making the check inline in the calling function, thus reducing the
number of function calls. Eventually we might be able to remove the
checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but
currently we can't as there are a very few places in the code which need
to call these functions directly still.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
This patch combines the 2 patches in bug 434736 to correct the lock
ordering in the unstuffing of the quota inode in gfs2_adjust_quota and
adjusting the number of revokes in gfs2_write_jdata_pagevec
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
gfs2_alloc_get may fail so we have to check it to prevent
NULL pointer dereference.
Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We've previously been using a "try lock" in readpage on the basis that
it would prevent deadlocks due to the inverted lock ordering (our normal
lock ordering is glock first and then page lock). Unfortunately tests
have shown that this isn't enough. If the glock has a demote request
queued such that run_queue() in the glock code tries to do a demote when
its called under readpage then it will try and write out all the dirty
pages which requires locking them. This then deadlocks with the page
locked by readpage.
The solution is to always require two calls into readpage. The first
unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the
second does the actual readpage and unlocks the glock & page as
required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch contains two small fixups that didn't fit elsewhere.
They are: (1) get rid of temp variable in find_metapath.
(2) Remove vestigial "ret" variable from gfs2_writepage_common.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Simplify page cache zeroing of segments of pages through 3 functions
zero_user_segments(page, start1, end1, start2, end2)
Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.
zero_user_segment(page, start, end)
Same for a single segment.
zero_user(page, start, length)
Length variant for the case where we know the length.
We remove the zero_user_page macro. Issues:
1. Its a macro. Inline functions are preferable.
2. The KM_USER0 macro is only defined for HIGHMEM.
Having to treat this special case everywhere makes the
code needlessly complex. The parameter for zeroing is always
KM_USER0 except in one single case that we open code.
Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.
Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.
Also extract the flushing of the caches to be outside of the kmap.
[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever its needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 use the same approach.
In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes a couple of problems which affected the execution of files
on GFS2. The first is that there was a corner case where inodes were not
always uptodate at the point at which permissions checks were being carried
out, this was resulting in refusal of execute permission, but only on the
first lookup, subsequent requests worked correctly. The second was a problem
relating to incorrect updating of file sizes which was introduced with the
write_begin/end code for GFS2 a little while back.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
To improve performance on NUMA, we use the VM's standard page
migration for writeback and ordered pages. Probably we could
also do the same for journaled data, but that would need a
careful audit of the code, so will be the subject of a later
patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch is just a cleanup. Function gfs2_get_block() just calls
function gfs2_block_map reversing the last two parameters. By
reversing the parameters, gfs2_block_map() may be called directly
and function gfs2_get_block may be eliminated altogether.
Since this function is done for every block operation,
this streamlines the code and makes it a little bit more efficient.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Something changed in the upstream kernel, and it needs this
one-liner to allow ops_address.c to build.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is an addendum to the new AOPs work which moves the point
at which we take the page lock so that we don't get it until
the last possible moment. This resolves a conflict between
starting transactions and the page lock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch resolves a lock ordering issue where we had been getting
a transaction lock in the wrong order with respect to the page lock.
By using writepages rather than just writepage, it is then possible
to start a transaction before locking the page, and thus matching the
locking order elsewhere in the code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch splits gfs2_writepage into separate functions for each of
the three cases: writeback, ordered and journalled. As a result
it becomes a lot easier to see what each one is doing. The common
code is moved into gfs2_writepage_common.
This fixes a performance bug where we were doing more work than
strictly required in the ordered write case.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Just like ext3 we now have three sets of address space operations
to cover the cases of writeback, ordered and journalled data
writes. This means that the individual operations can now become
less complicated as we are able to remove some of the tests for
file data mode from the code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds a function "gfs2_is_writeback()" along the lines of the
existing "gfs2_is_jdata()" in order to clean up the code and make
the various tests for the inode mode more obvious. It also fixes
the PageChecked() logic where we were resetting the flag too early
in the case of an error path.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The i_cache was designed to keep references to the indirect blocks
used during block mapping so that they didn't have to be looked
up continually. The idea failed because there are too many places
where the i_cache needs to be freed, and this has in the past been
the cause of many bugs.
In addition there was no performance benefit being gained since the
disk blocks in question were cached anyway. So this patch removes
it in order to simplify the code to prepare for other changes which
would otherwise have had to add further support for this feature.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This cleans up the mmap() code path for GFS2 by implementing the
page_mkwrite function for GFS2. We are thus able to use the
generic filemap_fault function for our ->fault() implementation.
This now means that shared writable mappings will be much more
efficiently shared across the cluster if there is a reasonable
proportion of read activity (the greater proportion, the better).
As a side effect, it also reduces the size of the code, removes
special cases from readpage and readpages, and makes the code
path easier to follow.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As requested by Christoph, this patch cleans up GFS2's internal
read function so that it no longer uses the do_generic_mapping_read
function. This function is obsolete and GFS2 is the last user of it.
As a side effect the internal read code gets smaller and easier
to read and gfs2_readpage is split into two. One function has the locking
and the other function has the rest of the logic.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
* GFS2 has been using i_cache array to store its indirect meta blocks.
Its flush routine doesn't correctly clean up all the entries. The
problem would show while multiple nodes do simultaneous writes to the
same file. Upon glock exclusive lock transfer, if the file is a sparse
file with large file size where the indirect meta blocks span multiple
array entries with "zero" entries in between. The flush routine
prematurely stops the flushing that leaves old (stale) entries around.
This leads to several nasty issues, including data corruption.
* Fix gfs2_get_block_noalloc checking to correctly return EIO upon
unmapped buffer.
Signed-off-by: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch cleans up the code for writing journaled data into the log.
It also removes the need to allocate a small "tag" structure for each
block written into the log. Instead we just keep count of the outstanding
I/O so that we can be sure that its all been written at the correct time.
Another result of this patch is that a number of ll_rw_block() calls
have become submit_bh() calls, closing some races at the same time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The following patch removes the ordered write processing from
databuf_lo_before_commit() and moves it to log.c. This has the effect of
greatly simplyfying databuf_lo_before_commit() and well as potentially
making the ordered write code more efficient.
As a side effect of this, its now possible to remove ordered buffers
from the ordered buffer list at any time, so we now make use of this in
invalidatepage and releasepage to ensure timely release of these
buffers.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>