While compiling my code with -Wconversion using gcc-trunk, I always get a
bunch of warrning from headers, here is fix for them:
__getblk is alawys called with unsigned argument,
but it takes signed, the same story with __bread,__breadahead and so on.
Signed-off-by: Tomasz Kvarsin
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
bufferhead. Recently, I found the long standing mmap/unwritten extent
conversion bug, and it was to do with partial page invalidation not clearing
the unwritten flag from bufferheads attached to the page but beyond EOF. See
here for a full explaination:
http://oss.sgi.com/archives/xfs/2006-12/msg00196.html
The solution I have checked into the XFS dev tree involves duplicating code
from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
and then calling block_invalidatepage() to do the rest.
Christoph suggested that this would be better solved by pushing the unwritten
flag into the common buffer head flags and just adding the call to
discard_buffer():
http://oss.sgi.com/archives/xfs/2006-12/msg00239.html
The following patch makes BH_Unwritten a first class citizen.
Signed-off-by: Dave Chinner <dgc@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.
Mingming later sent the same patch, saying:
We are running SDET benchmark and saw double free issue for ext3 extended
attributes block, which complains the same xattr block already being freed (in
ext3_xattr_release_block()). The problem could also been triggered by
multiple threads loop untar/rm a kernel tree.
The race is caused by missing a memory barrier at unlock_buffer() before the
lock bit being cleared, resulting in possible concurrent h_refcounter update.
That causes a reference counter leak, then later leads to the double free that
we have seen.
Inside unlock_buffer(), there is a memory barrier is placed *after* the lock
bit is being cleared, however, there is no memory barrier *before* the bit is
cleared. On some arch the h_refcount update instruction and the clear bit
instruction could be reordered, thus leave the critical section re-entered.
The race is like this: For example, if the h_refcount is initialized as 1,
cpu 0: cpu1
-------------------------------------- -----------------------------------
lock_buffer() /* test_and_set_bit */
clear_buffer_locked(bh);
lock_buffer() /* test_and_set_bit */
h_refcount = h_refcount+1; /* = 2*/ h_refcount = h_refcount + 1; /*= 2 */
clear_buffer_locked(bh);
.... ......
We lost a h_refcount here. We need a memory barrier before the buffer head lock
bit being cleared to force the order of the two writes. Please apply.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert all calls to invalidate_inode_pages() into open-coded calls to
invalidate_mapping_pages().
Leave the invalidate_inode_pages() wrapper in place for now, marked as
deprecated.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix commit ecdfc9787f
Not to put too fine a point on it, but in a nutshell...
__set_page_dirty_buffers() | try_to_free_buffers()
---------------------------+---------------------------
| spin_lock(private_lock);
| drop_bufers()
| spin_unlock(private_lock);
spin_lock(private_lock) |
!page_has_buffers() |
spin_unlock(private_lock) |
SetPageDirty() |
| cancel_dirty_page()
oops!
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's not pretty, but it appears that ext3 with data=journal will clean
pages without ever actually telling the VM that they are clean. This,
in turn, will result in the VM (and balance_dirty_pages() in particular)
to never realize that the pages got cleaned, and wait forever for an
event that already happened.
Technically, this seems to be a problem with ext3 itself, but it used to
be hidden by 'try_to_free_buffers()' noticing this situation on its own,
and just working around the filesystem problem.
This commit re-instates that hack, in order to avoid a regression for
the 2.6.20 release. This fixes bugzilla 7844:
http://bugzilla.kernel.org/show_bug.cgi?id=7844
Peter Zijlstra points out that we should probably retain the debugging
code that this removes from cancel_dirty_page(), and I agree, but for
the imminent release we might as well just silence the warning too
(since it's not a new bug: anything that triggers that warning has been
around forever).
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.
(XFS unlocks the semaphore from a different task, by design. The mutex
code warns about this)
Signed-off-by: Dave Chinner <dgc@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is preparatory work in our continuing saga on some hard-to-trigger
file corruption with shared writable mmap() after the dirty page
tracking changes (commit d08b3851da etc)
were merged.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Accounting writes is fairly simple: whenever a process flips a page from clean
to dirty, we accuse it of having caused a write to underlying storage of
PAGE_CACHE_SIZE bytes.
This may overestimate the amount of writing: the page-dirtying may cause only
one buffer_head's worth of writeout. Fixing that is possible, but probably a
bit messy and isn't obviously important.
Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.
the compiler can skip truly unused functions just fine:
text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after
[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When IO error happens on metadata buffer, buffer is freed from memory and
later fsync() is called, filesystems like ext2 fail to report EIO. We
solve the problem by introducing a pointer to associated address space into
the buffer_head. When a buffer is removed from a list of metadata buffers
associated with an address space, IO error is transferred from the buffer to
the address space, so that fsync can later report it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of flush_dcache_page()s are missing on the I/O-error paths.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If grow_buffers() is for some reason passed a block number which wants to lie
outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then it
will accidentally truncate `index' and will then instnatiate a page at the
wrong pagecache offset. This causes __getblk_slow() to go into an infinite
loop.
This can happen with corrupted disks, or with software errors elsewhere.
Detect that, and handle it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was triggered, but not the fault of, the dirty page accounting
patches. Suitable for -stable as well, after it goes upstream.
Unable to handle kernel NULL pointer dereference at virtual address 0000004c
EIP is at _spin_lock+0x12/0x66
Call Trace:
[<401766e7>] __set_page_dirty_buffers+0x15/0xc0
[<401401e7>] set_page_dirty+0x2c/0x51
[<40140db2>] set_page_dirty_balance+0xb/0x3b
[<40145d29>] __do_fault+0x1d8/0x279
[<40147059>] __handle_mm_fault+0x125/0x951
[<401133f1>] do_page_fault+0x440/0x59f
[<4034d0c1>] error_code+0x39/0x40
[<08048a33>] 0x8048a33
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move some functions out of the buffering code that aren't strictly buffering
specific. This is a precursor to being able to disable the block layer.
(*) Moved some stuff out of fs/buffer.c:
(*) The file sync and general sync stuff moved to fs/sync.c.
(*) The superblock sync stuff moved to fs/super.c.
(*) do_invalidatepage() moved to mm/truncate.c.
(*) try_to_release_page() moved to mm/filemap.c.
(*) Moved some related declarations between header files:
(*) declarations for do_invalidatepage() and try_to_release_page() moved
to linux/mm.h.
(*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tracking of dirty pages in shared writeable mmap()s.
The idea is simple: write protect clean shared writeable pages, catch the
write-fault, make writeable and set dirty. On page write-back clean all the
PTE dirty bits and write protect them once again.
The implementation is a tad harder, mainly because the default
backing_dev_info capabilities were too loosely maintained. Hence it is not
enough to test the backing_dev_info for cap_account_dirty.
The current heuristic is as follows, a VMA is eligible when:
- its shared writeable
(vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
- it is not a 'special' mapping
(vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
- the backing_dev_info is cap_account_dirty
mapping_cap_account_dirty(vma->vm_file->f_mapping)
- f_op->mmap() didn't change the default page protection
Page from remap_pfn_range() are explicitly excluded because their COW
semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
because they don't have a backing store anyway.
mprotect() is taught about the new behaviour as well. However it overrides
the last condition.
Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
It can be called on any page, but is currently only implemented for mapped
pages, if the page is found the be of a VMA that accounts dirty pages it will
also wrprotect the PTE.
Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
under ->private_lock. This seems to be safe, since ->private_lock is used to
serialize access to the buffers, not the page itself. This is needed because
clear_page_dirty() will call into page_mkclean() and would thereby violate
locking order.
[dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can immediately bail from invalidate_bdev() if the blockdev has no
pagecache.
This solves the huge IPI storms which hald is causing on the big ia64
machines when it polls CDROM drives.
Acked-by: Jes Sorensen <jes@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes nr_dirty a per zone counter. Looping over all processors is
avoided during writeback state determination.
The counter aggregation for nr_dirty had to be undone in the NFS layer since
we summed up the page counts from multiple zones. Someone more familiar with
NFS should probably review what I have done.
[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- add a proper prototype for the following global function:
- buffer_init()
- make the following needlessly global function static:
- end_buffer_async_write()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>