Commit Graph

72 Commits

Author SHA1 Message Date
Frederic Weisbecker 108d3943c0 reiserfs: Relax the lock before truncating pages
While truncating a file, reiserfs_setattr() calls inode_setattr()
that will truncate the mapping for the given inode, but for that
it needs the pages locks.

In order to release these, the owners need the reiserfs lock to
complete their jobs. But they can't, as we don't release it before
calling inode_setattr().

We need to do that to fix the following softlockups:

INFO: task flush-8:0:2149 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0     D f51af998     0  2149      2 0x00000000
 f51af9ac 00000092 00000002 f51af998 c2803304 00000000 c1894ad0 010f3000
 f51af9cc c1462604 c189ef80 f51af974 c1710304 f715b450 f715b5ec c2807c40
 00000000 0005bb00 c2803320 c102c55b c1710304 c2807c50 c2803304 00000246
Call Trace:
 [<c1462604>] ? schedule+0x434/0xb20
 [<c102c55b>] ? resched_task+0x4b/0x70
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c146414d>] ? mutex_lock_nested+0x1fd/0x350
 [<c14640b9>] mutex_lock_nested+0x169/0x350
 [<c1178cde>] ? reiserfs_write_lock+0x2e/0x40
 [<c1178cde>] reiserfs_write_lock+0x2e/0x40
 [<c11719a2>] do_journal_end+0xc2/0xe70
 [<c1172912>] journal_end+0xb2/0x120
 [<c11686b3>] ? pathrelse+0x33/0xb0
 [<c11729e4>] reiserfs_end_persistent_transaction+0x64/0x70
 [<c1153caa>] reiserfs_get_block+0x12ba/0x15f0
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c1154b24>] reiserfs_writepage+0xa74/0xe80
 [<c1465a27>] ? _raw_spin_unlock_irq+0x27/0x50
 [<c11f3d25>] ? radix_tree_gang_lookup_tag_slot+0x95/0xc0
 [<c10b5377>] ? find_get_pages_tag+0x127/0x1a0
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c106fcd4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10bc1e0>] __writepage+0x10/0x40
 [<c10bc9ab>] write_cache_pages+0x16b/0x320
 [<c10bc1d0>] ? __writepage+0x0/0x40
 [<c10bcb88>] generic_writepages+0x28/0x40
 [<c10bcbd5>] do_writepages+0x35/0x40
 [<c11059f7>] writeback_single_inode+0xc7/0x330
 [<c11067b2>] writeback_inodes_wb+0x2c2/0x490
 [<c1106a86>] wb_writeback+0x106/0x1b0
 [<c1106cf6>] wb_do_writeback+0x106/0x1e0
 [<c1106c18>] ? wb_do_writeback+0x28/0x1e0
 [<c1106e0a>] bdi_writeback_task+0x3a/0xb0
 [<c10cbb13>] bdi_start_fn+0x63/0xc0
 [<c10cbab0>] ? bdi_start_fn+0x0/0xc0
 [<c105d1f4>] kthread+0x74/0x80
 [<c105d180>] ? kthread+0x0/0x80
 [<c100327a>] kernel_thread_helper+0x6/0x10
3 locks held by flush-8:0/2149:
 #0:  (&type->s_umount_key#30){+++++.}, at: [<c110676f>] writeback_inodes_wb+0x27f/0x490
 #1:  (&journal->j_mutex){+.+...}, at: [<c117199a>] do_journal_end+0xba/0xe70
 #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178cde>] reiserfs_write_lock+0x2e/0x40
INFO: task fstest:3813 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fstest        D 00000002     0  3813   3812 0x00000000
 f5103c94 00000082 f5103c40 00000002 f5ad5450 00000007 f5103c28 011f3000
 00000006 f5ad5450 c10bb005 00000480 c1710304 f5ad5450 f5ad55ec c2907c40
 00000001 f5ad5450 f5103c74 00000046 00000002 f5ad5450 00000007 f5103c6c
Call Trace:
 [<c10bb005>] ? free_hot_cold_page+0x1d5/0x280
 [<c1462d64>] io_schedule+0x74/0xc0
 [<c10b5a45>] sync_page+0x35/0x60
 [<c146325a>] __wait_on_bit_lock+0x4a/0x90
 [<c10b5a10>] ? sync_page+0x0/0x60
 [<c10b59e5>] __lock_page+0x85/0x90
 [<c105d660>] ? wake_bit_function+0x0/0x60
 [<c10bf654>] truncate_inode_pages_range+0x1e4/0x2d0
 [<c10bf75f>] truncate_inode_pages+0x1f/0x30
 [<c10bf7cf>] truncate_pagecache+0x5f/0xa0
 [<c10bf86a>] vmtruncate+0x5a/0x70
 [<c10fdb7d>] inode_setattr+0x5d/0x190
 [<c1150117>] reiserfs_setattr+0x1f7/0x2f0
 [<c1464569>] ? down_write+0x49/0x70
 [<c10fde01>] notify_change+0x151/0x330
 [<c10e6f3d>] do_truncate+0x6d/0xa0
 [<c10f4ce2>] do_filp_open+0x9a2/0xcf0
 [<c1465aec>] ? _raw_spin_unlock+0x2c/0x50
 [<c10fec50>] ? alloc_fd+0xe0/0x100
 [<c10e602d>] do_sys_open+0x6d/0x130
 [<c1002cfb>] ? sysenter_exit+0xf/0x16
 [<c10e615e>] sys_open+0x2e/0x40
 [<c1002ccc>] sysenter_do_call+0x12/0x32
3 locks held by fstest/3813:
 #0:  (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<c10e6f33>] do_truncate+0x63/0xa0
 #1:  (&sb->s_type->i_alloc_sem_key#3){+.+.+.}, at: [<c10fdf07>] notify_change+0x257/0x330
 #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178c8e>] reiserfs_write_lock_once+0x2e/0x50

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-05 08:00:29 +01:00
Frederic Weisbecker 5fe1533fda reiserfs: Fix recursive lock on lchown
On chown, reiserfs will call reiserfs_setattr() to change the owner
of the given inode, but it may also recursively call
reiserfs_setattr() to propagate the owner change to the private xattr
files for this inode.

Hence, the reiserfs lock may be acquired twice which is not wanted
as reiserfs_setattr() calls journal_begin() that is going to try to
relax the lock in order to safely acquire the journal mutex.

Using reiserfs_write_lock_once() from reiserfs_setattr() solves
the problem.

This fixes the following warning, that precedes a lockdep report.

WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3f/0x50()
Hardware name: MS-7418
Unwanted recursive reiserfs lock!
Pid: 4189, comm: fsstress Not tainted 2.6.33-rc2-tip-atom+ #195
Call Trace:
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c103f7ac>] warn_slowpath_common+0x6c/0xc0
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c103f84b>] warn_slowpath_fmt+0x2b/0x30
 [<c1178bff>] reiserfs_lock_check_recursive+0x3f/0x50
 [<c1172ae3>] do_journal_begin_r+0x83/0x350
 [<c1172f2d>] journal_begin+0x7d/0x140
 [<c106509a>] ? in_group_p+0x2a/0x30
 [<c10fda71>] ? inode_change_ok+0x91/0x140
 [<c115007d>] reiserfs_setattr+0x15d/0x2e0
 [<c10f9bf3>] ? dput+0xe3/0x140
 [<c1465adc>] ? _raw_spin_unlock+0x2c/0x50
 [<c117831d>] chown_one_xattr+0xd/0x10
 [<c11780a3>] reiserfs_for_each_xattr+0x113/0x2c0
 [<c1178310>] ? chown_one_xattr+0x0/0x10
 [<c14641e9>] ? mutex_lock_nested+0x2a9/0x350
 [<c117826f>] reiserfs_chown_xattrs+0x1f/0x60
 [<c106509a>] ? in_group_p+0x2a/0x30
 [<c10fda71>] ? inode_change_ok+0x91/0x140
 [<c1150046>] reiserfs_setattr+0x126/0x2e0
 [<c1177c20>] ? reiserfs_getxattr+0x0/0x90
 [<c11b0d57>] ? cap_inode_need_killpriv+0x37/0x50
 [<c10fde01>] notify_change+0x151/0x330
 [<c10e659f>] chown_common+0x6f/0x90
 [<c10e67bd>] sys_lchown+0x6d/0x80
 [<c1002ccc>] sysenter_do_call+0x12/0x32
---[ end trace 7c2b77224c1442fc ]---

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-05 07:59:38 +01:00
Frederic Weisbecker cb1c2e51c5 reiserfs: Fix reiserfs lock and journal lock inversion dependency
When we were using the bkl, we didn't care about dependencies against
other locks, but the mutex conversion created new ones, which is why
we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock
before acquiring another mutex.

But this trick actually fails if we have acquired the reiserfs lock
recursively, as we try to unlock it to acquire the new mutex without
inverted dependency, but we eventually only decrease its depth.

This happens in the case of a nested inode creation/deletion.
Say we have no space left on the device, we create an inode
and tak the lock but fail to create its entry, then we release the
inode using iput(), which calls reiserfs_delete_inode() that takes
the reiserfs lock recursively. The path eventually ends up in
journal_begin() where we try to take the journal safely but we
fail because of the reiserfs lock recursion:

[ INFO: possible circular locking dependency detected ]
2.6.32-06486-g053fe57 #2
-------------------------------------------------------
vi/23454 is trying to acquire lock:
 (&journal->j_mutex){+.+...}, at: [<c110dac4>] do_journal_begin_r+0x64/0x2f0

but task is already holding lock:
 (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
       [<c104f8f3>] validate_chain+0xa23/0xf70
       [<c1050325>] __lock_acquire+0x4e5/0xa70
       [<c105092a>] lock_acquire+0x7a/0xa0
       [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
       [<c11106a8>] reiserfs_write_lock+0x28/0x40
       [<c110dacb>] do_journal_begin_r+0x6b/0x2f0
       [<c110ddcf>] journal_begin+0x7f/0x120
       [<c10f76c2>] reiserfs_remount+0x212/0x4d0
       [<c1093997>] do_remount_sb+0x67/0x140
       [<c10a9ca6>] do_mount+0x436/0x6b0
       [<c10a9f86>] sys_mount+0x66/0xa0
       [<c1002c50>] sysenter_do_call+0x12/0x36

-> #0 (&journal->j_mutex){+.+...}:
       [<c104fe38>] validate_chain+0xf68/0xf70
       [<c1050325>] __lock_acquire+0x4e5/0xa70
       [<c105092a>] lock_acquire+0x7a/0xa0
       [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
       [<c110dac4>] do_journal_begin_r+0x64/0x2f0
       [<c110ddcf>] journal_begin+0x7f/0x120
       [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
       [<c10a55fc>] generic_delete_inode+0x9c/0x150
       [<c10a56ed>] generic_drop_inode+0x3d/0x60
       [<c10a4607>] iput+0x47/0x50
       [<c10e915c>] reiserfs_create+0x16c/0x1c0
       [<c109a9c1>] vfs_create+0xc1/0x130
       [<c109dbec>] do_filp_open+0x81c/0x920
       [<c109004f>] do_sys_open+0x4f/0x110
       [<c1090179>] sys_open+0x29/0x40
       [<c1002c50>] sysenter_do_call+0x12/0x36

other info that might help us debug this:

2 locks held by vi/23454:
 #0:  (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [<c109d64e>]
do_filp_open+0x27e/0x920
 #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>]
reiserfs_write_lock+0x28/0x40

stack backtrace:
Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57 #2
Call Trace:
 [<c134b202>] ? printk+0x18/0x1e
 [<c104e960>] print_circular_bug+0xc0/0xd0
 [<c104fe38>] validate_chain+0xf68/0xf70
 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
 [<c1050325>] __lock_acquire+0x4e5/0xa70
 [<c105092a>] lock_acquire+0x7a/0xa0
 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
 [<c110ff80>] ? delete_one_xattr+0x0/0x1c0
 [<c110dac4>] do_journal_begin_r+0x64/0x2f0
 [<c110ddcf>] journal_begin+0x7f/0x120
 [<c11105b5>] ? reiserfs_delete_xattrs+0x15/0x50
 [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
 [<c10a55bf>] ? generic_delete_inode+0x5f/0x150
 [<c10ef490>] ? reiserfs_delete_inode+0x0/0x140
 [<c10a55fc>] generic_delete_inode+0x9c/0x150
 [<c10a56ed>] generic_drop_inode+0x3d/0x60
 [<c10a4607>] iput+0x47/0x50
 [<c10e915c>] reiserfs_create+0x16c/0x1c0
 [<c1099a5d>] ? inode_permission+0x7d/0xa0
 [<c109a9c1>] vfs_create+0xc1/0x130
 [<c10e8ff0>] ? reiserfs_create+0x0/0x1c0
 [<c109dbec>] do_filp_open+0x81c/0x920
 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
 [<c134dc0d>] ? _spin_unlock+0x1d/0x20
 [<c10a6eea>] ? alloc_fd+0xba/0xf0
 [<c109004f>] do_sys_open+0x4f/0x110
 [<c1090179>] sys_open+0x29/0x40
 [<c1002c50>] sysenter_do_call+0x12/0x36

To fix this, use reiserfs_lock_once() from reiserfs_delete_inode()
which prevents from adding reiserfs lock recursion.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-12-14 11:47:11 +01:00
Frederic Weisbecker 1d2c6cfd40 kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block()
GFP_ATOMIC was used in reiserfs_get_block to not lose the Bkl so that
nobody can modify the tree in the middle of its work. Now that we
kicked out the bkl, we can use a more friendly flag. We use GFP_NOFS
here because we already hold the reiserfs lock.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-11-20 18:25:02 +01:00
Frederic Weisbecker 27b3a5c51b kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0()
We had a watchdog in _get_block_create_0() that jumped to a fixup retry
path in case the bkl got relaxed while calling kmap().
This is not necessary anymore since we now have a reiserfs lock that is
not implicitly relaxed while sleeping.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 23:34:31 +02:00
Frederic Weisbecker 7e94277050 kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write()
reiserfs_commit_write() is always called with the write lock held.
Thus the current calls to reiserfs_write_lock() in this function are
acquiring the lock recursively.
We can safely drop them.

This also solves further assumptions for this lock to be really
released while calling reiserfs_write_unlock().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
2009-09-14 07:18:29 +02:00
Frederic Weisbecker d6f5b0aa08 kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end()
reiserfs_write_end() is a hot path in reiserfs.
We have two wasteful write lock lock/release inside that can be gathered
without changing the code logic.

This patch factorizes them out in a single protected section, reducing the
number of contentions inside.

[ Impact: reduce lock contention in a reiserfs hotpath ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:20 +02:00
Frederic Weisbecker 26931309a4 kill-the-bkl/reiserfs: lock only once on reiserfs_get_block()
reiserfs_get_block() is one of these sites where the write lock might
be acquired recursively.

It's a particular problem because this function is called very often.
It's a hot spot which needs to reschedule() periodically while converting
direct items to indirect ones because it can take some time.

Then if we are applying the write lock release/reacquire pattern on
schedule() here, it may not produce the desired effect since we may have
locked in more than one depth.

The solution is to use reiserfs_write_lock_once() which won't try
to reacquire the lock recursively. Then the lock will be *really*
released before schedule().

Also, we only release the lock if TIF_NEED_RESCHED is set to not
create wasteful numerous contentions.

[ Impact: fix a too long holded lock case in reiserfs_get_block() ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-14 07:18:16 +02:00
Frederic Weisbecker 22c963addc kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file
Impact: fix a deadlock

reiserfs_truncate_file() can be called from multiple context where
the write lock can be already hold or not.

This function also acquire (possibly recursively) the write
lock. Subsequent releases before sleeping will not actually release
the lock because we may be in more than one lock depth degree.

A typical case is:

reiserfs_file_release {
	acquire_the_lock()
	reiserfs_truncate_file()
		reacquire_the_lock()
		journal_begin() {
			do_journal_begin_r() {
				reiserfs_wait_on_write_block() {
					/*
					 * Not released because still one
					 * depth owned
					 */
					release_lock()
					wait_for_event()

At this stage the event never happen because the one which provides
it needs the write lock.

We use reiserfs_write_lock_once() here to ensure that we don't acquire the
write lock recursively.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:18:03 +02:00
Frederic Weisbecker 8ebc423238 reiserfs: kill-the-BKL
This patch is an attempt to remove the Bkl based locking scheme from
reiserfs and is intended.

It is a bit inspired from an old attempt by Peter Zijlstra:

   http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html

The bkl is heavily used in this filesystem to prevent from
concurrent write accesses on the filesystem.

Reiserfs makes a deep use of the specific properties of the Bkl:

- It can be acqquired recursively by a same task
- It is released on the schedule() calls and reacquired when schedule() returns

The two properties above are a roadmap for the reiserfs write locking so it's
very hard to simply replace it with a common mutex.

- We need a recursive-able locking unless we want to restructure several blocks
  of the code.
- We need to identify the sites where the bkl was implictly relaxed
  (schedule, wait, sync, etc...) so that we can in turn release and
  reacquire our new lock explicitly.
  Such implicit releases of the lock are often required to let other
  resources producer/consumer do their job or we can suffer unexpected
  starvations or deadlocks.

So the new lock that replaces the bkl here is a per superblock mutex with a
specific property: it can be acquired recursively by a same task, like the
bkl.

For such purpose, we integrate a lock owner and a lock depth field on the
superblock information structure.

The first axis on this patch is to turn reiserfs_write_(un)lock() function
into a wrapper to manage this mutex. Also some explicit calls to
lock_kernel() have been converted to reiserfs_write_lock() helpers.

The second axis is to find the important blocking sites (schedule...(),
wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
release of the write lock on these locations before blocking. Then we can
safely wait for those who can give us resources or those who need some.
Typically this is a fight between the current writer, the reiserfs workqueue
(aka the async commiter) and the pdflush threads.

The third axis is a consequence of the second. The write lock is usually
on top of a lock dependency chain which can include the journal lock, the
flush lock or the commit lock. So it's dangerous to release and trying to
reacquire the write lock while we still hold other locks.

This is fine with the bkl:

      T1                       T2

lock_kernel()
    mutex_lock(A)
    unlock_kernel()
    // do something
                            lock_kernel()
                                mutex_lock(A) -> already locked by T1
                                schedule() (and then unlock_kernel())
    lock_kernel()
    mutex_unlock(A)
    ....

This is not fine with a mutex:

      T1                       T2

mutex_lock(write)
    mutex_lock(A)
    mutex_unlock(write)
    // do something
                           mutex_lock(write)
                              mutex_lock(A) -> already locked by T1
                              schedule()

    mutex_lock(write) -> already locked by T2
    deadlock

The solution in this patch is to provide a helper which releases the write
lock and sleep a bit if we can't lock a mutex that depend on it. It's another
simulation of the bkl behaviour.

The last axis is to locate the fs callbacks that are called with the bkl held,
according to Documentation/filesystem/Locking.

Those are:

- reiserfs_remount
- reiserfs_fill_super
- reiserfs_put_super

Reiserfs didn't need to explicitly lock because of the context of these callbacks.
But now we must take care of that with the new locking.

After this patch, reiserfs suffers from a slight performance regression (for now).
On UP, a high volume write with dd reports an average of 27 MB/s instead
of 30 MB/s without the patch applied.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 07:17:59 +02:00
Al Viro 281eede032 switch reiserfs to inode->i_acl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:06 -04:00
Linus Torvalds e1c5024828 Merge branch 'reiserfs-updates' from Jeff Mahoney
* reiserfs-updates: (35 commits)
  reiserfs: rename [cn]_* variables
  reiserfs: rename p_._ variables
  reiserfs: rename p_s_tb to tb
  reiserfs: rename p_s_inode to inode
  reiserfs: rename p_s_bh to bh
  reiserfs: rename p_s_sb to sb
  reiserfs: strip trailing whitespace
  reiserfs: cleanup path functions
  reiserfs: factor out buffer_info initialization
  reiserfs: add atomic addition of selinux attributes during inode creation
  reiserfs: use generic readdir for operations across all xattrs
  reiserfs: journaled xattrs
  reiserfs: use generic xattr handlers
  reiserfs: remove i_has_xattr_dir
  reiserfs: make per-inode xattr locking more fine grained
  reiserfs: eliminate per-super xattr lock
  reiserfs: simplify xattr internal file lookups/opens
  reiserfs: Clean up xattrs when REISERFS_FS_XATTR is unset
  reiserfs: remove IS_PRIVATE helpers
  reiserfs: remove link detection code
  ...

Fixed up conflicts manually due to:
 - quota name cleanups vs variable naming changes:
	fs/reiserfs/inode.c
	fs/reiserfs/namei.c
	fs/reiserfs/stree.c
        fs/reiserfs/xattr.c
 - exported include header cleanups
	include/linux/reiserfs_fs.h
2009-03-30 12:33:01 -07:00
Jeff Mahoney 995c762ea4 reiserfs: rename p_s_inode to inode
This patch is a simple s/p_s_inode/inode/g to the reiserfs code.  This
is the third in a series of patches to rip out some of the awful
variable naming in reiserfs.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:39 -07:00
Jeff Mahoney 0222e6571c reiserfs: strip trailing whitespace
This patch strips trailing whitespace from the reiserfs code.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:39 -07:00
Jeff Mahoney 57fe60df62 reiserfs: add atomic addition of selinux attributes during inode creation
Some time ago, some changes were made to make security inode attributes
be atomically written during inode creation.  ReiserFS fell behind in
this area, but with the reworking of the xattr code, it's now fairly
easy to add.

The following patch adds the ability for security attributes to be added
automatically during inode creation.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:39 -07:00
Jeff Mahoney 0ab2621ebd reiserfs: journaled xattrs
Deadlocks are possible in the xattr code between the journal lock and the
xattr sems.

This patch implements journalling for xattr operations. The benefit is
twofold:
 * It gets rid of the deadlock possibility by always ensuring that xattr
   write operations are initiated inside a transaction.
 * It corrects the problem where xattr backing files aren't considered any
   differently than normal files, despite the fact they are metadata.

I discussed the added journal load with Chris Mason, and we decided that
since xattrs (versus other journal activity) is fairly rare, the introduction
of larger transactions to support journaled xattrs wouldn't be too big a deal.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:38 -07:00
Jeff Mahoney d984561b32 reiserfs: eliminate per-super xattr lock
With the switch to using inode->i_mutex locking during lookups/creation
in the xattr root, the per-super xattr lock is no longer needed.

This patch removes it.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:38 -07:00
Jeff Mahoney 6dfede6963 reiserfs: remove IS_PRIVATE helpers
There are a number of helper functions for marking a reiserfs inode
private that were leftover from reiserfs did its own thing wrt to
private inodes.  S_PRIVATE has been in the kernel for some time, so this
patch removes the helpers and uses IS_PRIVATE instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:37 -07:00
Jeff Mahoney 0030b64570 reiserfs: use reiserfs_error()
This patch makes many paths that are currently using warnings to handle
the error.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:37 -07:00
Jeff Mahoney c3a9c2109f reiserfs: rework reiserfs_panic
ReiserFS panics can be somewhat inconsistent.
In some cases:
 * a unique identifier may be associated with it
 * the function name may be included
 * the device may be printed separately

This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:36 -07:00
Jeff Mahoney 45b03d5e8e reiserfs: rework reiserfs_warning
ReiserFS warnings can be somewhat inconsistent.
In some cases:
 * a unique identifier may be associated with it
 * the function name may be included
 * the device may be printed separately

This patch aims to make warnings more consistent. reiserfs_warning() prints
the device name, so printing it a second time is not required. The function
name for a warning is always helpful in debugging, so it is now automatically
inserted into the output. Hans has stated that every warning should have
a unique identifier. Some cases lack them, others really shouldn't have them.
reiserfs_warning() now expects an id associated with each message. In the
rare case where one isn't needed, "" will suffice.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30 12:16:36 -07:00
Jan Kara 77db4f25bc reiserfs: Use lowercase names of quota functions
Use lowercase names of quota functions instead of old uppercase ones.

Signed-off-by: Jan Kara <jack@suse.cz>
CC: reiserfs-devel@vger.kernel.org
2009-03-26 02:18:36 +01:00
Linus Torvalds 520c853466 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  inotify: fix type errors in interfaces
  fix breakage in reiserfs_new_inode()
  fix the treatment of jfs special inodes
  vfs: remove duplicate code in get_fs_type()
  add a vfs_fsync helper
  sys_execve and sys_uselib do not call into fsnotify
  zero i_uid/i_gid on inode allocation
  inode->i_op is never NULL
  ntfs: don't NULL i_op
  isofs check for NULL ->i_op in root directory is dead code
  affs: do not zero ->i_op
  kill suid bit only for regular files
  vfs: lseek(fd, 0, SEEK_CUR) race condition
2009-01-05 18:32:06 -08:00
Al Viro 2f1169e2dc fix breakage in reiserfs_new_inode()
now that we use ih.key earlier, we need to do all its setup early enough

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:29 -05:00
Nick Piggin 54566b2c15 fs: symlink write_begin allocation context fix
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00