The locking policy is such that the erase_complete_block spinlock is
nested within the alloc_sem mutex. This fixes a case in which the
acquisition order was erroneously reversed. This issue was caught by
the following lockdep splat:
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.5 #1
-------------------------------------------------------
jffs2_gcd_mtd6/299 is trying to acquire lock:
(&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
but task is already holding lock:
(&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&c->erase_completion_lock)->rlock){+.+...}:
[<c008bec4>] validate_chain+0xe6c/0x10bc
[<c008c660>] __lock_acquire+0x54c/0xba4
[<c008d240>] lock_acquire+0xa4/0x114
[<c046780c>] _raw_spin_lock+0x3c/0x4c
[<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890
[<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
[<c0071a68>] kthread+0x98/0xa0
[<c000f264>] kernel_thread_exit+0x0/0x8
-> #0 (&c->alloc_sem){+.+.+.}:
[<c008ad2c>] print_circular_bug+0x70/0x2c4
[<c008c08c>] validate_chain+0x1034/0x10bc
[<c008c660>] __lock_acquire+0x54c/0xba4
[<c008d240>] lock_acquire+0xa4/0x114
[<c0466628>] mutex_lock_nested+0x74/0x33c
[<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
[<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
[<c0071a68>] kthread+0x98/0xa0
[<c000f264>] kernel_thread_exit+0x0/0x8
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&c->erase_completion_lock)->rlock);
lock(&c->alloc_sem);
lock(&(&c->erase_completion_lock)->rlock);
lock(&c->alloc_sem);
*** DEADLOCK ***
1 lock held by jffs2_gcd_mtd6/299:
#0: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890
stack backtrace:
[<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24)
[<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4)
[<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc)
[<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4)
[<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114)
[<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c)
[<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890)
[<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc)
[<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0)
[<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8)
This was introduce in '81cfc9f jffs2: Fix serious write stall due to erase'.
Cc: stable@kernel.org [2.6.37+]
Signed-off-by: Josh Cartwright <joshc@linux.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use pr_fmt to prefix KBUILD_MODNAME to appropriate logging messages.
Remove now unnecessary internal prefixes from formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use the more current logging style.
Coalesce formats, align arguments.
Convert uses of embedded function names to %s, __func__.
A couple of long line checkpatch errors I don't care about exist.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
D1 and D2 macros are mostly uses to emit debugging messages.
Convert the logging uses of D1 & D2 to jffs2_dbg(level, fmt, ...)
to be a bit more consistent style with the rest of the kernel.
All jffs2_dbg output is now at KERN_DEBUG where some of
the previous uses were emitted at various KERN_<LEVEL>s.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
jffs2_garbage_collect_pass() would previously return -EAGAIN if it
couldn't find anything to garbage collect from, and there were blocks on
the erase_pending_list. If the blocks were actually in the process of
being erased, though, then they wouldn't be on that list. Check for
nr_erasing_blocks being non-zero instead.
Fix jffs2_reserve_space() to wait for the in-progress erases to
complete, when jffs2_garbage_collect_pass() returns -EAGAIN.
And fix jffs2_erase_succeeded() to actually wake up the erase_wait wq
that jffs2_reserve_space() is now using.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Ever since jffs2_garbage_collect_metadata() was first half-written in
February 2001, it's been broken on architectures where 'char' is signed.
When garbage collecting a symlink with target length above 127, the payload
length would end up negative, causing interesting and bad things to happen.
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
To support NFS export, we need to know the parent inode of directories.
Rather than growing the jffs2_inode_cache structure, share space with
the nlink field -- which was always set to 1 for directories anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We don't actually care about nlink; we only care whether the inode in
question is unlinked or not.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
When _all_ the blocks were on the erase_pending_list, we could't find a
block to GC from but there was no _actually_ free space, and
jffs2_reserve_space() would get a little unhappy.
Handle this case by returning -EAGAIN from jffs2_garbage_collect_pass().
There are two callers of that function -- jffs2_flush_wbuf_gc(), which
will interpret it as an error and flush the writebuffer by other means,
and jffs2_reserve_space(), which we modify to respond to -EAGAIN with an
immediate call to jffs2_erase_pending_blocks() and another run round the
loop.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
fs/jffs2/gc.c:1147:29: warning: symbol 'jeb' shadows an earlier one
fs/jffs2/gc.c:1084:89: originally declared here
fs/jffs2/gc.c:1197:29: warning: symbol 'jeb' shadows an earlier one
fs/jffs2/gc.c:1084:89: originally declared here
Rename the unused 'jeb' argument to avoid this. We could potentially
remove the argument, but GCC should be doing that anyway.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In three places: summary scan, normal scan, REF_PRISTINE GC.
Just truncate at the NUL, since that was the correct thing to do in the
only case where this (inexplicable) breakage has been seen.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In OLPC trac #4184 we found a case where a corrupted node didn't
actually get obsoleted when we tried to garbage-collect it. So we wrote
out many million copies of it, in repeated attempts to obsolete it,
until the flash became full. Don't Do That.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In particular, remove the bit in the LICENCE file about contacting
Red Hat for alternative arrangements. Their errant IS department broke
that arrangement a long time ago -- the policy of collecting copyright
assignments from contributors came to an end when the plug was pulled on
the servers hosting the project, without notice or reason.
We do still dual-license it for use with eCos, with the GPL+exception
licence approved by the FSF as being GPL-compatible. It's just that nobody
has the right to license it differently.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We should never find the unchecked size is non-zero after we've finished
checking all inodes. If it happens, used to BUG(), leaving the alloc_sem
held and deadlocking. Instead, just return -ENOSPC after complaining. The
GC thread will die, but read-only operation should be able to continue and
the file system should be unmountable.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We observe soft lockups when doing heavy test which creates
directory with a lot of direntries and deletes them. This
cycle is the reason fo this. Make it nicer and add cond_resched()
inside of it.
Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
If xattr_ref is associated with an orphan inode_cache
on filesystem mounting, those xattr_refs are not
released even if this inode_cache is released.
This patch enables to call jffs2_xattr_delete_inode()
for such a irregular inode_cachde too.
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
- When xdatum is removed, a new xdatum with 'delete marker' is
written. (version==0xffffffff means 'delete marker')
- When xref is removed, a new xref with 'delete marker' is written.
(odd-numbered xseqno means 'delete marker')
- delete_xattr_(datum/xref)_delay() are new deletion functions
are added. We can only use them if we can detect the target
obsolete xdatum/xref as a orphan or errir one.
(e.g when inode deletion, or detecting crc error)
[1/3] jffs2-xattr-v6-01-delete_marker.patch
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
As the first step towards eliminating the ref->next_phys member and saving
memory by using an _array_ of struct jffs2_raw_node_ref per eraseblock,
stop the write functions from allocating their own refs; have them just
_reserve_ the appropriate number instead. Then jffs2_link_node_ref() can
just fill them in.
Use a linked list of pre-allocated refs in the superblock, for now. Once
we switch to an array, it'll just be a case of extending that array.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>