Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
"Fix up files in fs/ and lib/ dirs to only use module.h if they really
need it.
These are trivial in scope vs the work done previously. We now have
things where any few remaining cleanups can be farmed out to arch or
subsystem maintainers, and I have done so when possible. What is
remaining here represents the bits that don't clearly lie within a
single arch/subsystem boundary, like the fs dir and the lib dir.
Some duplicate includes arising from overlapping fixes from
independent subsystem maintainer submissions are also quashed."
Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).
* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
lib: reduce the use of module.h wherever possible
fs: reduce the use of module.h wherever possible
includecheck: delete any duplicate instances of module.h
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
It's often convenient to be able to release resource from IRQ context.
Make ida_simple_*() use irqsave/restore spin ops so that they are IRQ
safe.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current hyper-optimized functions are overkill if you simply want to
allocate an id for a device. Create versions which use an internal
lock.
In followup patches, numerous drivers are converted to use this
interface.
Thanks to Tejun for feedback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Despite the idr_pre_get() kernel-doc, there are some cases where you can
call idr_pre_get() from within locked regions. Add a description for such
cases.
See also: http://lkml.org/lkml/2010/9/16/462
[akpm@linux-foundation.org: cleanups, grammatical fixes]
Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was unclear in original kernel-doc how nextidp worked in
idr_get_next(). Let's describe it.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Fix the following kernel-doc warnings.
% perl scripts/kernel-doc lib/idr.c > /dev/null
Warning(lib/idr.c:300): No description found for parameter 'starting_id'
Warning(lib/idr.c:300): Excess function parameter 'start_id' description in 'idr_get_new_above'
Warning(lib/idr.c:485): No description found for parameter 'idp'
Warning(lib/idr.c:596): No description found for parameter 'nextidp'
Warning(lib/idr.c:596): Excess function parameter 'id' description in 'idr_get_next'
Warning(lib/idr.c:774): No description found for parameter 'starting_id'
Warning(lib/idr.c:774): Excess function parameter 'staring_id' description in 'ida_get_new_above'
Warning(lib/idr.c:918): No description found for parameter 'ida'
Signed-off-by: Naohiro Aota <naota@elisp.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Currently idr_remove_all will fail with a use after free error if
idr::layers is bigger than 2, which on 32 bit systems corresponds to items
more than 1024. This is due to stepping back too many levels during
backtracking. For simplicity let's assume that IDR_BITS=1 -> we have 2
nodes at each level below the root node and each leaf node stores two IDs.
(In reality for 32 bit systems IDR_BITS=5, with 32 nodes at each sub-root
level and 32 IDs in each leaf node). The sequence of freeing the nodes at
the moment is as follows:
layer
1 -> a(7)
2 -> b(3) c(5)
3 -> d(1) e(2) f(4) g(6)
Until step 4 things go fine, but then node c is freed, whereas node g
should be freed first. Since node c contains the pointer to node g we'll
have a use after free error at step 6.
How many levels we step back after visiting the leaf nodes is currently
determined by the msb of the id we are currently visiting:
Step
1. node d with IDs 0,1 is freed, current ID is advanced to 2.
msb of the current ID bit 1. This means we need to step back
1 level to node b and take the next sibling, node e.
2-3. node e with IDs 2,3 is freed, current ID is 4, msb is bit 2.
This means we need to step back 2 levels to node a, freeing
node b on the way.
4-5. node f with IDs 4,5 is freed, current ID is 6, msb is still
bit 2. This means we again need to step back 2 levels to node
a and free c on the way.
6. We should visit node g, but its pointer is not available as
node c was freed.
The fix changes how we determine the number of levels to step back.
Instead of deducting this merely from the msb of the current ID, we should
really check if advancing the ID causes an overflow to a bit position
corresponding to a given layer. In the above example overflow from bit 0
to bit 1 should mean stepping back 1 level. Overflow from bit 1 to bit 2
should mean stepping back 2 levels and so on.
The fix was tested with IDs up to 1 << 20, which corresponds to 4 layers
on 32 bit systems.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org> [2.6.34.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is retry of reverted 859ddf0974
("idr: fix a critical misallocation bug") which contained two bugs.
* pa[idp->layers] should be cleared even if it's not used by
sub_alloc() because it's used by mark idr_mark_full().
* The original condition check also assigned pa[l] to p which the new
code didn't do thus leaving p pointing at the wrong layer.
Both problems have been fixed and the idr code has received good amount
testing using userland testing setup where simple bitmap allocator is
run parallel to verify the result of idr allocation.
The bug this patch fixes is caused by sub_alloc() optimization path
bypassing out-of-room condition check and restarting allocation loop
with starting value higher than maximum allowed value. For detailed
description, please read commit message of 859ddf09.
Signed-off-by: Tejun Heo <tj@kernel.org>
Based-on-patch-from: Eric Paris <eparis@redhat.com>
Reported-by: Eric Paris <eparis@redhat.com>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 859ddf0974 tried to fix
misallocation bug but broke full bit marking by not clearing
pa[idp->layers] and also is causing X failures due to lookup failure
in drm code. The cause of the latter hasn't been found yet. Revert
the fix for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris located a bug in idr. With IDR_BITS of 6, it grows to three
layers when id 4096 is first allocated. When that happens, idr wraps
incorrectly and searches the idr array ignoring the high bits. The
following test code from Eric demonstrates the bug nicely.
#include <linux/idr.h>
#include <linux/kernel.h>
#include <linux/module.h>
static DEFINE_IDR(test_idr);
int init_module(void)
{
int ret, forty95, forty96;
void *addr;
/* add 2 entries both with 4095 as the start address */
again1:
if (!idr_pre_get(&test_idr, GFP_KERNEL))
return -ENOMEM;
ret = idr_get_new_above(&test_idr, (void *)4095, 4095, &forty95);
if (ret) {
if (ret == -EAGAIN)
goto again1;
return ret;
}
if (forty95 != 4095)
printk(KERN_ERR "hmmm, forty95=%d\n", forty95);
again2:
if (!idr_pre_get(&test_idr, GFP_KERNEL))
return -ENOMEM;
ret = idr_get_new_above(&test_idr, (void *)4096, 4095, &forty96);
if (ret) {
if (ret == -EAGAIN)
goto again2;
return ret;
}
if (forty96 != 4096)
printk(KERN_ERR "hmmm, forty96=%d\n", forty96);
/* try to find the 2 entries, noticing that 4096 broke */
addr = idr_find(&test_idr, forty95);
if ((int)addr != forty95)
printk(KERN_ERR "hmmm, after find forty95=%d addr=%d\n", forty95, (int)addr);
addr = idr_find(&test_idr, forty96);
if ((int)addr != forty96)
printk(KERN_ERR "hmmm, after find forty96=%d addr=%d\n", forty96, (int)addr);
/* really weird, the entry which should be at 4096 is actually at 0!! */
addr = idr_find(&test_idr, 0);
if ((int)addr)
printk(KERN_ERR "found an entry at id=0 for addr=%d\n", (int)addr);
idr_remove(&test_idr, forty95);
idr_remove(&test_idr, forty96);
return 0;
}
void cleanup_module(void)
{
}
MODULE_AUTHOR("Eric Paris <eparis@redhat.com>");
MODULE_DESCRIPTION("Simple idr test");
MODULE_LICENSE("GPL");
This happens because when sub_alloc() back tracks it doesn't always do it
step-by-step while the over-the-limit detection assumes step-by-step
backtracking. The logic in sub_alloc() looks like the following.
restart:
clear pa[top level + 1] for end cond detection
l = top level
while (true) {
search for empty slot at this level
if (not found) {
push id to the next possible value
l++
A: if (pa[l] is clear)
failed, return asking caller to grow the tree
if (going up 1 level gives more slots to search)
continue the while loop above with the incremented l
else
C: goto restart
}
adjust id accordingly to the found slot
if (l == 0)
return found id;
create lower level if not there yet
record pa[l] and l--
}
Test A is the fail exit condition but this assumes that failure is
propagated upwared one level at a time but the B optimization path breaks
the assumption and restarts the whole thing with a start value which is
above the possible limit with the current layers. sub_alloc() assumes the
start id value is inside the limit when called and test A is the only exit
condition check, so it ends up searching for empty slot while ignoring
high set bit.
So, for 4095->4096 test, level0 search fails but pa[1] contains a valid
pointer. However, going up 1 level wouldn't give any more empty slot so
it takes C and when the whole thing restarts nobody notices the high bit
set beyond the top level.
This patch fixes the bug by changing the fail exit condition check to full
id limit check.
Based-on-patch-from: Eric Paris <eparis@redhat.com>
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix some typos and punctuation in comments
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
- css_lookup(subsys, id)
returns pointer to struct cgroup_subysys_state of id.
- css_get_next(subsys, id, rootid, depth, foundid)
returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
- css_id of root css for subsys
- id is automatically attached at creation of css.
- id is *not* freed automatically. Because the cgroup framework
don't know lifetime of cgroup_subsys_state.
free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
- Saving space .... For example, memcg's swap_cgroup is array of
pointers to cgroup. But it is not necessary to be very fast.
By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
reduce much amount of memory usage.
- Scanning without lock.
CSS_ID provides "scan id under this ROOT" function. By this, scanning
css under root can be written without locks.
ex)
do {
rcu_read_lock();
next = cgroup_get_next(subsys, id, root, &found);
/* check sanity of next here */
css_tryget();
rcu_read_unlock();
id = found + 1
} while(...)
Characteristics:
- Each css has unique ID under subsys.
- Lifetime of ID is controlled by subsys.
- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
- scan-by-ID v.s. scan-by-tree-walk.
As /proc's pid scan does, scan-by-ID is robust when scanning is done
by following kind of routine.
scan -> rest a while(release a lock) -> conitunue from interrupted
memcg's hierarchical reclaim does this.
- When subsys->use_id is set, # of css in the system is limited to
65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a problem in the IDR system, where an idr_remove_all() hands a data
element to call_rcu() (via free_layer()) before making that data element
inaccessible to new readers. This is very bad, and results in readers
still having a reference to this data element at the end of the grace
period.
Tests on large machines that concurrently map and unmap user-space memory
within the same multithreaded process result in crashes within about five
minutes. Applying this patch increases the kernel's longevity to the
three-to-eight-hour range.
There appear to be other similar problems in idr_get_empty_slot() and
sub_remove(), but I fixed the easy one in idr_remove_all() first. It is
therefore no surprise that failures still occur.
Located-by: Milton Miller II <miltonm@austin.ibm.com>
Tested-by: Milton Miller II <miltonm@austin.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David points out that the idr_remove_all() function returns unused slabs
to the kmem cache, but needs to zero them first or else they will be
uninitialized upon next use. This causes crashes which have been observed
in the firewire subsystem.
He fixed this by zeroing the object before freeing it in idr_remove_all().
But we agree that simply removing the constructor and zeroing the object
at allocation time is simpler than relying upon slab constructor machinery
and might even be faster.
This problem was introduced by "idr: make idr_remove rcu-safe" (commit
cf481c20c4), which was first released in
2.6.27.
There are no known codesites which trigger this bug in 2.6.27 or 2.6.28.
The post-2.6.28 firewire changes are the only known triggerer.
There might of course be not-yet-discovered triggerers in 2.6.27 and
2.6.28, and there might be out-of-tree triggerers which are added to those
kernel versions. I'll let the -stable guys decide whether they want to
backport this fix.
Reported-by: David Moore <dcm@acm.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Kristian Hgsberg <krh@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>