Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
- css_lookup(subsys, id)
returns pointer to struct cgroup_subysys_state of id.
- css_get_next(subsys, id, rootid, depth, foundid)
returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
- css_id of root css for subsys
- id is automatically attached at creation of css.
- id is *not* freed automatically. Because the cgroup framework
don't know lifetime of cgroup_subsys_state.
free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
- Saving space .... For example, memcg's swap_cgroup is array of
pointers to cgroup. But it is not necessary to be very fast.
By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
reduce much amount of memory usage.
- Scanning without lock.
CSS_ID provides "scan id under this ROOT" function. By this, scanning
css under root can be written without locks.
ex)
do {
rcu_read_lock();
next = cgroup_get_next(subsys, id, root, &found);
/* check sanity of next here */
css_tryget();
rcu_read_unlock();
id = found + 1
} while(...)
Characteristics:
- Each css has unique ID under subsys.
- Lifetime of ID is controlled by subsys.
- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
- scan-by-ID v.s. scan-by-tree-walk.
As /proc's pid scan does, scan-by-ID is robust when scanning is done
by following kind of routine.
scan -> rest a while(release a lock) -> conitunue from interrupted
memcg's hierarchical reclaim does this.
- When subsys->use_id is set, # of css in the system is limited to
65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a problem in the IDR system, where an idr_remove_all() hands a data
element to call_rcu() (via free_layer()) before making that data element
inaccessible to new readers. This is very bad, and results in readers
still having a reference to this data element at the end of the grace
period.
Tests on large machines that concurrently map and unmap user-space memory
within the same multithreaded process result in crashes within about five
minutes. Applying this patch increases the kernel's longevity to the
three-to-eight-hour range.
There appear to be other similar problems in idr_get_empty_slot() and
sub_remove(), but I fixed the easy one in idr_remove_all() first. It is
therefore no surprise that failures still occur.
Located-by: Milton Miller II <miltonm@austin.ibm.com>
Tested-by: Milton Miller II <miltonm@austin.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David points out that the idr_remove_all() function returns unused slabs
to the kmem cache, but needs to zero them first or else they will be
uninitialized upon next use. This causes crashes which have been observed
in the firewire subsystem.
He fixed this by zeroing the object before freeing it in idr_remove_all().
But we agree that simply removing the constructor and zeroing the object
at allocation time is simpler than relying upon slab constructor machinery
and might even be faster.
This problem was introduced by "idr: make idr_remove rcu-safe" (commit
cf481c20c4), which was first released in
2.6.27.
There are no known codesites which trigger this bug in 2.6.27 or 2.6.28.
The post-2.6.28 firewire changes are the only known triggerer.
There might of course be not-yet-discovered triggerers in 2.6.27 and
2.6.28, and there might be out-of-tree triggerers which are added to those
kernel versions. I'll let the -stable guys decide whether they want to
backport this fix.
Reported-by: David Moore <dcm@acm.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Kristian Hgsberg <krh@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last patch to lib/idr.c caused a bug if idr_get_new_above() was
called on an empty idr.
Usually, nodes stay on the same layer. New layers are added to the top
of the tree.
The exception is idr_get_new_above() on an empty tree: In this case, the
new root node is first added on layer 0, then moved upwards. p->layer
was not updated.
As usual: You shall never rely on the source code comments, they will
only mislead you.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.
This change also enables us to eliminate the check every time idr_init() is
called.
[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Remove all ids from the given idr tree. idr_destroy() only frees up
unused, cached idp_layers, but this function will remove all id mappings
and leave all idp_layers unused.
A typical clean-up sequence for objects stored in an idr tree, will use
idr_for_each() to free all objects, if necessay, then idr_remove_all() to
remove all ids, and idr_destroy() to free up the cached idr_layers.
Signed-off-by: Kristian Hoegsberg <krh@redhat.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds an iterator function for the idr data structure. Compared
to just iterating through the idr with an integer and idr_find, this
iterator is (almost, but not quite) linear in the number of elements, as
opposed to the number of integers in the range covered by the idr. This
makes a difference for sparse idrs, but more importantly, it's a nicer way
to iterate through the elements.
The drm subsystem is moving to idr for tracking contexts and drawables, and
with this change, we can use the idr exclusively for tracking these
resources.
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Kristian Hoegsberg <krh@redhat.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement idr based id allocator. ida is used the same way idr is
used but lacks id -> ptr translation and thus consumes much less
memory. struct ida_bitmap is attached as leaf nodes to idr tree which
is managed by the idr code. Each ida_bitmap is 128bytes long and
contains slightly less than a thousand slots.
ida is more aggressive with releasing extra resources acquired using
ida_pre_get(). After every successful id allocation, ida frees one
reserved idr_layer if possible. Reserved ida_bitmap is not freed
automatically but only one ida_bitmap is reserved and it's almost
always used right away. Under most circumstances, ida won't hold on
to memory for too long which isn't actively used.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Separate out idr_mark_full() from sub_alloc() and make marking the
allocated slot full the responsibility of idr_get_new_above_int().
Allocation part of idr_get_new_above_int() is renamed to
idr_get_empty_slot(). New idr_get_new_above_int() allocates a slot
using the function, install the user pointer and marks it full using
idr_mark_full().
This change doesn't introduce any behavior change. This will be
used by ida.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In sub_alloc(), when bitmap search fails, it goes up one level to
continue search. This is done by updating the id cursor and searching
the upper level again. If the cursor was at the end of the upper
level, we need to go further than that.
This wasn't implemented and when that happens the part of the cursor
which indexes into the upper level wraps and sub_alloc() ends up
searching the wrong bitmap. It allocates id which doesn't match the
actual slot.
This patch fixes this by restarting from the top if the search needs
to go higher than one level.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
source files, including:
* make multi-line initial descriptions single line
* denote some function names, constants and structs as such
* change erroneous opening '/*' to '/**' in a few places
* reword some text for clarity
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>