Commit Graph

115 Commits

Author SHA1 Message Date
Vladimir Davydov 03afc0e25f slab: get_online_mems for kmem_cache_{create,destroy,shrink}
When we create a sl[au]b cache, we allocate kmem_cache_node structures
for each online NUMA node.  To handle nodes taken online/offline, we
register memory hotplug notifier and allocate/free kmem_cache_node
corresponding to the node that changes its state for each kmem cache.

To synchronize between the two paths we hold the slab_mutex during both
the cache creationg/destruction path and while tuning per-node parts of
kmem caches in memory hotplug handler, but that's not quite right,
because it does not guarantee that a newly created cache will have all
kmem_cache_nodes initialized in case it races with memory hotplug.  For
instance, in case of slub:

    CPU0                            CPU1
    ----                            ----
    kmem_cache_create:              online_pages:
     __kmem_cache_create:            slab_memory_callback:
                                      slab_mem_going_online_callback:
                                       lock slab_mutex
                                       for each slab_caches list entry
                                           allocate kmem_cache node
                                       unlock slab_mutex
      lock slab_mutex
      init_kmem_cache_nodes:
       for_each_node_state(node, N_NORMAL_MEMORY)
           allocate kmem_cache node
      add kmem_cache to slab_caches list
      unlock slab_mutex
                                    online_pages (continued):
                                     node_states_set_node

As a result we'll get a kmem cache with not all kmem_cache_nodes
allocated.

To avoid issues like that we should hold get/put_online_mems() during
the whole kmem cache creation/destruction/shrink paths, just like we
deal with cpu hotplug.  This patch does the trick.

Note, that after it's applied, there is no need in taking the slab_mutex
for kmem_cache_shrink any more, so it is removed from there.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:59 -07:00
Dave Hansen 34bf6ef94a mm: slab/slub: use page->list consistently instead of page->lru
'struct page' has two list_head fields: 'lru' and 'list'.  Conveniently,
they are unioned together.  This means that code can use them
interchangably, which gets horribly confusing like with this nugget from
slab.c:

>	list_del(&page->lru);
>	if (page->active == cachep->num)
>		list_add(&page->list, &n->slabs_full);

This patch makes the slab and slub code use page->lru universally instead
of mixing ->list and ->lru.

So, the new rule is: page->lru is what the you use if you want to keep
your page on a list.  Don't like the fact that it's not called ->list?
Too bad.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2014-04-11 10:06:06 +03:00
Christoph Lameter f1b6eb6e6b mm/sl[aou]b: Move kmallocXXX functions to common code
The kmalloc* functions of all slab allcoators are similar now so
lets move them into slab.h. This requires some function naming changes
in slob.

As a results of this patch there is a common set of functions for
all allocators. Also means that kmalloc_large() is now available
in general to perform large order allocations that go directly
via the page allocator. kmalloc_large() can be substituted if
kmalloc() throws warnings because of too large allocations.

kmalloc_large() has exactly the same semantics as kmalloc but
can only used for allocations > PAGE_SIZE.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-09-04 20:51:33 +03:00
Linus Torvalds 54be820019 Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab update from Pekka Enberg:
 "Highlights:

  - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

  - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

  - Fix for excessive slab freelist draining (Wanpeng Li)

  - SLUB and SLOB cleanups and fixes (various people)"

I ended up editing the branch, and this avoids two commits at the end
that were immediately reverted, and I instead just applied the oneliner
fix in between myself.

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
  slub: Check for page NULL before doing the node_match check
  mm/slab: Give s_next and s_stop slab-specific names
  slob: Check for NULL pointer before calling ctor()
  slub: Make cpu partial slab support configurable
  slab: add kmalloc() to kernel API documentation
  slab: fix init_lock_keys
  slob: use DIV_ROUND_UP where possible
  slub: do not put a slab to cpu partial list when cpu_partial is 0
  mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
  mm/slub: Drop unnecessary nr_partials
  mm/slab: Fix /proc/slabinfo unwriteable for slab
  mm/slab: Sharing s_next and s_stop between slab and slub
  mm/slab: Fix drain freelist excessively
  slob: Rework #ifdeffery in slab.h
  mm, slab: moved kmem_cache_alloc_node comment to correct place
2013-07-14 15:14:29 -07:00
Steven Rostedt c1e854e924 slob: Check for NULL pointer before calling ctor()
While doing some code inspection, I noticed that the slob constructor
method can be called with a NULL pointer. If memory is tight and slob
fails to allocate with slob_alloc() or slob_new_pages() it still calls
the ctor() method with a NULL pointer. Looking at the first ctor()
method I found, I noticed that it can not handle a NULL pointer (I'm
sure others probably can't either):

static void sighand_ctor(void *data)
{
        struct sighand_struct *sighand = data;

        spin_lock_init(&sighand->siglock);
        init_waitqueue_head(&sighand->signalfd_wqh);
}

The solution is to only call the ctor() method if allocation succeeded.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07 19:19:23 +03:00
Sasha Levin a6d78159f8 slob: use DIV_ROUND_UP where possible
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07 18:50:05 +03:00
Mel Gorman 22b751c3d0 mm: rename page struct field helpers
The function names page_xchg_last_nid(), page_last_nid() and
reset_page_last_nid() were judged to be inconsistent so rename them to a
struct_field_op style pattern.  As it looked jarring to have
reset_page_mapcount() and page_nid_reset_last() beside each other in
memmap_init_zone(), this patch also renames reset_page_mapcount() to
page_mapcount_reset().  There are others like init_page_count() but as
it is used throughout the arch code a rename would likely cause more
conflicts than it is worth.

[akpm@linux-foundation.org: fix zcache]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:18 -08:00
Glauber Costa b9ce5ef49f sl[au]b: always get the cache from its page in kmem_cache_free()
struct page already has this information.  If we start chaining caches,
this information will always be more trustworthy than whatever is passed
into the function.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 15:02:14 -08:00
Christoph Lameter 4590685546 mm/sl[aou]b: Common alignment code
Extract the code to do object alignment from the allocators.
Do the alignment calculations in slab_common so that the
__kmem_cache_create functions of the allocators do not have
to deal with alignment.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-12-11 12:14:28 +02:00
Arnd Bergmann 789306e5ad mm/slob: use min_t() to compare ARCH_SLAB_MINALIGN
The definition of ARCH_SLAB_MINALIGN is architecture dependent
and can be either of type size_t or int. Comparing that value
with ARCH_KMALLOC_MINALIGN can cause harmless warnings on
platforms where they are different. Since both are always
small positive integer numbers, using the size_t type to compare
them is safe and gets rid of the warning.

Without this patch, building ARM collie_defconfig results in:

mm/slob.c: In function '__kmalloc_node':
mm/slob.c:431:152: warning: comparison of distinct pointer types lacks a cast [enabled by default]
mm/slob.c: In function 'kfree':
mm/slob.c:484:153: warning: comparison of distinct pointer types lacks a cast [enabled by default]
mm/slob.c: In function 'ksize':
mm/slob.c:503:153: warning: comparison of distinct pointer types lacks a cast [enabled by default]

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ penberg@kernel.org: updates for master ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-31 09:24:22 +02:00
Ezequiel Garcia 8cf9864b13 mm/slob: Use free_page instead of put_page for page-size kmalloc allocations
When freeing objects, the slob allocator currently free empty pages
calling __free_pages(). However, page-size kmallocs are disposed
using put_page() instead.

It makes no sense to call put_page() for kernel pages that are provided
by the object allocator, so we shouldn't be doing this ourselves.

This is based on:
commit d9b7f22623
Author: Glauber Costa <glommer@parallels.com>
slub: use free_page instead of put_page for freeing kmalloc allocation

Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-31 08:53:54 +02:00
Ezequiel Garcia 242860a47a mm/sl[aou]b: Move common kmem_cache_size() to slab.h
This function is identically defined in all three allocators
and it's trivial to move it to slab.h

Since now it's static, inline, header-defined function
this patch also drops the EXPORT_SYMBOL tag.

Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-31 08:52:15 +02:00
Ezequiel Garcia fe74fe2bf2 mm/slob: Use object_size field in kmem_cache_size()
Fields object_size and size are not the same: the latter might include
slab metadata. Return object_size field in kmem_cache_size().
Also, improve trace accuracy by correctly tracing reported size.

Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-31 08:51:35 +02:00
Ezequiel Garcia 999d8795d4 mm/slob: Drop usage of page->private for storing page-sized allocations
This field was being used to store size allocation so it could be
retrieved by ksize(). However, it is a bad practice to not mark a page
as a slab page and then use fields for special purposes.
There is no need to store the allocated size and
ksize() can simply return PAGE_SIZE << compound_order(page).

Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-31 08:50:43 +02:00
Pekka Enberg e2087be35a Merge branch 'slab/tracing' into slab/for-linus 2012-10-03 09:57:17 +03:00
Pekka Enberg f4178cdddd Merge branch 'slab/common-for-cgroups' into slab/for-linus
Fix up a trivial conflict with NUMA_NO_NODE cleanups.

Conflicts:
	mm/slob.c

Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-03 09:56:37 +03:00
David Rientjes 82bd5508b4 mm, slob: fix build breakage in __kmalloc_node_track_caller
On Sat, 8 Sep 2012, Ezequiel Garcia wrote:

> @@ -454,15 +455,35 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
>  			gfp |= __GFP_COMP;
>  		ret = slob_new_pages(gfp, order, node);
>
> -		trace_kmalloc_node(_RET_IP_, ret,
> +		trace_kmalloc_node(caller, ret,
>  				   size, PAGE_SIZE << order, gfp, node);
>  	}
>
>  	kmemleak_alloc(ret, size, 1, gfp);
>  	return ret;
>  }
> +
> +void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> +{
> +	return __do_kmalloc_node(size, gfp, node, _RET_IP_);
> +}
>  EXPORT_SYMBOL(__kmalloc_node);
>
> +#ifdef CONFIG_TRACING
> +void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
> +{
> +	return __do_kmalloc_node(size, gfp, NUMA_NO_NODE, caller);
> +}
> +
> +#ifdef CONFIG_NUMA
> +void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> +					int node, unsigned long caller)
> +{
> +	return __do_kmalloc_node(size, gfp, node, caller);
> +}
> +#endif

This breaks Pekka's slab/next tree with this:

mm/slob.c: In function '__kmalloc_node_track_caller':
mm/slob.c:488: error: 'gfp' undeclared (first use in this function)
mm/slob.c:488: error: (Each undeclared identifier is reported only once
mm/slob.c:488: error: for each function it appears in.)

mm, slob: fix build breakage in __kmalloc_node_track_caller

"mm, slob: Add support for kmalloc_track_caller()" breaks the build
because gfp is undeclared.  Fix it.

Acked-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-26 09:33:49 +03:00
Ezequiel Garcia f3f7410195 mm, slob: Add support for kmalloc_track_caller()
Currently slob falls back to regular kmalloc for this case.
With this patch kmalloc_track_caller() is correctly implemented,
thus tracing the specified caller.

This is important to trace accurately allocations performed by
krealloc, kstrdup, kmemdup, etc.

Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-25 10:14:18 +03:00
Ezequiel Garcia 90f2cbbc49 mm, slob: Use NUMA_NO_NODE instead of -1
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-25 10:11:14 +03:00
Christoph Lameter cce89f4f69 mm/sl[aou]b: Move kmem_cache refcounting to common code
Get rid of the refcount stuff in the allocators and do that part of
kmem_cache management in the common code.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05 12:00:37 +03:00
Christoph Lameter 8a13a4cc80 mm/sl[aou]b: Shrink __kmem_cache_create() parameter lists
Do the initial settings of the fields in common code. This will allow us
to push more processing into common code later and improve readability.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05 12:00:37 +03:00
Christoph Lameter 278b1bb131 mm/sl[aou]b: Move kmem_cache allocations into common code
Shift the allocations to common code. That way the allocation and
freeing of the kmem_cache structures is handled by common code.

Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05 12:00:36 +03:00
Christoph Lameter 12c3667fb7 mm/sl[aou]b: Get rid of __kmem_cache_destroy
What is done there can be done in __kmem_cache_shutdown.

This affects RCU handling somewhat. On rcu free all slab allocators do
not refer to other management structures than the kmem_cache structure.
Therefore these other structures can be freed before the rcu deferred
free to the page allocator occurs.

Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05 12:00:36 +03:00
Christoph Lameter 8f4c765c22 mm/sl[aou]b: Move freeing of kmem_cache structure to common code
The freeing action is basically the same in all slab allocators.
Move to the common kmem_cache_destroy() function.

Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05 12:00:36 +03:00
Christoph Lameter 9b030cb865 mm/sl[aou]b: Use "kmem_cache" name for slab cache with kmem_cache struct
Make all allocators use the "kmem_cache" slabname for the "kmem_cache"
structure.

Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05 12:00:36 +03:00