Commit Graph

562 Commits

Author SHA1 Message Date
Chen Gang 68f06650ea mm/slub.c: beautify code for removing redundancy 'break' statement.
Remove redundancy 'break' statement.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-08-13 09:21:46 +03:00
Libin ac6434e6b8 slub: Remove unnecessary page NULL check
In commit 4d7868e6(slub: Do not dereference NULL pointer in node_match)
had added check for page NULL in node_match.  Thus, it is not needed
to check it before node_match, remove it.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Libin <huawei.libin@huawei.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-08-13 09:21:46 +03:00
Chen Gang d0e0ac9772 mm/slub: beautify code for 80 column limitation and tab alignment
Be sure of 80 column limitation for both code and comments.

Correct tab alignment for 'if-else' statement.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-17 10:11:57 +03:00
Chen Gang e35e1a9744 mm/slub: remove 'per_cpu' which is useless variable
Remove 'per_cpu', since it is useless now after the patch: "205ab99
slub: Update statistics handling for variable order slabs". And the
partial list is handled in the same way as the per cpu slab.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-15 12:37:55 +03:00
Linus Torvalds 54be820019 Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab update from Pekka Enberg:
 "Highlights:

  - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

  - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

  - Fix for excessive slab freelist draining (Wanpeng Li)

  - SLUB and SLOB cleanups and fixes (various people)"

I ended up editing the branch, and this avoids two commits at the end
that were immediately reverted, and I instead just applied the oneliner
fix in between myself.

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
  slub: Check for page NULL before doing the node_match check
  mm/slab: Give s_next and s_stop slab-specific names
  slob: Check for NULL pointer before calling ctor()
  slub: Make cpu partial slab support configurable
  slab: add kmalloc() to kernel API documentation
  slab: fix init_lock_keys
  slob: use DIV_ROUND_UP where possible
  slub: do not put a slab to cpu partial list when cpu_partial is 0
  mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
  mm/slub: Drop unnecessary nr_partials
  mm/slab: Fix /proc/slabinfo unwriteable for slab
  mm/slab: Sharing s_next and s_stop between slab and slub
  mm/slab: Fix drain freelist excessively
  slob: Rework #ifdeffery in slab.h
  mm, slab: moved kmem_cache_alloc_node comment to correct place
2013-07-14 15:14:29 -07:00
Steven Rostedt c25f195e82 slub: Check for page NULL before doing the node_match check
In the -rt kernel (mrg), we hit the following dump:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
PGD a2d39067 PUD b1641067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU 3
Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
RIP: 0010:[<ffffffff811573f1>]  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
RSP: 0018:ffff8800a9b17d70  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
Stack:
 ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
 0000000001200011 0000000001200011 0000000000000000 0000000000000000
 00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
Call Trace:
 [<ffffffff81202e08>] ? current_has_perm+0x68/0x80
 [<ffffffff81041cbd>] copy_process+0xdd/0x15b0
 [<ffffffff810a2125>] ? rt_up_read+0x25/0x30
 [<ffffffff8104369a>] do_fork+0x5a/0x360
 [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
 [<ffffffff8100b068>] sys_clone+0x28/0x30
 [<ffffffff81527423>] stub_clone+0x13/0x20
 [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
RIP  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
 RSP <ffff8800a9b17d70>
CR2: 0000000000000000
---[ end trace 0000000000000002 ]---

Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
disable migration. But the SLUB code is relatively lockless, and the
spin_locks there are raw_spin_locks (not converted to mutexes), thus I
believe this bug can happen in mainline without -rt features. The -rt
patch is just good at triggering mainline bugs ;-)

Anyway, looking at where this crashed, it seems that the page variable
can be NULL when passed to the node_match() function (which does not
check if it is NULL). When this happens we get the above panic.

As page is only used in slab_alloc() to check if the node matches, if
it's NULL I'm assuming that we can say it doesn't and call the
__slab_alloc() code. Is this a correct assumption?

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-14 15:13:16 -07:00
Joonsoo Kim 345c905d13 slub: Make cpu partial slab support configurable
CPU partial support can introduce level of indeterminism that is not
wanted in certain context (like a realtime kernel). Make it
configurable.

This patch is based on Christoph Lameter's "slub: Make cpu partial slab
support configurable V2".

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07 19:09:56 +03:00
Joonsoo Kim 318df36e57 slub: do not put a slab to cpu partial list when cpu_partial is 0
In free path, we don't check number of cpu_partial, so one slab can
be linked in cpu partial list even if cpu_partial is 0. To prevent this,
we should check number of cpu_partial in put_cpu_partial().

Acked-by: Christoph Lameeter <cl@linux.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07 18:46:30 +03:00
Wanpeng Li c17fd13ec0 mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
Use existing interface node_nr_slabs and node_nr_objs to get
nr_slabs and nr_objs.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07 18:37:48 +03:00
Wanpeng Li a446336454 mm/slub: Drop unnecessary nr_partials
This patch remove unused nr_partials variable.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07 18:37:48 +03:00
Linus Torvalds 0f47c9423c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab changes from Pekka Enberg:
 "The bulk of the changes are more slab unification from Christoph.

  There's also few fixes from Aaron, Glauber, and Joonsoo thrown into
  the mix."

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (24 commits)
  mm, slab_common: Fix bootstrap creation of kmalloc caches
  slab: Return NULL for oversized allocations
  mm: slab: Verify the nodeid passed to ____cache_alloc_node
  slub: tid must be retrieved from the percpu area of the current processor
  slub: Do not dereference NULL pointer in node_match
  slub: add 'likely' macro to inc_slabs_node()
  slub: correct to calculate num of acquired objects in get_partial_node()
  slub: correctly bootstrap boot caches
  mm/sl[au]b: correct allocation type check in kmalloc_slab()
  slab: Fixup CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections
  slab: Handle ARCH_DMA_MINALIGN correctly
  slab: Common definition for kmem_cache_node
  slab: Rename list3/l3 to node
  slab: Common Kmalloc cache determination
  stat: Use size_t for sizes instead of unsigned
  slab: Common function to create the kmalloc array
  slab: Common definition for the array of kmalloc caches
  slab: Common constants for kmalloc boundaries
  slab: Rename nodelists to node
  slab: Common name for the per node structures
  ...
2013-05-07 08:42:20 -07:00
Pekka Enberg 69df2ac128 Merge branch 'slab/next' into slab/for-linus 2013-05-07 09:19:47 +03:00
Andrew Morton 3ac38faa1f mm/slub.c: use register_hotmemory_notifier()
Squishes a statement-with-no-effect warning, removes some ifdefs and
shrinks .text by 2 bytes.

Note that this code fails to check for blocking_notifier_chain_register()
failures.

Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:36 -07:00
Christoph Lameter 7cccd80b43 slub: tid must be retrieved from the percpu area of the current processor
As Steven Rostedt has pointer out: rescheduling could occur on a
different processor after the determination of the per cpu pointer and
before the tid is retrieved. This could result in allocation from the
wrong node in slab_alloc().

The effect is much more severe in slab_free() where we could free to the
freelist of the wrong page.

The window for something like that occurring is pretty small but it is
possible.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-04-05 14:23:06 +03:00
Christoph Lameter 4d7868e647 slub: Do not dereference NULL pointer in node_match
The variables accessed in slab_alloc are volatile and therefore
the page pointer passed to node_match can be NULL. The processing
of data in slab_alloc is tentative until either the cmpxhchg
succeeds or the __slab_alloc slowpath is invoked. Both are
able to perform the same allocation from the freelist.

Check for the NULL pointer in node_match.

A false positive will lead to a retry of the loop in __slab_alloc.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-04-05 14:23:05 +03:00
Joonsoo Kim 338b264229 slub: add 'likely' macro to inc_slabs_node()
After boot phase, 'n' always exist.
So add 'likely' macro for helping compiler.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-04-02 09:42:17 +03:00
Joonsoo Kim 633b076464 slub: correct to calculate num of acquired objects in get_partial_node()
There is a subtle bug when calculating a number of acquired objects.

Currently, we calculate "available = page->objects - page->inuse",
after acquire_slab() is called in get_partial_node().

In acquire_slab() with mode = 1, we always set new.inuse = page->objects.
So,

	acquire_slab(s, n, page, object == NULL);

	if (!object) {
		c->page = page;
		stat(s, ALLOC_FROM_PARTIAL);
		object = t;
		available = page->objects - page->inuse;

		!!! availabe is always 0 !!!
	...

Therfore, "available > s->cpu_partial / 2" is always false and
we always go to second iteration.
This patch correct this problem.

After that, we don't need return value of put_cpu_partial().
So remove it.

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-04-02 09:42:10 +03:00
Glauber Costa 7d557b3cb6 slub: correctly bootstrap boot caches
After we create a boot cache, we may allocate from it until it is bootstraped.
This will move the page from the partial list to the cpu slab list. If this
happens, the loop:

	list_for_each_entry(p, &n->partial, lru)

that we use to scan for all partial pages will yield nothing, and the pages
will keep pointing to the boot cpu cache, which is of course, invalid. To do
that, we should flush the cache to make sure that the cpu slab is back to the
partial list.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by:  Steffen Michalke <StMichalke@web.de>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-28 09:29:38 +02:00
Linus Torvalds 9043a2650c Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
 "The sweeping change is to make add_taint() explicitly indicate whether
  to disable lockdep, but it's a mechanical change."

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MODSIGN: Add option to not sign modules during modules_install
  MODSIGN: Add -s <signature> option to sign-file
  MODSIGN: Specify the hash algorithm on sign-file command line
  MODSIGN: Simplify Makefile with a Kconfig helper
  module: clean up load_module a little more.
  modpost: Ignore ARC specific non-alloc sections
  module: constify within_module_*
  taint: add explicit flag to show whether lock dep is still OK.
  module: printk message when module signature fail taints kernel.
2013-02-25 15:41:43 -08:00
Mel Gorman 22b751c3d0 mm: rename page struct field helpers
The function names page_xchg_last_nid(), page_last_nid() and
reset_page_last_nid() were judged to be inconsistent so rename them to a
struct_field_op style pattern.  As it looked jarring to have
reset_page_mapcount() and page_nid_reset_last() beside each other in
memmap_init_zone(), this patch also renames reset_page_mapcount() to
page_mapcount_reset().  There are others like init_page_count() but as
it is used throughout the arch code a rename would likely cause more
conflicts than it is worth.

[akpm@linux-foundation.org: fix zcache]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:18 -08:00
Christoph Lameter 2c59dd6544 slab: Common Kmalloc cache determination
Extract the optimized lookup functions from slub and put them into
slab_common.c. Then make slab use these functions as well.

Joonsoo notes that this fixes some issues with constant folding which
also reduces the code size for slub.

https://lkml.org/lkml/2012/10/20/82

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01 12:32:08 +02:00
Christoph Lameter f97d5f634d slab: Common function to create the kmalloc array
The kmalloc array is created in similar ways in both SLAB
and SLUB. Create a common function and have both allocators
call that function.

V1->V2:
	Whitespace cleanup

Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01 12:32:08 +02:00
Christoph Lameter 9425c58e54 slab: Common definition for the array of kmalloc caches
Have a common definition fo the kmalloc cache arrays in
SLAB and SLUB

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01 12:32:07 +02:00
Christoph Lameter 95a05b428c slab: Common constants for kmalloc boundaries
Standardize the constants that describe the smallest and largest
object kept in the kmalloc arrays for SLAB and SLUB.

Differentiate between the maximum size for which a slab cache is used
(KMALLOC_MAX_CACHE_SIZE) and the maximum allocatable size
(KMALLOC_MAX_SIZE, KMALLOC_MAX_ORDER).

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01 12:32:07 +02:00
Rusty Russell 373d4d0997 taint: add explicit flag to show whether lock dep is still OK.
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-21 17:17:57 +10:30