linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Christoph Lameter	66c4c35c6b	slub: Do not hold slub_lock when calling sysfs_slab_add() sysfs_slab_add() calls various sysfs functions that actually may end up in userspace doing all sorts of things. Release the slub_lock after adding the kmem_cache structure to the list. At that point the address of the kmem_cache is not known so we are guaranteed exlusive access to the following modifications to the kmem_cache structure. If the sysfs_slab_add fails then reacquire the slub_lock to remove the kmem_cache structure from the list. Cc: <stable@vger.kernel.org> # 3.3+ Reported-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-02-06 12:24:13 +02:00
Eric Dumazet	0ad9500e16	slub: prefetch next freelist pointer in slab_alloc() Recycling a page is a problem, since freelist link chain is hot on cpu(s) which freed objects, and possibly very cold on cpu currently owning slab. Adding a prefetch of cache line containing the pointer to next object in slab_alloc() helps a lot in many workloads, in particular on assymetric ones (allocations done on one cpu, frees on another cpus). Added cost is three machine instructions only. Examples on my dual socket quad core ht machine (Intel CPU E5540 @2.53GHz) (16 logical cpus, 2 memory nodes), 64bit kernel. Before patch : # perf stat -r 32 hackbench 50 process 4000 >/dev/null Performance counter stats for 'hackbench 50 process 4000' (32 runs): 327577,471718 task-clock # 15,821 CPUs utilized ( +- 0,64% ) 28 866 491 context-switches # 0,088 M/sec ( +- 1,80% ) 1 506 929 CPU-migrations # 0,005 M/sec ( +- 3,24% ) 127 151 page-faults # 0,000 M/sec ( +- 0,16% ) 829 399 813 448 cycles # 2,532 GHz ( +- 0,64% ) 580 664 691 740 stalled-cycles-frontend # 70,01% frontend cycles idle ( +- 0,71% ) 197 431 700 448 stalled-cycles-backend # 23,80% backend cycles idle ( +- 1,03% ) 503 548 648 975 instructions # 0,61 insns per cycle # 1,15 stalled cycles per insn ( +- 0,46% ) 95 780 068 471 branches # 292,389 M/sec ( +- 0,48% ) 1 426 407 916 branch-misses # 1,49% of all branches ( +- 1,35% ) 20,705679994 seconds time elapsed ( +- 0,64% ) After patch : # perf stat -r 32 hackbench 50 process 4000 >/dev/null Performance counter stats for 'hackbench 50 process 4000' (32 runs): 286236,542804 task-clock # 15,786 CPUs utilized ( +- 1,32% ) 19 703 372 context-switches # 0,069 M/sec ( +- 4,99% ) 1 658 249 CPU-migrations # 0,006 M/sec ( +- 6,62% ) 126 776 page-faults # 0,000 M/sec ( +- 0,12% ) 724 636 593 213 cycles # 2,532 GHz ( +- 1,32% ) 499 320 714 837 stalled-cycles-frontend # 68,91% frontend cycles idle ( +- 1,47% ) 156 555 126 809 stalled-cycles-backend # 21,60% backend cycles idle ( +- 2,22% ) 463 897 792 661 instructions # 0,64 insns per cycle # 1,08 stalled cycles per insn ( +- 0,94% ) 87 717 352 563 branches # 306,451 M/sec ( +- 0,99% ) 941 738 280 branch-misses # 1,07% of all branches ( +- 3,35% ) 18,132070670 seconds time elapsed ( +- 1,30% ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> CC: Matt Mackall <mpm@selenic.com> CC: David Rientjes <rientjes@google.com> CC: "Alex,Shi" <alex.shi@intel.com> CC: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-01-24 21:53:57 +02:00
Heiko Carstens	2565409fc0	mm,x86,um: move CMPXCHG_DOUBLE config option Move CMPXCHG_DOUBLE and rename it to HAVE_CMPXCHG_DOUBLE so architectures can simply select the option if it is supported. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Heiko Carstens	43570fd2f4	mm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCAL While implementing cmpxchg_double() on s390 I realized that we don't set CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it. However setting that option will increase the size of struct page by eight bytes on 64 bit, which we certainly do not want. Also, it doesn't make sense that a present cpu feature should increase the size of struct page. Besides that it looks like the dependency to CMPXCHG_LOCAL is wrong and that it should depend on CMPXCHG_DOUBLE instead. This patch: If an architecture supports CMPXCHG_LOCAL this shouldn't result automatically in larger struct pages if the SLUB allocator is used. Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which can be selected if a double word aligned struct page is required. Also update x86 Kconfig so that it should work as before. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Linus Torvalds	6296e5d3c0	Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: slub: disallow changing cpu_partial from userspace for debug caches slub: add missed accounting slub: Extract get_freelist from __slab_alloc slub: Switch per cpu partial page support off for debugging slub: fix a possible memleak in __slab_alloc() slub: fix slub_max_order Documentation slub: add missed accounting slab: add taint flag outputting to debug paths. slub: add taint flag outputting to debug paths slab: introduce slab_max_order kernel parameter slab: rename slab_break_gfp_order to slab_max_order	2012-01-11 18:52:23 -08:00
Pekka Enberg	5878cf431c	Merge branch 'slab/urgent' into slab/for-linus	2012-01-11 21:11:29 +02:00
Stanislaw Gruszka	fc8d8620d3	slub: min order when debug_guardpage_minorder > 0 Disable slub debug facilities and allocate slabs at minimal order when debug_guardpage_minorder > 0 to increase probability to catch random memory corruption by cpu exception. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:43 -08:00
David Rientjes	74ee4ef1f9	slub: disallow changing cpu_partial from userspace for debug caches For caches with debugging enabled, "slub: Switch per cpu partial page support off for debugging" changes cpu_partial to 0. It shouldn't be tunable from userspace for such caches, otherwise the same accounting issues arise during validation. This patch disallows tuning /sys/kernel/slab/cache/cpu_partial to be non- zero for caches with debugging enabled. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-01-10 21:31:09 +02:00
Linus Torvalds	6b3da11b3c	Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: Remove irqsafe_cpu_xxx variants Fix up conflict in arch/x86/include/asm/percpu.h due to clash with `cebef5beed` ("x86: Fix and improve percpu_cmpxchg{8,16}b_double()") which edited the (now removed) irqsafe_cpu_cmpxchg*_double code.	2012-01-09 13:08:28 -08:00
Jan Beulich	cdcd629869	x86: Fix and improve cmpxchg_double{,_local}() Just like the per-CPU ones they had several problems/shortcomings: Only the first memory operand was mentioned in the asm() operands, and the 2x64-bit version didn't have a memory clobber while the 2x32-bit one did. The former allowed the compiler to not recognize the need to re-load the data in case it had it cached in some register, while the latter was overly destructive. The types of the local copies of the old and new values were incorrect (the types of the pointed-to variables should be used here, to make sure the respective old/new variable types are compatible). The __dummy/__junk variables were pointless, given that local copies of the inputs already existed (and can hence be used for discarded outputs). The 32-bit variant of cmpxchg_double_local() referenced cmpxchg16b_local(). At once also: - change the return value type to what it really is: 'bool' - unify 32- and 64-bit variants - abstract out the common part of the 'normal' and 'local' variants Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/4F01F12A020000780006A19B@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-04 15:01:54 +01:00
Christoph Lameter	933393f58f	percpu: Remove irqsafe_cpu_xxx variants We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. -tj: This is part of on-going percpu API cleanup. For detailed discussion of the subject, please refer to the following thread. http://thread.gmane.org/gmane.linux.kernel/1222078 Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>	2011-12-22 10:40:20 -08:00
Shaohua Li	b13683d1cc	slub: add missed accounting With per-cpu partial list, slab is added to partial list first and then moved to node list. The __slab_free() code path for add/remove_partial is almost deprecated(except for slub debug). But we forget to account add/remove_partial when move per-cpu partial pages to node list, so the statistics for such events are always 0. Add corresponding accounting. This is against the patch "slub: use correct parameter to add a page to partial list tail" Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-12-13 22:27:09 +02:00
Christoph Lameter	213eeb9fd9	slub: Extract get_freelist from __slab_alloc get_freelist retrieves free objects from the page freelist (put there by remote frees) or deactivates a slab page if no more objects are available. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-12-13 22:17:10 +02:00
Christoph Lameter	8f1e33daed	slub: Switch per cpu partial page support off for debugging Eric saw an issue with accounting of slabs during validation. Its not possible to determine accurately how many per cpu partial slabs exist at any time so this switches off per cpu partial pages during debug. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-12-13 22:14:02 +02:00
Eric Dumazet	73736e0387	slub: fix a possible memleak in __slab_alloc() Zhihua Che reported a possible memleak in slub allocator on CONFIG_PREEMPT=y builds. It is possible current thread migrates right before disabling irqs in __slab_alloc(). We must check again c->freelist, and perform a normal allocation instead of scratching c->freelist. Many thanks to Zhihua Che for spotting this bug, introduced in 2.6.39 V2: Its also possible an IRQ freed one (or several) object(s) and populated c->freelist, so its not a CONFIG_PREEMPT only problem. Cc: <stable@vger.kernel.org> [2.6.39+] Reported-by: Zhihua Che <zhihua.che@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-12-13 22:11:21 +02:00
Shaohua Li	4c493a5a5c	slub: add missed accounting With per-cpu partial list, slab is added to partial list first and then moved to node list. The __slab_free() code path for add/remove_partial is almost deprecated(except for slub debug). But we forget to account add/remove_partial when move per-cpu partial pages to node list, so the statistics for such events are always 0. Add corresponding accounting. This is against the patch "slub: use correct parameter to add a page to partial list tail" Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-11-27 22:08:15 +02:00
Pekka Enberg	42616cacf8	Merge branch 'slab/urgent' into slab/next	2011-11-27 22:08:03 +02:00
Eric Dumazet	bc6697d8a5	slub: avoid potential NULL dereference or corruption show_slab_objects() can trigger NULL dereferences or memory corruption. Another cpu can change its c->page to NULL or c->node to NUMA_NO_NODE while we use them. Use ACCESS_ONCE(c->page) and ACCESS_ONCE(c->node) to make sure this cannot happen. Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-11-24 08:44:19 +02:00
Christoph Lameter	42d623a8cd	slub: use irqsafe_cpu_cmpxchg for put_cpu_partial The cmpxchg must be irq safe. The fallback for this_cpu_cmpxchg only disables preemption which results in per cpu partial page operation potentially failing on non x86 platforms. This patch fixes the following problem reported by Christian Kujau: I seem to hit it with heavy disk & cpu IO is in progress on this PowerBook G4. Full dmesg & .config: http://nerdbynature.de/bits/3.2.0-rc1/oops/ I've enabled some debug options and now it really points to slub.c:2166 http://nerdbynature.de/bits/3.2.0-rc1/oops/oops4m.jpg With debug options enabled I'm currently in the xmon debugger, not sure what to make of it yet, I'll try to get something useful out of it :) Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Christian Kujau <lists@nerdbynature.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-11-24 08:44:14 +02:00
Dave Jones	265d47e711	slub: add taint flag outputting to debug paths When we get corruption reports, it's useful to see if the kernel was tainted, to rule out problems we can't do anything about. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-11-16 21:14:40 +02:00
Shaohua Li	9ada19342b	slub: move discard_slab out of node lock Lockdep reports there is potential deadlock for slub node list_lock. discard_slab() is called with the lock hold in unfreeze_partials(), which could trigger a slab allocation, which could hold the lock again. discard_slab() doesn't need hold the lock actually, if the slab is already removed from partial list. Acked-by: Christoph Lameter <cl@linux.com> Reported-and-tested-by: Yong Zhang <yong.zhang0@gmail.com> Reported-and-tested-by: Julie Sullivan <kernelmail.jms@gmail.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-11-15 20:41:00 +02:00
Shaohua Li	f64ae042d9	slub: use correct parameter to add a page to partial list tail unfreeze_partials() needs add the page to partial list tail, since such page hasn't too many free objects. We now explictly use DEACTIVATE_TO_TAIL for this, while DEACTIVATE_TO_TAIL != 1. This will cause performance regression (eg, more lock contention in node->list_lock) without below fix. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-11-15 20:37:15 +02:00
Akinobu Mita	798248206b	lib/string.c: introduce memchr_inv() memchr_inv() is mainly used to check whether the whole buffer is filled with just a specified byte. The function name and prototype are stolen from logfs and the implementation is from SLUB. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Acked-by: Joern Engel <joern@logfs.org> Cc: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-31 17:30:47 -07:00
Pekka Enberg	e182a345d4	Merge branches 'slab/next' and 'slub/partial' into slab/for-linus	2011-10-26 18:09:12 +03:00
Alex Shi	dcc3be6a54	slub: Discard slab page when node partial > minimum partial number Discarding slab should be done when node partial > min_partial. Otherwise, node partial slab may eat up all memory. Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2011-09-27 23:03:31 +03:00

1 2 3 4 5 ...

507 Commits