This patch is add white list into modpost.c for some functions and
ia64's section to fix section mismatchs.
sparse_index_alloc() and zone_wait_table_init() calls bootmem allocator
at boot time, and kmalloc/vmalloc at hotplug time. If config
memory hotplug is on, there are references of bootmem allocater(init text)
from them (normal text). This is cause of section mismatch.
Bootmem is called by many functions and it must be
used only at boot time. I think __init of them should keep for
section mismatch check. So, I would like to register sparse_index_alloc()
and zone_wait_table_init() into white list.
In addition, ia64's .machvec section is function table of some platform
dependent code. It is mixture of .init.text and normal text. These
reference of __init functions are valid too.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is to fix many section mismatches of code related to memory hotplug.
I checked compile with memory hotplug on/off on ia64 and x86-64 box.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two problems with the existing redzone implementation.
Firstly, it's causing misalignment of structures which contain a 64-bit
integer, such as netfilter's 'struct ipt_entry' -- causing netfilter
modules to fail to load because of the misalignment. (In particular, the
first check in
net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks())
On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8.
With slab debugging, we use 32-bit redzones. And allocated slab objects
aren't sufficiently aligned to hold a structure containing a uint64_t.
By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable
redzone checks on those architectures. By using 64-bit redzones we avoid that
loss of debugging, and also fix the other problem while we're at it.
When investigating this, I noticed that on 64-bit platforms we're using a
32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set
aside for the redzone. Which means that the four bytes immediately before
or after the allocated object at 0x00,0x00,0x00,0x00 for LE and BE
machines, respectively. Which is probably not the most useful choice of
poison value.
One way to fix both of those at once is just to switch to 64-bit
redzones in all cases.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The newly merged SLUB allocator patches had been generated before the
removal of "struct subsystem", and ended up applying fine, but wouldn't
build based on the current tree as a result.
Fix up that merge error - not that SLUB is likely really ready for
showtime yet, but at least I can fix the trivial stuff.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SERIAL] sunsu: Fix section mismatch warnings.
[SPARC64]: pgtable_cache_init() should be __init.
[SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/prom.c
[SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/pci.c
[SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/console.c
[MM]: sparse_init() should be __init.
[SPARC64]: Update defconfig.
[VIDEO]: Add Sun XVR-2500 framebuffer driver.
[VIDEO]: Add Sun XVR-500 framebuffer driver.
[SPARC64]: SUN4U PCI-E controller support.
[SPARC]: Fix comment typo in smp4m_blackbox_current().
[SCSI] SUNESP: sun_esp.c needs linux/delay.h
Fix up conflict in arch/sparc64/mm/init.c manually due to removal of
pgtable_cache_init() through the -mm patches (even though that patch was
also by David ;)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we can miss freeze_process()->signal_wake_up() in kswapd() if it
happens between try_to_freeze() and prepare_to_wait(). To prevent this
from happening we should check freezing(current) before calling schedule().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with
calls to inline functions that can be changed in subsequent patches without
modifying the code calling them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SLAB_CTOR atomic is never used which is no surprise since I cannot imagine
that one would want to do something serious in a constructor or destructor.
In particular given that the slab allocators run with interrupts disabled.
Actions in constructors and destructors are by their nature very limited
and usually do not go beyond initializing variables and list operations.
(The i386 pgd ctor and dtors do take a spinlock in constructor and
destructor..... I think that is the furthest we go at this point.)
There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
establishes a certain symmetry.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes a deadlock in the OOM killer for allocations that are not
__GFP_HARDWALL.
Before the OOM killer checks for the allocation constraint, it takes
callback_mutex.
constrained_alloc() iterates through each zone in the allocation zonelist
and calls cpuset_zone_allowed_softwall() to determine whether an allocation
for gfp_mask is possible. If a zone's node is not in the OOM-triggering
task's mems_allowed, it is not exiting, and we did not fail on a
__GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
callback_mutex to check the nearest exclusive ancestor of current's cpuset.
This results in deadlock.
We now take callback_mutex after iterating through the zonelist since we
don't need it yet.
Cc: Andi Kleen <ak@suse.de>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Martin J. Bligh <mbligh@mbligh.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current panic_on_oom may not work if there is a process using
cpusets/mempolicy, because other nodes' memory may remain. But some people
want failover by panic ASAP even if they are used. This patch makes new
setting for its request.
This is tested on my ia64 box which has 3 nodes.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ethan Solomita <solo@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently failslab injects failures into ____cache_alloc(). But with enabling
CONFIG_NUMA it's not enough to let actual slab allocator functions (kmalloc,
kmem_cache_alloc, ...) return NULL.
This patch moves fault injection hook inside of __cache_alloc() and
__cache_alloc_node(). These are lower call path than ____cache_alloc() and
enable to inject faulures to slab allocators with CONFIG_NUMA.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch was recently posted to lkml and acked by Pekka.
The flag SLAB_MUST_HWCACHE_ALIGN is
1. Never checked by SLAB at all.
2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB
3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.
The only remaining use is in sparc64 and ppc64 and their use there
reflects some earlier role that the slab flag once may have had. If
its specified then SLAB_HWCACHE_ALIGN is also specified.
The flag is confusing, inconsistent and has no purpose.
Remove it.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On x86_64 this cuts allocation overhead for page table pages down to a
fraction (kernel compile / editing load. TSC based measurement of times spend
in each function):
no quicklist
pte_alloc 1569048 4.3s(401ns/2.7us/179.7us)
pmd_alloc 780988 2.1s(337ns/2.7us/86.1us)
pud_alloc 780072 2.2s(424ns/2.8us/300.6us)
pgd_alloc 260022 1s(920ns/4us/263.1us)
quicklist:
pte_alloc 452436 573.4ms(8ns/1.3us/121.1us)
pmd_alloc 196204 174.5ms(7ns/889ns/46.1us)
pud_alloc 195688 172.4ms(7ns/881ns/151.3us)
pgd_alloc 65228 9.8ms(8ns/150ns/6.1us)
pgd allocations are the most complex and there we see the most dramatic
improvement (may be we can cut down the amount of pgds cached somewhat?). But
even the pte allocations still see a doubling of performance.
1. Proven code from the IA64 arch.
The method used here has been fine tuned for years and
is NUMA aware. It is based on the knowledge that accesses
to page table pages are sparse in nature. Taking a page
off the freelists instead of allocating a zeroed pages
allows a reduction of number of cachelines touched
in addition to getting rid of the slab overhead. So
performance improves. This is particularly useful if pgds
contain standard mappings. We can save on the teardown
and setup of such a page if we have some on the quicklists.
This includes avoiding lists operations that are otherwise
necessary on alloc and free to track pgds.
2. Light weight alternative to use slab to manage page size pages
Slab overhead is significant and even page allocator use
is pretty heavy weight. The use of a per cpu quicklist
means that we touch only two cachelines for an allocation.
There is no need to access the page_struct (unless arch code
needs to fiddle around with it). So the fast past just
means bringing in one cacheline at the beginning of the
page. That same cacheline may then be used to store the
page table entry. Or a second cacheline may be used
if the page table entry is not in the first cacheline of
the page. The current code will zero the page which means
touching 32 cachelines (assuming 128 byte). We get down
from 32 to 2 cachelines in the fast path.
3. x86_64 gets lightweight page table page management.
This will allow x86_64 arch code to faster repopulate pgds
and other page table entries. The list operations for pgds
are reduced in the same way as for i386 to the point where
a pgd is allocated from the page allocator and when it is
freed back to the page allocator. A pgd can pass through
the quicklists without having to be reinitialized.
64 Consolidation of code from multiple arches
So far arches have their own implementation of quicklist
management. This patch moves that feature into the core allowing
an easier maintenance and consistent management of quicklists.
Page table pages have the characteristics that they are typically zero or in a
known state when they are freed. This is usually the exactly same state as
needed after allocation. So it makes sense to build a list of freed page
table pages and then consume the pages already in use first. Those pages have
already been initialized correctly (thus no need to zero them) and are likely
already cached in such a way that the MMU can use them most effectively. Page
table pages are used in a sparse way so zeroing them on allocation is not too
useful.
Such an implementation already exits for ia64. Howver, that implementation
did not support constructors and destructors as needed by i386 / x86_64. It
also only supported a single quicklist. The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.
Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more than one
quicklist is necessary then we can define NR_QUICK for additional lists. F.e.
i386 needs two and thus has
config NR_QUICK
int
default 2
If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:
quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)
Page table pages can be freed using:
quicklist_free(<quicklist-nr>, <destructor>, <page>)
Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.
If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.
Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make sure that the check function really only check things and do not perform
activities. Extract the tracing and object seeding out of the two check
functions and place them into slab_alloc and slab_free
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At kmem_cache_shrink check if we have any empty slabs on the partial
if so then remove them.
Also--as an anti-fragmentation measure--sort the partial slabs so that
the most fully allocated ones come first and the least allocated last.
The next allocations may fill up the nearly full slabs. Having the
least allocated slabs last gives them the maximum chance that their
remaining objects may be freed. Thus we can hopefully minimize the
partial slabs.
I think this is the best one can do in terms antifragmentation
measures. Real defragmentation (meaning moving objects out of slabs with
the least free objects to those that are almost full) can be implemted
by reverse scanning through the list produced here but that would mean
that we need to provide a callback at slab cache creation that allows
the deletion or moving of an object. This will involve slab API
changes, so defer for now.
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>