Commit Graph

54218 Commits

Author SHA1 Message Date
Nick Piggin 0a27a14a62 mm: madvise avoid exclusive mmap_sem
Avoid down_write of the mmap_sem in madvise when we can help it.

Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
matze b4169525bc include KERN_* constant in printk() calls in mm/slab.c
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Akinobu Mita bc0055aee4 slob: handle SLAB_PANIC flag
kmem_cache_create() for slob doesn't handle SLAB_PANIC.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
David Miller 3a2cba993b Quicklist support for sparc64
I ported this to sparc64 as per the patch below, tested on UP SunBlade1500 and
24 cpu Niagara T1000.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 6225e93735 Quicklists for page table pages
On x86_64 this cuts allocation overhead for page table pages down to a
fraction (kernel compile / editing load.  TSC based measurement of times spend
in each function):

no quicklist

pte_alloc               1569048 4.3s(401ns/2.7us/179.7us)
pmd_alloc                780988 2.1s(337ns/2.7us/86.1us)
pud_alloc                780072 2.2s(424ns/2.8us/300.6us)
pgd_alloc                260022 1s(920ns/4us/263.1us)

quicklist:

pte_alloc                452436 573.4ms(8ns/1.3us/121.1us)
pmd_alloc                196204 174.5ms(7ns/889ns/46.1us)
pud_alloc                195688 172.4ms(7ns/881ns/151.3us)
pgd_alloc                 65228 9.8ms(8ns/150ns/6.1us)

pgd allocations are the most complex and there we see the most dramatic
improvement (may be we can cut down the amount of pgds cached somewhat?).  But
even the pte allocations still see a doubling of performance.

1. Proven code from the IA64 arch.

	The method used here has been fine tuned for years and
	is NUMA aware. It is based on the knowledge that accesses
	to page table pages are sparse in nature. Taking a page
	off the freelists instead of allocating a zeroed pages
	allows a reduction of number of cachelines touched
	in addition to getting rid of the slab overhead. So
	performance improves. This is particularly useful if pgds
	contain standard mappings. We can save on the teardown
	and setup of such a page if we have some on the quicklists.
	This includes avoiding lists operations that are otherwise
	necessary on alloc and free to track pgds.

2. Light weight alternative to use slab to manage page size pages

	Slab overhead is significant and even page allocator use
	is pretty heavy weight. The use of a per cpu quicklist
	means that we touch only two cachelines for an allocation.
	There is no need to access the page_struct (unless arch code
	needs to fiddle around with it). So the fast past just
	means bringing in one cacheline at the beginning of the
	page. That same cacheline may then be used to store the
	page table entry. Or a second cacheline may be used
	if the page table entry is not in the first cacheline of
	the page. The current code will zero the page which means
	touching 32 cachelines (assuming 128 byte). We get down
	from 32 to 2 cachelines in the fast path.

3. x86_64 gets lightweight page table page management.

	This will allow x86_64 arch code to faster repopulate pgds
	and other page table entries. The list operations for pgds
	are reduced in the same way as for i386 to the point where
	a pgd is allocated from the page allocator and when it is
	freed back to the page allocator. A pgd can pass through
	the quicklists without having to be reinitialized.

64 Consolidation of code from multiple arches

	So far arches have their own implementation of quicklist
	management. This patch moves that feature into the core allowing
	an easier maintenance and consistent management of quicklists.

Page table pages have the characteristics that they are typically zero or in a
known state when they are freed.  This is usually the exactly same state as
needed after allocation.  So it makes sense to build a list of freed page
table pages and then consume the pages already in use first.  Those pages have
already been initialized correctly (thus no need to zero them) and are likely
already cached in such a way that the MMU can use them most effectively.  Page
table pages are used in a sparse way so zeroing them on allocation is not too
useful.

Such an implementation already exits for ia64.  Howver, that implementation
did not support constructors and destructors as needed by i386 / x86_64.  It
also only supported a single quicklist.  The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.

Quicklists are defined by an arch defining CONFIG_QUICKLIST.  If more than one
quicklist is necessary then we can define NR_QUICK for additional lists.  F.e.
 i386 needs two and thus has

config NR_QUICK
	int
	default 2

If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:

quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)

Page table pages can be freed using:

quicklist_free(<quicklist-nr>, <destructor>, <page>)

Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.

If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.

Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter c09d875172 slub: add slabinfo tool
Add the tool which gets reports about slabs to the VM documentation directory.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 352434211d slub: user documentation
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 70d71228af slub: remove object activities out of checking functions
Make sure that the check function really only check things and do not perform
activities.  Extract the tracing and object seeding out of the two check
functions and place them into slab_alloc and slab_free

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 2086d26a05 SLUB: Free slabs and sort partial slab lists in kmem_cache_shrink
At kmem_cache_shrink check if we have any empty slabs on the partial
if so then remove them.

Also--as an anti-fragmentation measure--sort the partial slabs so that
the most fully allocated ones come first and the least allocated last.

The next allocations may fill up the nearly full slabs. Having the
least allocated slabs last gives them the maximum chance that their
remaining objects may be freed. Thus we can hopefully minimize the
partial slabs.

I think this is the best one can do in terms antifragmentation
measures. Real defragmentation (meaning moving objects out of slabs with
the least free objects to those that are almost full) can be implemted
by reverse scanning through the list produced here but that would mean
that we need to provide a callback at slab cache creation that allows
the deletion or moving of an object. This will involve slab API
changes, so defer for now.

Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 88a420e4e2 slub: add ability to list alloc / free callers per slab
This patch enables listing the callers who allocated or freed objects in a
cache.

For example to list the allocators for kmalloc-128 do

cat /sys/slab/kmalloc-128/alloc_calls
      7 sn_io_slot_fixup+0x40/0x700
      7 sn_io_slot_fixup+0x80/0x700
      9 sn_bus_fixup+0xe0/0x380
      6 param_sysfs_setup+0xf0/0x280
    276 percpu_populate+0xf0/0x1a0
     19 __register_chrdev_region+0x30/0x360
      8 expand_files+0x2e0/0x6e0
      1 sys_epoll_create+0x60/0x200
      1 __mounts_open+0x140/0x2c0
     65 kmem_alloc+0x110/0x280
      3 alloc_disk_node+0xe0/0x200
     33 as_get_io_context+0x90/0x280
     74 kobject_kset_add_dir+0x40/0x140
     12 pci_create_bus+0x2a0/0x5c0
      1 acpi_ev_create_gpe_block+0x120/0x9e0
     41 con_insert_unipair+0x100/0x1c0
      1 uart_open+0x1c0/0xba0
      1 dma_pool_create+0xe0/0x340
      2 neigh_table_init_no_netlink+0x260/0x4c0
      6 neigh_parms_alloc+0x30/0x200
      1 netlink_kernel_create+0x130/0x320
      5 fz_hash_alloc+0x50/0xe0
      2 sn_common_hubdev_init+0xd0/0x6e0
     28 kernel_param_sysfs_setup+0x30/0x180
     72 process_zones+0x70/0x2e0

cat /sys/slab/kmalloc-128/free_calls
    558 <not-available>
      3 sn_io_slot_fixup+0x600/0x700
     84 free_fdtable_rcu+0x120/0x260
      2 seq_release+0x40/0x60
      6 kmem_free+0x70/0xc0
     24 free_as_io_context+0x20/0x200
      1 acpi_get_object_info+0x3a0/0x3e0
      1 acpi_add_single_object+0xcf0/0x1e40
      2 con_release_unimap+0x80/0x140
      1 free+0x20/0x40

SLAB_STORE_USER must be enabled for a slab cache by either booting with
"slab_debug" or enabling user tracking specifically for the slab of interest.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter e95eed571e SLUB: Add MIN_PARTIAL
We leave a mininum of partial slabs on nodes when we search for
partial slabs on other node. Define a constant for that value.

Then modify slub to keep MIN_PARTIAL slabs around.

This avoids bad situations where a function frees the last object
in a slab (which results in the page being returned to the page
allocator) only to then allocate one again (which requires getting
a page back from the page allocator if the partial list was empty).
Keeping a couple of slabs on the partial list reduces overhead.

Empty slabs are added to the end of the partial list to insure that
partially allocated slabs are consumed first (defragmentation).

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 53e15af03b slub: validation of slabs (metadata and guard zones)
This enables validation of slab.  Validation means that all objects are
checked to see if there are redzone violations, if padding has been
overwritten or any pointers have been corrupted.  Also checks the consistency
of slab counters.

Validation enables the detection of metadata corruption without the kernel
having to execute code that actually uses (allocs/frees) and object.  It
allows one to make sure that the slab metainformation and the guard values
around an object have not been compromised.

A single slabcache can be checked by writing a 1 to the "validate" file.

i.e.

echo 1 >/sys/slab/kmalloc-128/validate

or use the slabinfo tool to check all slabs

slabinfo -v

Error messages will show up in the syslog.

Note that validation can only reach slabs that are on a list.  This means that
we are usually restricted to partial slabs and active slabs unless
SLAB_STORE_USER is active which will build a full slab list and allows
validation of slabs that are fully in use.  Booting with "slub_debug" set will
enable SLAB_STORE_USER and then full diagnostic are available.

Note that we attempt to push cpu slabs back to the lists when we start the
check.  If the cpu slab is reactivated before we get to it (another processor
grabs it before we get to it) then it cannot be checked.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 643b113849 slub: enable tracking of full slabs
If slab tracking is on then build a list of full slabs so that we can verify
the integrity of all slabs and are also able to built list of alloc/free
callers.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 77c5e2d01a slub: fix object tracking
Object tracking did not work the right way for several call chains. Fix this up
by adding a new parameter to slub_alloc and slub_free that specifies the
caller address explicitly.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter b49af68ff9 Add virt_to_head_page and consolidate code in slab and slub
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:54 -07:00
Christoph Lameter 6d7779538f mm: optimize compound_head() by avoiding a shared page flag
The patch adds PageTail(page) and PageHead(page) to check if a page is the
head or the tail of a compound page.  This is done by masking the two bits
describing the state of a compound page and then comparing them.  So one
comparision and a branch instead of two bit checks and two branches.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Christoph Lameter d85f33855c Make page->private usable in compound pages
If we add a new flag so that we can distinguish between the first page and the
tail pages then we can avoid to use page->private in the first page.
page->private == page for the first page, so there is no real information in
there.

Freeing up page->private makes the use of compound pages more transparent.
They become more usable like real pages.  Right now we have to be careful f.e.
 if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
can then no longer use the private field.  This is one of the issues that
cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information in
the page struct.  I can probably avoid the 16 bit ints that I have in there
right now.

Also if page->private is available then a compound page may be equipped with
buffer heads.  This may free up the way for filesystems to support larger
blocks than page size.

We add PageTail as an alias of PageReclaim.  Compound pages cannot currently
be reclaimed.  Because of the alias one needs to check PageCompound first.

The RFC for the this approach was discussed at
http://marc.info/?t=117574302800001&r=1&w=2

[nacc@us.ibm.com: fix hugetlbfs]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Christoph Lameter 3052086483 PowerPC: Disable SLUB for configurations in which slab page structs are modified
PowerPC uses the slab allocator to manage the lowest level of the page
table.  In high cpu configurations we also use the page struct to split the
page table lock.  Disallow the selection of SLUB for that case.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Christoph Lameter 614410d589 SLUB: allocate smallest object size if the user asks for 0 bytes
Makes SLUB behave like SLAB in this area to avoid issues....

Throw a stack dump to alert people.

At some point the behavior should be switched back.  NULL is no memory as
far as I can tell and if the use asked for 0 bytes then he need to get no
memory.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Christoph Lameter 47bfdc0d5a SLUB: change default alignments
Structures may contain u64 items on 32 bit platforms that are only able to
address 64 bit items on 64 bit boundaries.  Change the mininum alignment of
slabs to conform to those expectations.

ARCH_KMALLOC_MINALIGN must be changed for good since a variety of structure
are mixed in the general slabs.

ARCH_SLAB_MINALIGN is changed because currently there is no consistent
specification of object alignment.  We may have that in the future when the
KMEM_CACHE and related macros are used to generate slabs.  These pass the
alignment of the structure generated by the compiler to the slab.

With KMEM_CACHE etc we could align structures that do not contain 64
bit values to 32 bit boundaries potentially saving some memory.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Christoph Lameter 81819f0fc8 SLUB core
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

   A particular concern was the complex management of the numerous object
   queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
   each allocating CPU and use objects from a slab directly instead of
   queueing them up.

B. Storage overhead of object queues

   SLAB Object queues exist per node, per CPU. The alien cache queue even
   has a queue array that contain a queue for each processor on each
   node. For very large systems the number of queues and the number of
   objects that may be caught in those queues grows exponentially. On our
   systems with 1k nodes / processors we have several gigabytes just tied up
   for storing references to objects for those queues  This does not include
   the objects that could be on those queues. One fears that the whole
   memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

   SLAB has overhead at the beginning of each slab. This means that data
   cannot be naturally aligned at the beginning of a slab block. SLUB keeps
   all meta data in the corresponding page_struct. Objects can be naturally
   aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
   boundaries and can fit tightly into a 4k page with no bytes left over.
   SLAB cannot do this.

D. SLAB has a complex cache reaper

   SLUB does not need a cache reaper for UP systems. On SMP systems
   the per CPU slab may be pushed back into partial list but that
   operation is simple and does not require an iteration over a list
   of objects. SLAB expires per CPU, shared and alien object queues
   during cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

   SLUB pushes NUMA policy handling into the page allocator. This means that
   allocation is coarser (SLUB does interleave on a page level) but that
   situation was also present before 2.6.13. SLABs application of
   policies to individual slab objects allocated in SLAB is
   certainly a performance concern due to the frequent references to
   memory policies which may lead a sequence of objects to come from
   one node after another. SLUB will get a slab full of objects
   from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

   SLAB has per node partial lists. This means that over time a large
   number of partial slabs may accumulate on those lists. These can
   only be reused if allocator occur on specific nodes. SLUB has a global
   pool of partial slabs and will consume slabs from that pool to
   decrease fragmentation.

G. Tunables

   SLAB has sophisticated tuning abilities for each slab cache. One can
   manipulate the queue sizes in detail. However, filling the queues still
   requires the uses of the spin lock to check out slabs. SLUB has a global
   parameter (min_slab_order) for tuning. Increasing the minimum slab
   order can decrease the locking overhead. The bigger the slab order the
   less motions of pages between per CPU and partial lists occur and the
   better SLUB will be scaling.

G. Slab merging

   We often have slab caches with similar parameters. SLUB detects those
   on boot up and merges them into the corresponding general caches. This
   leads to more effective memory use. About 50% of all caches can
   be eliminated through slab merging. This will also decrease
   slab fragmentation because partial allocated slabs can be filled
   up again. Slab merging can be switched off by specifying
   slub_nomerge on boot up.

   Note that merging can expose heretofore unknown bugs in the kernel
   because corrupted objects may now be placed differently and corrupt
   differing neighboring objects. Enable sanity checks to find those.

H. Diagnostics

   The current slab diagnostics are difficult to use and require a
   recompilation of the kernel. SLUB contains debugging code that
   is always available (but is kept out of the hot code paths).
   SLUB diagnostics can be enabled via the "slab_debug" option.
   Parameters can be specified to select a single or a group of
   slab caches for diagnostics. This means that the system is running
   with the usual performance and it is much more likely that
   race conditions can be reproduced.

I. Resiliency

   If basic sanity checks are on then SLUB is capable of detecting
   common error conditions and recover as best as possible to allow the
   system to continue.

J. Tracing

   Tracing can be enabled via the slab_debug=T,<slabcache> option
   during boot. SLUB will then protocol all actions on that slabcache
   and dump the object contents on free.

K. On demand DMA cache creation.

   Generally DMA caches are not needed. If a kmalloc is used with
   __GFP_DMA then just create this single slabcache that is needed.
   For systems that have no ZONE_DMA requirement the support is
   completely eliminated.

L. Performance increase

   Some benchmarks have shown speed improvements on kernbench in the
   range of 5-10%. The locking overhead of slub is based on the
   underlying base allocation size. If we can reliably allocate
   larger order pages then it is possible to increase slub
   performance much further. The anti-fragmentation patches may
   enable further performance increases.

Tested on:
i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

slub_nomerge		Disable merging of slabs
slub_min_order=x	Require a minimum order for slab caches. This
			increases the managed chunk size and therefore
			reduces meta data and locking overhead.
slub_min_objects=x	Mininum objects per slab. Default is 8.
slub_max_order=x	Avoid generating slabs larger than order specified.
slub_debug		Enable all diagnostics for all caches
slub_debug=<options>	Enable selective options for all caches
slub_debug=<o>,<cache>	Enable selective options for a certain set of
			caches

Available Debug options
F		Double Free checking, sanity and resiliency
R		Red zoning
P		Object / padding poisoning
U		Track last free / alloc
T		Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Andy Whitcroft 543691a6cd tty_register_driver: only allocate tty instances when defined
If device->num is zero we attempt to kmalloc() zero bytes.  When SLUB is
enabled this returns a null pointer and take that as an allocation failure
and fail the device register.  Check for no devices and avoid the
allocation.

[akpm: opportunistic kzalloc() conversion]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Christoph Lameter b5637e65ee i386: use page allocator to allocate thread_info structure
i386 uses kmalloc to allocate the threadinfo structure assuming that the
allocations result in a page sized aligned allocation.  That has worked so
far because SLAB exempts page sized slabs from debugging and aligns them in
special ways that goes beyond the restrictions imposed by
KMALLOC_ARCH_MINALIGN valid for other slabs in the kmalloc array.

SLUB also works fine without debugging since page sized allocations neatly
align at page boundaries.  However, if debugging is switched on then SLUB
will extend the slab with debug information.  The resulting slab is not
longer of page size.  It will only be aligned following the requirements
imposed by KMALLOC_ARCH_MINALIGN.  As a result the threadinfo structure may
not be page aligned which makes i386 fail to boot with SLUB debug on.

Replace the calls to kmalloc with calls into the page allocator.

An alternate solution may be to create a custom slab cache where the
alignment is set to PAGE_SIZE.  That would allow slub debugging to be
applied to the threadinfo structure.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
David Rientjes c596d9f320 cpusets: allow TIF_MEMDIE threads to allocate anywhere
OOM killed tasks have access to memory reserves as specified by the
TIF_MEMDIE flag in the hopes that it will quickly exit.  If such a task has
memory allocations constrained by cpusets, we may encounter a deadlock if a
blocking task cannot exit because it cannot allocate the necessary memory.

We allow tasks that have the TIF_MEMDIE flag to allocate memory anywhere,
including outside its cpuset restriction, so that it can quickly die
regardless of whether it is __GFP_HARDWALL.

Cc: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Andrew Morton a3a02be791 slab: mark set_up_list3s() __init
It is only ever used prior to free_initmem().

(It will cause a warning when we run the section checking, but that's a
false-positive and it simply changes the source of an existing warning, which
is also a false-positive)

Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00