Commit Graph

4824 Commits

Author SHA1 Message Date
Jeff Dike
ea1eae75eb [PATCH] uml: add __raw_writel definition
Add implementations of the write* and __raw_write* functions.  __raw_writel is
needed by lib/iocopy.c, which shouldn't be used in UML, but which is
unconditionally linked in anyway.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:18 -08:00
Christoph Lameter
dc85da15d4 [PATCH] NUMA policies in the slab allocator V2
This patch fixes a regression in 2.6.14 against 2.6.13 that causes an
imbalance in memory allocation during bootup.

The slab allocator in 2.6.13 is not numa aware and simply calls
alloc_pages().  This means that memory policies may control the behavior of
alloc_pages().  During bootup the memory policy is set to MPOL_INTERLEAVE
resulting in the spreading out of allocations during bootup over all
available nodes.  The slab allocator in 2.6.13 has only a single list of
slab pages.  As a result the per cpu slab cache and the spinlock controlled
page lists may contain slab entries from off node memory.  The slab
allocator in 2.6.13 makes no effort to discern the locality of an entry on
its lists.

The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages
explicitly by calling alloc_pages_node().  The NUMA slab allocator manages
slab entries by having lists of available slab pages for each node.  The
per cpu slab cache can only contain slab entries associated with the node
local to the processor.  This guarantees that the default allocation mode
of the slab allocator always assigns local memory if available.

Setting MPOL_INTERLEAVE as a default policy during bootup has no effect
anymore.  In 2.6.14 all node unspecific slab allocations are performed on
the boot processor.  This means that most of key data structures are
allocated on one node.  Most processors will have to refer to these
structures making the boot node a potential bottleneck.  This may reduce
performance and cause unnecessary memory pressure on the boot node.

This patch implements NUMA policies in the slab layer.  There is the need
of explicit application of NUMA memory policies by the slab allcator itself
since the NUMA slab allocator does no longer let the page_allocator control
locality.

The check for policies is made directly at the beginning of __cache_alloc
using current->mempolicy.  The memory policy is already frequently checked
by the page allocator (alloc_page_vma() and alloc_page_current()).  So it
is highly likely that the cacheline is present.  For MPOL_INTERLEAVE
kmalloc() will spread out each request to one node after another so that an
equal distribution of allocations can be obtained during bootup.

It is not possible to push the policy check to lower layers of the NUMA
slab allocator since the per cpu caches are now only containing slab
entries from the current node.  If the policy says that the local node is
not to be preferred or forbidden then there is no point in checking the
slab cache or local list of slab pages.  The allocation better be directed
immediately to the lists containing slab entries for the allowed set of
nodes.

This way of applying policy also fixes another strange behavior in 2.6.13.
alloc_pages() is controlled by the memory allocation policy of the current
process.  It could therefore be that one process is running with
MPOL_INTERLEAVE and would f.e.  obtain a new page following that policy
since no slab entries are in the lists anymore.  A page can typically be
used for multiple slab entries but lets say that the current process is
only using one.  The other entries are then added to the slab lists.  These
are now non local entries in the slab lists despite of the possible
availability of local pages that would provide faster access and increase
the performance of the application.

Another process without MPOL_INTERLEAVE may now run and expect a local slab
entry from kmalloc().  However, there are still these free slab entries
from the off node page obtained from the other process via MPOL_INTERLEAVE
in the cache.  The process will then get an off node slab entry although
other slab entries may be available that are local to that process.  This
means that the policy if one process may contaminate the locality of the
slab caches for other processes.

This patch in effect insures that a per process policy is followed for the
allocation of slab entries and that there cannot be a memory policy
influence from one process to another.  A process with default policy will
always get a local slab entry if one is available.  And the process using
memory policies will get its memory arranged as requested.  Off-node slab
allocation will require the use of spinlocks and will make the use of per
cpu caches not possible.  A process using memory policies to redirect
allocations offnode will have to cope with additional lock overhead in
addition to the latency added by the need to access a remote slab entry.

Changes V1->V2
- Remove #ifdef CONFIG_NUMA by moving forward declaration into
  prior #ifdef CONFIG_NUMA section.

- Give the function determining the node number to use a saner
  name.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:18 -08:00
Christoph Lameter
1743660b91 [PATCH] Zone reclaim: proc override
proc support for zone reclaim

This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
used to override the automatic determination of the zone reclaim made on
bootup.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:17 -08:00
Christoph Lameter
9eeff2395e [PATCH] Zone reclaim: Reclaim logic
Some bits for zone reclaim exists in 2.6.15 but they are not usable.  This
patch fixes them up, removes unused code and makes zone reclaim usable.

Zone reclaim allows the reclaiming of pages from a zone if the number of
free pages falls below the watermarks even if other zones still have enough
pages available.  Zone reclaim is of particular importance for NUMA
machines.  It can be more beneficial to reclaim a page than taking the
performance penalties that come with allocating a page on a remote zone.

Zone reclaim is enabled if the maximum distance to another node is higher
than RECLAIM_DISTANCE, which may be defined by an arch.  By default
RECLAIM_DISTANCE is 20.  20 is the distance to another node in the same
component (enclosure or motherboard) on IA64.  The meaning of the NUMA
distance information seems to vary by arch.

If zone reclaim is not successful then no further reclaim attempts will
occur for a certain time period (ZONE_RECLAIM_INTERVAL).

This patch was discussed before. See

http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:17 -08:00
Nick Piggin
053837fce7 [PATCH] mm: migration page refcounting fix
Migration code currently does not take a reference to target page
properly, so between unlocking the pte and trying to take a new
reference to the page with isolate_lru_page, anything could happen to
it.

Fix this by holding the pte lock until we get a chance to elevate the
refcount.

Other small cleanups while we're here.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:17 -08:00
Andrew Morton
c8d338c8db [PATCH] scsi_transport_spi build fix
On alpha:

In file included from drivers/scsi/sym53c8xx_2/sym_glue.h:59,
                 from drivers/scsi/sym53c8xx_2/sym_fw.c:40:
include/scsi/scsi_transport_spi.h:57: error: field `dv_mutex' has incomplete type

Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:16 -08:00
Linus Torvalds
2149bcabc5 Merge master.kernel.org:/home/rmk/linux-2.6-serial 2006-01-18 15:19:40 -08:00
Linus Torvalds
2333f21207 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2006-01-18 15:18:53 -08:00
Linus Torvalds
097916ecaf Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-01-18 15:08:16 -08:00
Russell King
37b797b270 Merge master.kernel.org:/pub/scm/linux/kernel/git/tmlind/linux-omap-upstream 2006-01-18 22:56:29 +00:00
Eddie C. Dost
c126cf80d4 [SPARC64]: Serial Console for E250 Patch
From: Eddie C. Dost <ecd@brainaid.de>

I have the following patch for serial console over the RSC
(remote system controller) on my E250 machine. It basically adds
support for input-device=rsc and output-device=rsc from OBP, and
allows 115200,8,n,1,- serial mode setting.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18 14:54:31 -08:00
David Vrabel
68477d1176 [ARM] 3267/1: PXA27x SSP controller register defines
Patch from David Vrabel

PXA27x SSP controller has a few different registers, including SCR (serial clock rate) in SSCR0.

Signed-off-by: David Vrabel <dvrabel@arcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18 22:38:44 +00:00
David S. Miller
27a7b0415f Merge git://tipc.cslab.ericsson.net/pub/git/tipc 2006-01-18 14:23:54 -08:00
Alon Bar-Lev
d9004eb466 [SERIAL] Add 8250 support for Decision Computer International Co. PCCOM2
There is a new device which is look like:

	Serial controller: Decision Computer International Co. PCCOM2 (rev 02) (prog-if 02 [16550])
	0700: 6666:0004 (rev 02) (prog-if 02)
	Flags: medium devsel, IRQ 177
	Memory at fe000000 (32-bit, non-prefetchable) [size=128]
	I/O ports at e880 [size=128]
	I/O ports at e400 [size=256]

It has two 16550A, and is not listed in kernel, although the
manufacturer clams that it is supported...

I've created the following patch, it only add the new PCI id and the
card to the repository, it seems to work.

Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18 11:47:33 +00:00
Linus Torvalds
728c7763e7 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev 2006-01-17 19:47:31 -08:00
Linus Torvalds
d1138cf035 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2006-01-17 19:46:46 -08:00
Linus Torvalds
15578eeb6c Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-01-17 17:32:22 -08:00
Bryan O'Sullivan
c6b3feaf57 [PATCH] Fix sparse parse error in lppaca.h
sparse can't parse a struct definition in include/asm-powerpc/lppaca.h,
even though gcc can accept it.  The form looks like this:

        struct __attribute__((whatever)) foo { };

An equivalent that both gcc and sparse can handle is

        struct foo { } __attribute__((whatever));

This is the only definition of this type in the tree, and fixing it is
easier than fixing sparse.

Signed-off-by: Bryan O'Sullivan <bos@serpentine.com>
[ Side note: fixing sparse wouldn't be hard, but the "attribute at the
  end" version is the canonical one, and the one that makes sense. So
  let's just fix the kernel instead. Luc Van Oostenryck already sent
  out a sparse patch to the sparse mailing list in case anybody cares.
               -- Linus ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-17 17:18:25 -08:00
Alan Cox
8d238e0124 [PATCH] libata: Fix heuristic typos add LBA48PIO flag and support code, add IRQ flag for next diff
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-17 19:37:45 -05:00
Per Liden
33a9c4da5a [TIPC] Move ethernet protocol id to linux/if_ether.h
Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:15 +01:00
Per Liden
16cb4b333c [TIPC] Updated link priority macros
Added macros for min/default/max link priority in tipc_config.h.
Also renamed TIPC_NUM_LINK_PRI to TIPC_MEDIA_LINK_PRI since that
is a more accurate description of what it is used for.

Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:15 +01:00
Tony Lindgren
f07adc591e ARM: OMAP: 1/4 Fix clock framework to use clk_enable/disable
This patch fixes OMAP clock framework to use clk_enable/disable
instead of clk_use/unuse as specified in include/linux/clk.h.

Instances of clk_use/unuse are renamed to clk_enable/disable,
and references clk_use/unuse are removed.

Signed-off-by: Tony Lindgren <tony@atomide.com>
2006-01-17 15:27:09 -08:00
Alan Cox
1bc4ccfff8 [PATCH] libata: add a function to decide if we need iordy
This ought to be simple but for PIO2 we have to poke around the drive
data to get it 100% correct.

Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-17 08:25:39 -05:00
Jeff Garzik
0f1c122ecf Merge branch 'upstream-jgarzik' of git://git.tuxdriver.com/git/wireless-2.6 2006-01-17 07:22:26 -05:00
David S. Miller
8243126c5e [NET]: Make second arg to skb_reserved() signed.
Some subsystems, such as PPP, can send negative values
here.  It just happened to work correctly on 32-bit with
an unsigned value, but on 64-bit this explodes.

Figured out by Paul Mackerras based upon several PPP crash
reports.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:54:21 -08:00