Commit Graph

7715 Commits

Author SHA1 Message Date
Nick Piggin
612d6c19db [PATCH] radix-tree: direct data
The ability to have height 0 radix trees (a direct pointer to the data item
rather than going through a full node->slot) quietly disappeared with
old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd.  On 64-bit
machines this causes nearly 600 bytes to be used for every <= 4K file in
pagecache.

Re-introduce this feature, root tags stored in spare ->gfp_mask bits.

Simplify radix_tree_delete's complex tag clearing arrangement (which would
become even more complex) by just falling back to tag clearing functions
(the pagecache radix-tree never uses this path anyway, so the icache
savings will mean it's actually a speedup).

On my 4GB G5, this saves 8MB RAM per kernel kernel source+object tree in
pagecache.

Pagecache lookup, insertion, and removal speed for small files will also be
improved.

This makes RCU radix tree harder, but it's worth it.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:49 -07:00
Dean Nelson
929f97276b [PATCH] change gen_pool allocator to not touch managed memory
Modify the gen_pool allocator (lib/genalloc.c) to utilize a bitmap scheme
instead of the buddy scheme.  The purpose of this change is to eliminate
the touching of the actual memory being allocated.

Since the change modifies the interface, a change to the uncached allocator
(arch/ia64/kernel/uncached.c) is also required.

Both Andrey Volkov and Jes Sorenson have expressed a desire that the
gen_pool allocator not write to the memory being managed. See the
following:

  http://marc.theaimsgroup.com/?l=linux-kernel&m=113518602713125&w=2
  http://marc.theaimsgroup.com/?l=linux-kernel&m=113533568827916&w=2

Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Andrey Volkov <avolkov@varma-el.com>
Acked-by: Jes Sorensen <jes@trained-monkey.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:49 -07:00
Nick Piggin
833423143c [PATCH] mm: introduce remap_vmalloc_range()
Add remap_vmalloc_range, vmalloc_user, and vmalloc_32_user so that drivers
can have a nice interface for remapping vmalloc memory.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:49 -07:00
Yasunori Goto
762834e8bf [PATCH] Unify pxm_to_node() and node_to_pxm()
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:48 -07:00
Chen, Kenneth W
a43a8c39bb [PATCH] tightening hugetlb strict accounting
Current hugetlb strict accounting for shared mapping always assume mapping
starts at zero file offset and reserves pages between zero and size of the
file.  This assumption often reserves (or lock down) a lot more pages then
necessary if application maps at none zero file offset.  libhugetlbfs is
one example that requires proper reservation on shared mapping starts at
none zero offset.

This patch extends the reservation and hugetlb strict accounting to support
any arbitrary pair of (offset, len), resulting a much more robust and
accurate scheme.  More importantly, it won't lock down any hugetlb pages
outside file mapping.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:48 -07:00
Andreas Dilger
e8f03d0208 [PATCH] reserve space for swap label
Reserve space in the swap disk header for a LABEL and UUID to be specified.
 This has been possible with util-linux-2.12b (via e2fsprogs 1.36
libblkid), and is used by at least FC3 and later.  The kernel doesn't
really care about this, but the space shouldn't accidentally be used by
something else either.

Also make the on-disk structures be fixed-size types, instead of "int",
though I don't know of any architecture in use where an "int" isn't the
same size as a "__u32" (all current kernel arches have it as "unsigned
int").

Signed-off-by: Andreas Dilger <adilger@shaw.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:47 -07:00
KAMEZAWA Hiroyuki
fadd8fbd15 [PATCH] support for panic at OOM
This patch adds panic_on_oom sysctl under sys.vm.

When sysctl vm.panic_on_oom = 1, the kernel panics intead of killing rogue
processes.  And if vm.panic_on_oom is 0 the kernel will do oom_kill() in
the same way as it does today.  Of course, the default value is 0 and only
root can modifies it.

In general, oom_killer works well and kill rogue processes.  So the whole
system can survive.  But there are environments where panic is preferable
rather than kill some processes.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:47 -07:00
Andy Whitcroft
67de648211 [PATCH] squash duplicate page_to_pfn and pfn_to_page
We have architectures where the size of page_to_pfn and pfn_to_page are
significant enough to overall image size that they wish to push them out of
line.  However, in the process we have grown a second copy of the
implementation of each of these routines for each memory model.  Share the
implmentation exposing it either inline or out-of-line as required.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:47 -07:00
Yasunori Goto
718127cc31 [PATCH] wait_table and zonelist initializing for memory hotadd: add return code for init_current_empty_zone
When add_zone() is called against empty zone (not populated zone), we have to
initialize the zone which didn't initialize at boot time.  But,
init_currently_empty_zone() may fail due to allocation of wait table.  So,
this patch is to catch its error code.

Changes against wait_table is in the next patch.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:46 -07:00
Yasunori Goto
86356ab147 [PATCH] wait_table and zonelist initializing for memory hotadd: change to meminit for build_zonelist
Change definitions of some functions and data from __init to __meminit.

These functions and data can be used after bootup by this patch to be used for
hot-add codes.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:46 -07:00
Yasunori Goto
02b694dea4 [PATCH] wait_table and zonelist initializing for memory hotadd: change name of wait_table_size()
This is just to rename from wait_table_size() to wait_table_hash_nr_entries().

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:46 -07:00
Andrew Morton
f886ed443f [PATCH] PG_uncached is ia64 only
As Nick points out, only ia64 uses PG_uncached.  So we can push it up into the
higher bits of the lower half of page->flags and make room for another flag on
32-bit machines.

Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@sgi.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:46 -07:00
Andy Whitcroft
cb2b95e1c6 [PATCH] zone handle unaligned zone boundaries
The buddy allocator has a requirement that boundaries between contigious
zones occur aligned with the the MAX_ORDER ranges.  Where they do not we
will incorrectly merge pages cross zone boundaries.  This can lead to pages
from the wrong zone being handed out.

Originally the buddy allocator would check that buddies were in the same
zone by referencing the zone start and end page frame numbers.  This was
removed as it became very expensive and the buddy allocator already made
the assumption that zones boundaries were aligned.

It is clear that not all configurations and architectures are honouring
this alignment requirement.  Therefore it seems safest to reintroduce
support for non-aligned zone boundaries.  This patch introduces a new check
when considering a page a buddy it compares the zone_table index for the
two pages and refuses to merge the pages where they do not match.  The
zone_table index is unique for each node/zone combination when
FLATMEM/DISCONTIGMEM is enabled and for each section/zone combination when
SPARSEMEM is enabled (a SPARSEMEM section is at least a MAX_ORDER size).

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells
726c334223 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells
454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Randy Dunlap
f4b8ea7849 [NET]: fix net-core kernel-doc
Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:42 -07:00
David Woodhouse
c8a553ad7f [TCP]: Move inclusion of <linux/dmaengine.h> to correct place in <linux/tcp.h>
The new <linux/dmaengine.h> header shouldn't be included from
the !__KERNEL__ portion of tcp.h

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:40 -07:00
Herbert Xu
37c3185a02 [NET]: Added GSO toggle
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device.  For now it only supports in TCPv4.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:36 -07:00
Herbert Xu
f4c50d990d [NET]: Add software TSOv4
This patch adds the GSO implementation for IPv4 TCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:33 -07:00
Herbert Xu
f6a78bfcb1 [NET]: Add generic segmentation offload
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.

The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:31 -07:00
Herbert Xu
7967168cef [NET]: Merge TSO/UFO fields in sk_buff
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:29 -07:00
Herbert Xu
5b057c6b1a [NET]: Avoid allocating skb in skb_pad
First of all it is unnecessary to allocate a new skb in skb_pad since
the existing one is not shared.  More importantly, our hard_start_xmit
interface does not allow a new skb to be allocated since that breaks
requeueing.

This patch uses pskb_expand_head to expand the existing skb and linearize
it if needed.  Actually, someone should sift through every instance of
skb_pad on a non-linear skb as they do not fit the reasons why this was
originally created.

Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
TCP, etc.).  As it is skb_pad will simply write over a cloned skb.  Because
of the position of the write it is unlikely to cause problems but still
it's best if we don't do it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:06:41 -07:00
Linus Torvalds
6f3cafce0e Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (21 commits)
  [ARM] 3629/1: S3C24XX: fix missing bracket in regs-dsc.h
  [ARM] 3537/1: Rework DMA-bounce locking for finer granularity
  [ARM] 3601/1: i.MX/MX1 DMA error handling for signaled channels only
  [ARM] 3597/1: ixp4xx/nslu2: Board support for new LED subsystem
  [ARM] 3595/1: ixp4xx/nas100d: Board support for new LED subsystem
  [ARM] 3626/1: ARM EABI: fix syscall restarting
  [ARM] 3628/1: S3C24XX: add get_rate call to struct clk
  [ARM] 3627/1: S3C24XX: split s3c2410 clocks from core clocks
  [ARM] 3613/1: S3C2410: Add sysdev and sysclass
  [ARM] 3624/1: Report true modem control line states
  [ARM] 3620/2: ixp23xx: add uengine loader support
  [ARM] 3618/1: add defconfig for logicpd pxa270 card engine
  [ARM] 3617/1: ep93xx: fix slightly incorrect timer tick rate
  [ARM] 3616/1: fix timer handler wrap logic for a number of platforms
  [ARM] 3615/1: ixp23xx: use platform devices for physmap flash
  [ARM] 3614/1: ep93xx: use platform devices for physmap flash
  [ARM] 3621/1: fix compilation breakage for pnx4008
  [ARM] 3623/1: pnx4008: move GPIO-related defines to gpio.h
  [ARM] 3622/1: pnx4008: remove clk_use/clk_unuse
  [ARM] Enable VFP to be built when non-VFP capable CPUs are selected
  ...
2006-06-22 22:46:28 -07:00
Linus Torvalds
065a3e17ba Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (33 commits)
  [PATCH] myri10ge - drop workaround pci_save_state() disabling MSI
  [PATCH] myri10ge - drop workaround for the missing AER ext cap on nVidia CK804
  via-velocity: the link is not correctly detected when the device starts
  [PATCH] add b44 to maintainers
  [PATCH] WAN: ioremap() failure checks in drivers
  [PATCH] WAN: register_hdlc_device() doesn't need dev_alloc_name()
  [PATCH] skb_padto()-area fixes in 8390, wavelan
  [PATCH] make drivers/net/forcedeth.c:nv_update_pause() static
  [PATCH] network driver for Hilscher netx
  [PATCH] Dereference in tokenring/olympic.c
  [PATCH] Array overrun in drivers/net/wireless/wavelan.c
  [PATCH] Remove useless check in drivers/net/pcmcia/xirc2ps_cs.c
  [PATCH] 8139cp: add ethtool eeprom support
  [PATCH] 8139cp: fix eeprom read command length
  [PATCH] b44: update b44 Kconfig entry
  [PATCH] b44: update version to 1.01
  [PATCH] b44: add wol for old nic
  [PATCH] b44: add parameter
  [PATCH] b44: add wol
  [PATCH] b44: fix manual speed/duplex/autoneg settings
  ...
2006-06-22 22:15:09 -07:00
Linus Torvalds
45c091bb2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (139 commits)
  [POWERPC] re-enable OProfile for iSeries, using timer interrupt
  [POWERPC] support ibm,extended-*-frequency properties
  [POWERPC] Extra sanity check in EEH code
  [POWERPC] Dont look for class-code in pci children
  [POWERPC] Fix mdelay badness on shared processor partitions
  [POWERPC] disable floating point exceptions for init
  [POWERPC] Unify ppc syscall tables
  [POWERPC] mpic: add support for serial mode interrupts
  [POWERPC] pseries: Print PCI slot location code on failure
  [POWERPC] spufs: one more fix for 64k pages
  [POWERPC] spufs: fail spu_create with invalid flags
  [POWERPC] spufs: clear class2 interrupt status before wakeup
  [POWERPC] spufs: fix Makefile for "make clean"
  [POWERPC] spufs: remove stop_code from struct spu
  [POWERPC] spufs: fix spu irq affinity setting
  [POWERPC] spufs: further abstract priv1 register access
  [POWERPC] spufs: split the Cell BE support into generic and platform dependant parts
  [POWERPC] spufs: dont try to access SPE channel 1 count
  [POWERPC] spufs: use kzalloc in create_spu
  [POWERPC] spufs: fix initial state of wbox file
  ...

Manually resolved conflicts in:
	drivers/net/phy/Makefile
	include/asm-powerpc/spu.h
2006-06-22 22:11:30 -07:00