Commit Graph

36137 Commits

Author SHA1 Message Date
Mel Gorman c713216dee [PATCH] Introduce mechanism for registering active regions of memory
At a basic level, architectures define structures to record where active
ranges of page frames are located.  Once located, the code to calculate zone
sizes and holes in each architecture is very similar.  Some of this zone and
hole sizing code is difficult to read for no good reason.  This set of patches
eliminates the similar-looking architecture-specific code.

The patches introduce a mechanism where architectures register where the
active ranges of page frames are with add_active_range().  When all areas have
been discovered, free_area_init_nodes() is called to initialise the pgdat and
zones.  The zone sizes and holes are then calculated in an architecture
independent manner.

Patch 1 introduces the mechanism for registering and initialising PFN ranges
Patch 2 changes ppc to use the mechanism - 139 arch-specific LOC removed
Patch 3 changes x86 to use the mechanism - 136 arch-specific LOC removed
Patch 4 changes x86_64 to use the mechanism - 74 arch-specific LOC removed
Patch 5 changes ia64 to use the mechanism - 52 arch-specific LOC removed
Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable.
	It adjusts the watermarks slightly

Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
gensparse_defconfig and defconfig.  Bob Picco has also tested and debugged on
IA64.  Jack Steiner successfully boot tested on a mammoth SGI IA64-based
machine.  These were on patches against 2.6.17-rc1 and release 3 of these
patches but there have been no ia64-changes since release 3.

There are differences in the zone sizes for x86_64 as the arch-specific code
for x86_64 accounts the kernel image and the starting mem_maps as memory holes
but the architecture-independent code accounts the memory as present.

The big benefit of this set of patches is a sizable reduction of
architecture-specific code, some of which is very hairy.  There should be a
greater reduction when other architectures use the same mechanisms for zone
and hole sizing but I lack the hardware to test on.

Additional credit;
	Dave Hansen for the initial suggestion and comments on early patches
	Andy Whitcroft for reviewing early versions and catching numerous
		errors
	Tony Luck for testing and debugging on IA64
	Bob Picco for fixing bugs related to pfn registration, reviewing a
		number of patch revisions, providing a number of suggestions
		on future direction and testing heavily
	Jack Steiner and Robin Holt for testing on IA64 and clarifying
		issues related to memory holes
	Yasunori for testing on IA64
	Andi Kleen for reviewing and feeding back about x86_64
	Christian Kujau for providing valuable information related to ACPI
		problems on x86_64 and testing potential fixes

This patch:

Define the structure to represent an active range of page frames within a node
in an architecture independent manner.  Architectures are expected to register
active ranges of PFNs using add_active_range(nid, start_pfn, end_pfn) and call
free_area_init_nodes() passing the PFNs of the end of each zone.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Andrew Morton 2bd0cfbde2 [PATCH] fix x86_64-mm-spinlock-cleanup
We need processor.h for cpu_relax().

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Alexey Dobriyan 133d205a18 [PATCH] Make kmem_cache_destroy() return void
un-, de-, -free, -destroy, -exit, etc functions should in general return
void.  Also,

There is very little, say, filesystem driver code can do upon failed
kmem_cache_destroy().  If it will be decided to BUG in this case, BUG
should be put in generic code, instead.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Alexey Dobriyan 1a1d92c10d [PATCH] Really ignore kmem_cache_destroy return value
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:

	(void)kmem_cache_destroy(cache);

* Those who check it printk something, however, slab_error already printed
  the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
  low-level filesystem driver should make. Converted to ignore.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Panagiotis Issaris f52720ca5f [PATCH] fs: Removing useless casts
* Removing useless casts
* Removing useless wrapper
* Conversion from kmalloc+memset to kzalloc

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Panagiotis Issaris f8314dc60c [PATCH] fs: Conversions from kmalloc+memset to k(z|c)alloc
Conversions from kmalloc+memset to kzalloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Jffs2-bit-acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Eric Sandeen 32c2d2bc4b [PATCH] more ext3 16T overflow fixes
Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
if they overflow they'll then underflow and things are fine.

5th hunk actually fixes an overflow problem.

Also check for potential overflows in inode & block counts when resizing.

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Dave Kleikamp a4e4de36dc [PATCH] ext3: Fix sparse warnings
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Dave Kleikamp e9ad5620bf [PATCH] ext3: More whitespace cleanups
More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Vasily Averin 7543fc7b3a [PATCH] ext3: wrong error behavior
SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
behavior was broken in linux kernels since 2.5.x versions by the following
patch:

2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
Default mount options from superblock for ext2/3 filesystems
http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

In case ext3 file system is mounted with errors=continue
(EXT3_ERRORS_CONTINUE) errors should be ignored when possible.  However at
present in case of any error kernel aborts journal and remounts filesystem
to read-only.  Such behavior was hit number of times and noted to differ
from that of 2.4.x kernels.

This patch fixes this:
- do nothing in case of EXT3_ERRORS_CONTINUE,
- set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
- panic() should be called after ext3_commit_super() to save
 sb marked as EXT3_ERROR_FS

Signed-off-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@sw.ru>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Mingming Cao 36faadc144 [PATCH] ext3: more comments about block allocation/reservation code
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Mingming Cao 321fb9e818 [PATCH] ext3: turn on reservation dump on block allocation errors
In the past there were a few kernel panics related to block reservation
tree operations failure (insert/remove etc).  It would be very useful to
get the block allocation reservation map info when such error happens.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Eric Sandeen 37ed322290 [PATCH] JBD: 16T fixes
These are a few places I've found in jbd that look like they may not be
16T-safe, or consistent with the use of unsigned longs for block
containers.  Problems here would be somewhat hard to hit, would require
journal blocks past the 8T boundary, which would not be terribly common.
Still, should fix.

(some of these have come from the ext4 work on jbd as well).

I think there's one more possibility that the wrap() function may not be
safe IF your last block in the journal butts right up against the 232 block
boundary, but that seems like a VERY remote possibility, and I'm not
worrying about it at this point.

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Eric Sandeen eee194e76c [PATCH] ext3: inode numbers are unsigned long
This is primarily format string fixes, with changes to ialloc.c where large
inode counts could overflow, and also pass around journal_inum as an
unsigned long, just to be pedantic about it....

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Eric Sandeen 41f04d852e [PATCH] ext2: fix mounts at 16T
Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Eric Sandeen 855565e81a [PATCH] fix ext3 mounts at 16T
I need to do some actual IO testing now, but this gets things mounting for
a 16T ext3 filesystem.  (patched up e2fsprogs is needed too, I'll send that
off the kernel list)

This patch fixes these issues in the kernel:

o sbi->s_groups_count overflows in ext3_fill_super()

	sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
			       le32_to_cpu(es->s_first_data_block) +
			       EXT3_BLOCKS_PER_GROUP(sb) - 1) /
			      EXT3_BLOCKS_PER_GROUP(sb);

  at 16T, s_blocks_count is already maxed out; adding
  EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
  Not really what we want, and causes a failed mount.

  Feel free to check my math (actually, please do!), but changing it this
  way should work & avoid the overflow:

  (A + B - 1)/B changed to: ((A - 1)/B) + 1

o ext3_check_descriptors() overflows range checks

  ext3_check_descriptors() iterates over all block groups making sure
  that various bits are within the right block ranges...  on the last pass
  through, it is checking the error case

   [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

  where "block" is the first block in the last block group.  The last
  block in this group (and the last one that will fit in 32 bits) is block
  + EXT3_BLOCKS_PER_GROUP(sb)- 1.  block + EXT3_BLOCKS_PER_GROUP(sb) wraps
  back around to 0.

  so, make things clearer with "first_block" and "last_block" where those
  are first and last, inclusive, and use <, > rather than <, >=.

  Finally, the last block group may be smaller than the rest, so account
  for this on the last pass through: last_block = sb->s_blocks_count - 1;

(a similar patch could be done for ext2; does anyone in their right mind
use ext2 at 16T?  I'll send an ext2 patch doing the same thing if that's
warranted)

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Alexey Dobriyan 2aed348469 [PATCH] jbd: use BUILD_BUG_ON in journal init
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Mingming Cao ae6ddcc5f2 [PATCH] ext3 and jbd cleanup: remove whitespace
Remove whitespace from ext3 and jbd, before we clone ext4.

Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:09 -07:00
Josh Triplett e7ab8d6505 [PATCH] jbd: add lock annotation to jbd_sync_bh
jbd_sync_bh releases journal->j_list_lock.  Add a lock annotation to this
function so that sparse can check callers for lock pairing, and so that
sparse will not complain about this function since it intentionally uses
the lock in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:08 -07:00
KAMEZAWA Hiroyuki bbf2bef9f5 [PATCH] fix "cpu to node relationship fixup: map cpu to node"
Fix build error introduced by 3212fe1594

Non-NUMA case should be handled.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:08 -07:00
Linus Torvalds a5b08073a0 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6: (30 commits)
  i2c: Drop unimplemented slave functions
  i2c: Constify i2c_algorithm declarations, part 2
  i2c: Constify i2c_algorithm declarations, part 1
  i2c: Let drivers constify i2c_algorithm data
  i2c-isa: Restore driver owner
  i2c-viapro: Add support for the VT8237A and VT8251
  i2c: Warn on i2c client creation failure
  i2c-core: Drop useless bitmaskings
  i2c-algo-pcf: Discard the mdelay data struct member
  i2c-algo-bit: Cleanups
  i2c-isa: Fail adding driver on attach_adapter error
  i2c: __must_check fixes (chip drivers)
  i2c-dev: attach/detach_adapter cleanups
  i2c-stub: Chip address as a module parameter
  i2c: Plan i2c-isa for removal
  i2c: New bus driver for TI OMAP boards
  i2c-algo-bit: Discard the mdelay data struct member
  i2c-matroxfb: Struct init conversion
  i2c: Fix copy-n-paste in subsystem Kconfig
  i2c-au1550: Add I2C support for Au1200
  ...
2006-09-27 08:09:48 -07:00
Linus Torvalds ff0972c26b Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (28 commits)
  pciehp - fix wrong return value
  IA64: PCI: dont disable irq which is not enabled
  acpiphp: add support for ioapic hot-remove
  PCI: assign ioapic resource at hotplug
  acpiphp: disable bridges
  acpiphp: stop bus device before acpi_bus_trim
  PCI: add pci_stop_bus_device
  acpiphp: do not initialize existing ioapics
  acpiphp: initialize ioapics before starting devices
  acpiphp: set hpp values before starting devices
  PCI Hotplug: cleanup pcihp skeleton code.
  PCI: Restore PCI Express capability registers after PM event
  PCI: drivers/pci/hotplug/acpiphp_glue.c: make a function static
  PCI: Multiprobe sanitizer
  PCI: fix __must_check warnings
  PCI Hotplug: fix __must_check warnings
  SHPCHP: fix __must_check warnings
  PCI-Express AER implemetation: pcie_portdrv error handler
  PCI-Express AER implemetation: AER core and aerdriver
  PCI-Express AER implemetation: export pcie_port_bus_type
  ...
2006-09-27 08:09:15 -07:00
Franck Bui-Huu a09fc446fb [MIPS] setup.c: use early_param() for early command line parsing
There's no point to rewrite some logic to parse command line
to pass initrd parameters or to declare a user memory area.
We could use instead parse_early_param() that does the same
thing.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-09-27 13:38:04 +01:00
Franck Bui-Huu 1c6fd44d7e [MIPS] setup.c: remove MAXMEM macro
It doesn't improve readability.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-09-27 13:38:02 +01:00
Franck Bui-Huu 8df32c636e [MIPS] setup.c: do not inline functions
There's no point to inline any functions in setup.c. Let's GCC
doing its job, it's good enough for that now.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-09-27 13:38:01 +01:00