Commit Graph

215 Commits

Author SHA1 Message Date
David Gibson
63551ae0fe [PATCH] Hugepage consolidation
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code.  It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.

Tested, at least a little, on ppc64, i386 and x86_64.

Notes:
	- this patch changes the meaning of set_huge_pte() to be more
	  analagous to set_pte()
	- does SH4 need s special huge_ptep_get_and_clear()??

Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:15 -07:00
Martin Hicks
1e7e5a9048 [PATCH] VM: rate limit early reclaim
When early zone reclaim is turned on the LRU is scanned more frequently when a
zone is low on memory.  This limits when the zone reclaim can be called by
skipping the scan if another thread (either via kswapd or sync reclaim) is
already reclaiming from the zone.

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Martin Hicks
0c35bbadc5 [PATCH] VM: add __GFP_NORECLAIM
When using the early zone reclaim, it was noticed that allocating new pages
that should be spread across the whole system caused eviction of local pages.

This adds a new GFP flag to prevent early reclaim from happening during
certain allocation attempts.  The example that is implemented here is for page
cache pages.  We want page cache pages to be spread across the whole system,
and we don't want page cache pages to evict other pages to get local memory.

Signed-off-by:  Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Martin Hicks
753ee72896 [PATCH] VM: early zone reclaim
This is the core of the (much simplified) early reclaim.  The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.

One of the major uses of this is NUMA machines.  With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected on a
per-zone basis and is turned on/off via syscall.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

			wall  user   sys   %cpu  ctx sw.  sleeps
			----  ----   ---   ----   ------  ------
No patch		1009  1384   847   258   298170   504402
w/patch, no reclaim     880   1376   667   288   254064   396745
w/patch & reclaim       1079  1385   926   252   291625   548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Ingo Molnar
39c715b717 [PATCH] smp_processor_id() cleanup
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.

The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.

Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.

In the new code, there are two externally visible symbols:

 - smp_processor_id(): debug variant.

 - raw_smp_processor_id(): nondebug variant. Replaces all existing
   uses of _smp_processor_id() and __smp_processor_id(). Defined
   by every SMP architecture in include/asm-*/smp.h.

There is one new internal symbol, dependent on DEBUG_PREEMPT:

 - debug_smp_processor_id(): internal debug variant, mapped to
                             smp_processor_id().

Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file.  All related comments got updated and/or
clarified.

I have build/boot tested the following 8 .config combinations on x86:

 {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT.  (Other
architectures are untested, but should work just fine.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:13 -07:00
Patrick McHardy
18b8afc771 [NETFILTER]: Kill nf_debug
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:01:57 -07:00
Patrick McHardy
e45b1be8bc [NETFILTER]: Kill lockhelp.h
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:01:30 -07:00
Alexey Kuznetsov
18b504e25f [NETLINK]: netlink_callback structure needs 5 args not 4
net/ipv4/tcp_diag.c uses up to ->args[4]

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 12:38:48 -07:00
Linus Torvalds
1d345dac1f Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2005-06-20 16:00:33 -07:00
Maneesh Soni
988d186de5 [PATCH] sysfs-iattr: add sysfs_setattr
o This adds ->i_op->setattr VFS method for sysfs inodes. The changed
  attribues are saved in the persistent sysfs_dirent structure as a pointer
  to struct iattr. The struct iattr is allocated only for those sysfs_dirent's
  for which default attributes are getting changed. Thanks to Jon Smirl for
  this suggestion.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:37 -07:00
Yani Ioannou
0a3e7eeabc [PATCH] I2C: add i2c sensor_device_attribute and macros
This patch creates a new header with a potential standard i2c sensor
attribute type (which simply includes an int representing the sensor
number/index) and the associated macros, SENSOR_DEVICE_ATTR to define
a static attribute and to_sensor_dev_attr to get a
sensor_device_attribute reference from an embedded device_attribute
reference.

Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
2005-06-20 15:15:36 -07:00
Yani Ioannou
54b6f35c99 [PATCH] Driver core: change device_attribute callbacks
This patch adds the device_attribute paramerter to the
device_attribute store and show sysfs callback functions, and passes a
reference to the attribute when the callbacks are called.

Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:31 -07:00
Arnd Bergmann
acaefc25d2 [PATCH] libfs: add simple attribute files
Based on the discussion about spufs attributes, this is my suggestion
for a more generic attribute file support that can be used by both
debugfs and spufs.

Simple attribute files behave similarly to sequential files from
a kernel programmers perspective in that a standard set of file
operations is provided and only an open operation needs to
be written that registers file specific get() and set() functions.

These operations are defined as

void foo_set(void *data, u64 val); and
u64 foo_get(void *data);

where data is the inode->u.generic_ip pointer of the file and the
operations just need to make send of that pointer. The infrastructure
makes sure this works correctly with concurrent access and partial
read calls.

A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
using the attributes.

This patch already contains the changes for debugfs to use attributes
for its internal file operations.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:30 -07:00
Keiichiro Tokunaga
4b45099b75 [PATCH] Driver core: unregister_node() for hotplug use
This adds a generic function 'unregister_node()'.
It is used to remove objects of a node going away
for hotplug.  All the devices on the node must be
unregistered before calling this function.

Signed-off-by: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff -puN drivers/base/node.c~numa_hp_base drivers/base/node.c
2005-06-20 15:15:29 -07:00
Patrick Mochel
0d3e5a2e39 [PATCH] Driver Core: fix bk-driver-core kills ppc64
There's no check to see if the device is already bound to a driver, which
could do bad things.  The first thing to go wrong is that it will try to match
a driver with a device already bound to one.  In some cases (it appears with
USB with drivers/usb/core/usb.c::usb_match_id()), some drivers will match a
device based on the class type, so it would be common (especially for HID
devices) to match a device that is already bound.

The fun comes when ->probe() is called, it fails, then
driver_probe_device() does this:

	dev->driver = NULL;

Later on, that pointer could be be dereferenced without checking and cause
hell to break loose.

This problem could be nasty. It's very hardware dependent, since some
devices could have a different set of matching qualifiers than others.

Now, I don't quite see exactly where/how you were getting that crash.
You're dereferencing bad memory, but I'm not sure which pointer was bad
and where it came from, but it could have come from a couple of different
places.

The patch below will hopefully fix it all up for you. It's against
2.6.12-rc2-mm1, and does the following:

- Move logic to driver_probe_device() and comments uncommon returns:
  1 - If device is bound
  0 - If device not bound, and no error
  error - If there was an error.

- Move locking to caller of that function, since we want to lock a
  device for the entire time we're trying to bind it to a driver (to
  prevent against a driver being loaded at the same time).

- Update __device_attach() and __driver_attach() to do that locking.

- Check if device is already bound in __driver_attach()

- Update the converse device_release_driver() so it locks the device
  around all of the operations.

- Mark driver_probe_device() as static and remove export. It's an
  internal function, it should stay that way, and there are no other
  callers. If there is ever a need to export it, we can audit it as
  necessary.

Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-06-20 15:15:27 -07:00
mochel@digitalimplant.org
36239577cf [PATCH] Use a klist for device child lists.
- Use klist iterator in device_for_each_child(), making it safe to use for
  removing devices.
- Remove unused list_to_dev() function.
- Kills all usage of devices_subsys.rwsem.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:23 -07:00
mochel@digitalimplant.org
63c4f204ff [PATCH] Remove struct device::driver_list.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:18 -07:00
mochel@digitalimplant.org
7dc35cdf69 [PATCH] Remove struct device::bus_list.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:18 -07:00
mochel@digitalimplant.org
8b0c250be4 [PATCH] add klist_node_attached() to determine if a node is on a list or not.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff -Nru a/include/linux/klist.h b/include/linux/klist.h
2005-06-20 15:15:17 -07:00
mochel@digitalimplant.org
cb85b6f1cc [PATCH] Remove the unused device_find().
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:16 -07:00
mochel@digitalimplant.org
94e7b1c5ff [PATCH] Add a klist to struct device_driver for the devices bound to it.
- Use it in driver_for_each_device() instead of the regular list_head and stop using
  the bus's rwsem for protection.
- Use driver_for_each_device() in driver_detach() so we don't deadlock on the
  bus's rwsem.
- Remove ->devices.
- Move klist access and sysfs link access out from under device's semaphore, since
  they're synchronized through other means.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:16 -07:00
mochel@digitalimplant.org
38fdac3cdc [PATCH] Add a klist to struct bus_type for its drivers.
- Use it in bus_for_each_drv().
- Use the klist spinlock instead of the bus rwsem.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:14 -07:00
mochel@digitalimplant.org
465c7a3a3a [PATCH] Add a klist to struct bus_type for its devices.
- Use it for bus_for_each_dev().
- Use the klist spinlock instead of the bus rwsem.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:14 -07:00
mochel@digitalimplant.org
9a19fea436 [PATCH] Add initial implementation of klist helpers.
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.

The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.

It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.

The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.

It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.

There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).

Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.

There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff -Nru a/include/linux/klist.h b/include/linux/klist.h
2005-06-20 15:15:14 -07:00
mochel@digitalimplant.org
fae3cd0025 [PATCH] Add driver_for_each_device().
Now there's an iterator for accessing each device bound to a driver.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Index: linux-2.6.12-rc2/drivers/base/driver.c
===================================================================
2005-06-20 15:15:13 -07:00