Commit Graph

1092 Commits

Author SHA1 Message Date
Tejun Heo
cb98fc8bb9 [BLOCK] Reimplement elevator switch
This patch reimplements elevator switch.  This patch assumes generic
dispatch queue patchset is applied.

 * Each request is tagged with REQ_ELVPRIV flag if it has its elevator
   private data set.
 * Requests which doesn't have REQ_ELVPRIV flag set never enter
   iosched.  They are always directly back inserted to dispatch queue.
   Of course, elevator_put_req_fn is called only for requests which
   have its REQ_ELVPRIV set.
 * Request queue maintains the current number of requests which have
   its elevator data set (elevator_set_req_fn called) in
   q->rq->elvpriv.
 * If a request queue has QUEUE_FLAG_BYPASS set, elevator private data
   is not allocated for new requests.

 To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until
elvpriv goes to zero; then, we attach the new iosched and clears
QUEUE_FLAG_BYPASS.  New implementation is much simpler and main code
paths are less cluttered, IMHO.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-28 08:48:12 +02:00
Tejun Heo
cb19833dcc [BLOCK] kill generic max_back_kb handling
This patch kills max_back_kb handling from elv_dispatch_sort() and
kills max_back_kb field from struct request_queue.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-28 08:46:01 +02:00
Tejun Heo
06b86245c0 [PATCH] 03/05 move last_merge handlin into generic elevator code
Currently, both generic elevator code and specific ioscheds
participate in the management and usage of last_merge.  This
and the following patches move last_merge handling into
generic elevator code.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-28 08:45:20 +02:00
Jens Axboe
1b47f531e2 [PATCH] generic dispatch fixes
- Split elv_dispatch_insert() into two functions
- Rename rq_last_sector() to rq_end_sector()

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-28 08:44:37 +02:00
Tejun Heo
8922e16cf6 [PATCH] 01/05 Implement generic dispatch queue
Implements generic dispatch queue which can replace all
dispatch queues implemented by each iosched.  This reduces
code duplication, eases enforcing semantics over dispatch
queue, and simplifies specific ioscheds.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-28 08:44:24 +02:00
Chen, Kenneth W
20e5c81fcf [patch] remove gendisk->stamp_idle field
struct gendisk has these two fields: stamp, stamp_idle.  Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp.  Suggest to remove it.

Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-28 08:15:30 +02:00
Justin Chen
551f8f0e87 [SERIAL] new hp diva console port
Add the new ID 0x132a and configure the new PCI Diva console port.  This
device supports only 1 single console UART.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-24 22:16:38 +01:00
Bjorn Helgaas
add7b58e75 [SERIAL] support the Exsys EX-4055 4S four-port card
Tested by Wolfgang Denk with this device:

    00:0f.0 Network controller: PLX Technology, Inc. PCI <-> IOBus Bridge (rev 01)
        Subsystem: Exsys EX-4055 4S(16C550) RS-232
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at 80100000 (32-bit, non-prefetchable) [size=128]
        Region 1: I/O ports at 7080 [size=128]
        Region 2: I/O ports at 7400 [size=32]

    00:0f.0 Class 0280: 10b5:9050 (rev 01)
        Subsystem: d84d:4055

Results with this patch:

    Serial: 8250/16550 driver $Revision: 1.90 $ 32 ports, IRQ sharing enabled
    ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    PCI: Found IRQ 10 for device 0000:00:0f.0
    ttyS4 at I/O 0x7400 (irq = 10) is a 16550A
    ttyS5 at I/O 0x7408 (irq = 10) is a 16550A
    ttyS6 at I/O 0x7410 (irq = 10) is a 16550A
    ttyS7 at I/O 0x7418 (irq = 10) is a 16550A

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-10-24 22:11:57 +01:00
Andrew Morton
8d3b35914a [PATCH] inotify/idr leak fix
Fix a bug which was reported and diagnosed by
Stefan Jones <stefan.jones@churchillrandoms.co.uk>

IDR trees include a cache of idr_layer objects.  There's no way to destroy
this cache, so when we discard an overall idr tree we end up leaking some
memory.

Add and use idr_destroy() for this.  v9fs and infiniband also need to use
idr_destroy() to avoid leaks.

Or, we make the cache global, like radix_tree_preload().  Which is probably
better.  Later.

Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-23 16:38:39 -07:00
Hugh Dickins
ac9b9c667c [PATCH] Fix handling spurious page fault for hugetlb region
This reverts commit 3359b54c8c and
replaces it with a cleaner version that is purely based on page table
operations, so that the synchronization between inode size and hugetlb
mappings becomes moot.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-20 09:02:07 -07:00
Yasunori Goto
281dd25cdc [PATCH] swiotlb: make sure initial DMA allocations really are in DMA memory
This introduces a limit parameter to the core bootmem allocator; The new
parameter indicates that physical memory allocated by the bootmem
allocator should be within the requested limit.

We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
is the only api used for swiotlb.

The existing alloc_bootmem_low_pages() api could instead have been
changed and made to pass right limit to the core allocator.  But that
would make the patch more intrusive for 2.6.14, as other arches use
alloc_bootmem_low_pages().  We may be done that post 2.6.14 as a
cleanup.

With this, swiotlb gets memory within 4G for both x86_64 and ia64
arches.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 23:11:33 -07:00
Seth, Rohit
3359b54c8c [PATCH] Handle spurious page fault for hugetlb region
The hugetlb pages are currently pre-faulted.  At the time of mmap of
hugepages, we populate the new PTEs.  It is possible that HW has already
cached some of the unused PTEs internally.  These stale entries never
get a chance to be purged in existing control flow.

This patch extends the check in page fault code for hugepages.  Check if
a faulted address falls with in size for the hugetlb file backing it.
We return VM_FAULT_MINOR for these cases (assuming that the arch
specific page-faulting code purges the stale entry for the archs that
need it).

Signed-off-by: Rohit Seth <rohit.seth@intel.com>

[ This is apparently arguably an ia64 port bug. But the code won't
  hurt, and for now it fixes a real problem on some ia64 machines ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 13:56:27 -07:00
Zach Brown
4faa528528 [PATCH] aio: revert lock_kiocb()
lock_kiocb() was introduced to serialize retrying and cancellation.  In the
process of doing so it tried to sleep waiting for KIF_LOCKED while holding
the ctx_lock spinlock.  Recent fixes have ensured that multiple concurrent
retries won't be attempted for a given iocb.  Cancel has other problems and
has no significant in-tree users that have been complaining about it.  So
for the immediate future we'll revert sleeping with the lock held and will
address proper cancellation and retry serialization in the future.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 17:03:57 -07:00
Eric Dumazet
5ee832dbc6 [PATCH] rcu: keep rcu callback event counter
This makes call_rcu() keep track of how many events there are on the RCU
list, and cause a reschedule event when the list gets too long.

This helps keep RCU event lists down.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 15:27:58 -07:00
Herbert Xu
b24d18aa74 [PATCH] list: add missing rcu_dereference on first element
It seems that all the list_*_rcu primitives are missing a memory barrier
on the very first dereference.  For example,

#define list_for_each_rcu(pos, head) \
	for (pos = (head)->next; prefetch(pos->next), pos != (head); \
		pos = rcu_dereference(pos->next))

It will go something like:

	pos = (head)->next

	prefetch(pos->next)

	pos != (head)

	do stuff

We're missing a barrier here.

	pos = rcu_dereference(pos->next)

		fetch pos->next

		barrier given by rcu_dereference(pos->next)

		store pos

Without the missing barrier, the pos->next value may turn out to be stale.
In fact, if "do stuff" were also dereferencing pos and relying on
list_for_each_rcu to provide the barrier then it may also break.

So here is a patch to make sure that we have a barrier for the first
element in the list.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 08:59:10 -07:00
Al Viro
688ce17b85 [PATCH]: highest_possible_processor_id() has to be a macro
... otherwise, things like alpha and sparc64 break and break
badly.  They define cpu_possible_map to something else in smp.h
*AFTER* having included cpumask.h.

	If that puppy is a macro, expansion will happen at the actual
caller, when we'd already seen #define cpu_possible_map ... and we will
get the right thing used.

	As an inline helper it will be tokenized before we get to that
define and that's it; no matter what we define later, it won't affect
anything.  We get modules with dependency on cpu_possible_map instead
of the right symbol (phys_cpu_present_map in case of sparc64), or outright
link errors if they are built-in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-16 00:17:33 -07:00
Tim Schmielau
e26148d934 [PATCH] Fix copy-and-paste error in BSD accounting
Fix copy and paste error in jiffies_to_AHZ conversion which leads to wrong
BSD accounting information on alpha and ia64 when
CONFIG_BSD_PROCESS_ACCT_V3 is turned on.

Also update comment to match reorganised header files.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-14 17:10:12 -07:00
David S. Miller
c8923c6b85 [NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.
Original patch by Harald Welte, with feedback from Herbert Xu
and testing by Sébastien Bernard.

EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus
are numbered linearly.  That is not necessarily true.

This patch fixes that up by calculating the largest possible
cpu number, and allocating enough per-cpu structure space given
that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 14:41:23 -07:00
Ben Dooks
afb997c616 [NETPOLL]: wrong return for null netpoll_poll_lock()
When netpoll is not being used, the macro that
defines the removed routing netpoll_poll_lock
defines the return as zero, but the real
routine returns a `void *`

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-12 15:12:21 -07:00
Pablo Neira Ayuso
3392315375 [NETFILTER] ctnetlink: allow userspace to change TCP state
This patch adds the ability of changing the state a TCP connection. I know
that this must be used with care but it's required to provide a complete
conntrack creation via conntrack_netlink. So I'll document this aspect on
the upcoming docs.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:23:28 -07:00
Harald Welte
a051a8f730 [NETFILTER]: Use only 32bit counters for CONNTRACK_ACCT
Initially we used 64bit counters for conntrack-based accounting, since we
had no event mechanism to tell userspace that our counters are about to
overflow.  With nfnetlink_conntrack, we now have such a event mechanism and
thus can save 16bytes per connection.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:21:10 -07:00
Pablo Neira Ayuso
e1c73b78e3 [NETFILTER] ctnetlink: add one nesting level for TCP state
To keep consistency, the TCP private protocol information is nested
attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to
access the TCP state information looks like here below:

CTA_PROTOINFO
CTA_PROTOINFO_TCP
CTA_PROTOINFO_TCP_STATE

instead of:

CTA_PROTOINFO
CTA_PROTOINFO_TCP_STATE

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:55:49 -07:00
Harald Welte
5bbc243aaf [NETFILTER]: Add missing include to ip_conntrack_tuple.h
Without this #include, __be16 is not defined and userspace programs
will break.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:54:01 -07:00
Harald Welte
b3a91d037a [NETFILTER] nat: remove bogus structure member
When 'rustynat' was merged in 2.6.12, the use of the "helper" pointer of
struct ipt_nat_info was obsoleted, but the pointer not removed from the
struct.

This patch removes the pointer, thereby yet again shrinking struct
ip_conntrack.

Discovered-by: Rusty Russell <rusty@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:52:36 -07:00
Harald Welte
ebe0bbf06c [NETFILTER] nfnetlink: use highest bit of nfa_type to indicate nested TLV
As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e.
host byte order tags, net byte order values) are useless, unless a parser
can determine whether an attribute is nested or not.

This patch steals the highest bit of nfattr.nfa_type to indicate whether
the data payload contains a nested nfattr (1) or not (0).

This will break userspace compatibility, but luckily no kernel with
nfnetlink was released so far.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:52:19 -07:00