Commit Graph

98283 Commits

Author SHA1 Message Date
Bharath Ravi d4abc238c9 sched, delay accounting: fix incorrect delay time when constantly waiting on runqueue
This patch corrects the incorrect value of per process run-queue wait
time reported by delay statistics. The anomaly was due to the following
reason. When a process leaves the CPU and immediately starts waiting for
CPU on the runqueue (which means it remains in the TASK_RUNNABLE state),
the time of re-entry into the run-queue is never recorded. Due to this,
the waiting time on the runqueue from this point of re-entry upto the
next time it hits the CPU is not accounted for. This is solved by
recording the time of re-entry of a process leaving the CPU in the
sched_info_depart() function IF the process will go back to waiting on
the run-queue. This IF condition is verified by checking whether the
process is still in the TASK_RUNNABLE state.

The patch was tested on 2.6.26-rc6 using two simple CPU hog programs.
The values noted prior to the fix did not account for the time spent on
the runqueue waiting. After the fix, the correct values were reported
back to user space.

Signed-off-by: Bharath Ravi <bharathravi1@gmail.com>
Signed-off-by: Madhava K R  <madhavakr@gmail.com>
Cc: dhaval@linux.vnet.ibm.com
Cc: vatsa@in.ibm.com
Cc: balbir@in.ibm.com
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 14:15:28 +02:00
Ingo Molnar d819c49da6 Merge branch 'linus' into sched/urgent 2008-06-19 09:37:31 +02:00
Max Krasnyansky f18f982abf sched: CPU hotplug events must not destroy scheduler domains created by the cpusets
First issue is not related to the cpusets. We're simply leaking doms_cur.
It's allocated in arch_init_sched_domains() which is called for every
hotplug event. So we just keep reallocation doms_cur without freeing it.
I introduced free_sched_domains() function that cleans things up.

Second issue is that sched domains created by the cpusets are
completely destroyed by the CPU hotplug events. For all CPU hotplug
events scheduler attaches all CPUs to the NULL domain and then puts
them all into the single domain thereby destroying domains created
by the cpusets (partition_sched_domains).
The solution is simple, when cpusets are enabled scheduler should not
create default domain and instead let cpusets do that. Which is
exactly what the patch does.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: menage@google.com
Cc: rostedt@goodmis.org
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-19 09:14:51 +02:00
Peter Zijlstra 15a8641ead sched: rt-group: fix RR buglet
In tick_task_rt() we first call update_curr_rt() which can dequeue a runqueue
due to it running out of runtime, and then we try to requeue it, of it also
having exhausted its RR quota. Obviously requeueing something that is no longer
on the runqueue will not have the expected result.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Daniel K. <dk@uw.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 09:06:59 +02:00
Peter Zijlstra ad2a3f13b7 sched: rt-group: heirarchy aware throttle
The bandwidth throttle code dequeues a group when it runs out of quota, and
re-queues it once the period rolls over and the quota gets refreshed.

Sadly it failed to take the hierarchy into consideration. Share more of the
enqueue/dequeue code with regular task opterations.

Also, some operations like sched_setscheduler() can dequeue/enqueue tasks that
are in throttled runqueues, we should not inadvertly re-enqueue empty runqueues
so check for that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Daniel K. <dk@uw.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 09:06:57 +02:00
Peter Zijlstra 7ea56616ba sched: rt-group: fix hierarchy
Don't re-set the entity's runqueue to the wrong rq after we've set it
to the right one.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Daniel K. <dk@uw.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 09:06:56 +02:00
Dario Faggioli 49307fd6f7 sched: NULL pointer dereference while setting sched_rt_period_us
When CONFIG_RT_GROUP_SCHED and CONFIG_CGROUP_SCHED are enabled, with:

 echo 10000 > /proc/sys/kernel/sched_rt_period_us

We get this:

 BUG: unable to handle kernel NULL pointer dereference at 0000008c
 [  947.682233] IP: [<c0216b72>] __rt_schedulable+0x12/0x160
 [  947.683123] *pde = 00000000=20
 [  947.683782] Oops: 0000 [#1]
 [  947.684307] Modules linked in:
 [  947.684308]
 [  947.684308] Pid: 2359, comm: bash Not tainted (2.6.26-rc6 #8)
 [  947.684308] EIP: 0060:[<c0216b72>] EFLAGS: 00000246 CPU: 0
 [  947.684308] EIP is at __rt_schedulable+0x12/0x160
 [  947.684308] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000001
 [  947.684308] ESI: c0521db4 EDI: 00000001 EBP: c6cc9f00 ESP: c6cc9ed0
 [  947.684308]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
 [  947.684308] Process bash (pid: 2359, tiÆcc8000 taskÇa54f00=20 task.tiÆcc8000)
 [  947.684308] Stack: c0222790 00000000 080f8c08 c0521db4 c6cc9f00 00000001 00000000 00000000
 [  947.684308]        c6cc9f9c 00000000 c0521db4 00000001 c6cc9f28 c0216d40 00000000 00000000
 [  947.684308]        c6cc9f9c 000f4240 000e7ef0 ffffffff c0521db4 c79dfb60 c6cc9f58 c02af2cc
 [  947.684308] Call Trace:
 [  947.684308]  [<c0222790>] ? do_proc_dointvec_conv+0x0/0x50
 [  947.684308]  [<c0216d40>] ? sched_rt_handler+0x80/0x110
 [  947.684308]  [<c02af2cc>] ? proc_sys_call_handler+0x9c/0xb0
 [  947.684308]  [<c02af2fa>] ? proc_sys_write+0x1a/0x20
 [  947.684308]  [<c0273c36>] ? vfs_write+0x96/0x160
 [  947.684308]  [<c02af2e0>] ? proc_sys_write+0x0/0x20
 [  947.684308]  [<c027423d>] ? sys_write+0x3d/0x70
 [  947.684308]  [<c0202ef5>] ? sysenter_past_esp+0x6a/0x91
 [  947.684308]  =======================
 [  947.684308] Code: 24 04 e8 62 b1 0e 00 89 c7 89 f8 8b 5d f4 8b 75
 f8 8b 7d fc 89 ec 5d c3 90 55 89 e5 57 56 53 83 ec 24 89 45 ec 89 55 e4
 89 4d e8 <8b> b8 8c 00 00 00 85 ff 0f 84 c9 00 00 00 8b 57 24 39 55 e8
 8b
 [  947.684308] EIP: [<c0216b72>] __rt_schedulable+0x12/0x160 SS:ESP  0068:c6cc9ed0

We think the following patch solves the issue.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Michael Trimarchi <trimarchimichael@yahoo.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 09:06:54 +02:00
Dave Airlie 9bedbcb207 agp: brown paper bag patch - put back two lines that got lost
Commit 62c96b9d09 ("agp/intel: cleanup
some serious whitespace badness") didn't just fix whitespace.  It also
lost two lines.

Noticed by Linus. No more whitespace diffs for me.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-18 22:12:50 -07:00
Linus Torvalds 3506ba7b08 Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
  agp/intel: cleanup some serious whitespace badness
  [AGP] intel_agp: Add support for Intel 4 series chipsets
  [AGP] intel_agp: extra stolen mem size available for IGD_GM chipset
  agp: more boolean conversions.
  drivers/char/agp - use bool
  agp: two-stage page destruction issue
  agp/via: fixup pci ids
2008-06-18 21:52:35 -07:00
Dave Airlie 62c96b9d09 agp/intel: cleanup some serious whitespace badness
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19 14:27:53 +10:00
Zhenyu Wang 25ce77abf8 [AGP] intel_agp: Add support for Intel 4 series chipsets
Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19 14:17:58 +10:00
Zhenyu Wang 598d144823 [AGP] intel_agp: extra stolen mem size available for IGD_GM chipset
This adds missing stolen memory size detect for IGD_GM, be sure to
detect right size as current X intel driver (2.3.2) which has already
worked out.

Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19 14:00:37 +10:00
Dave Airlie 9516b030b4 agp: more boolean conversions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19 10:42:17 +10:00
Joe Perches c725801292 drivers/char/agp - use bool
Use boolean in AGP instead of having own TRUE/FALSE

--
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19 10:04:20 +10:00
Jan Beulich da503fa60b agp: two-stage page destruction issue
besides it apparently being useful only in 2.6.24 (the changes in 2.6.25
really mean that it could be converted back to a single-stage mechanism),
I'm seeing an issue in Xen Dom0 kernels, which is caused by the calling
of gart_to_virt() in the second stage invocations of the destroy function.
I think that besides this being a real issue with Xen (where
unmap_page_from_agp() is not just a page table attribute change), this
also is invalid from a theoretical perspective: One should not assume that
gart_to_virt() is still valid after unmapping a page. So minimally (keeping
the 2-stage mechanism) a patch like the one below would be needed.

Jan

Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19 09:56:16 +10:00
Greg KH dcd981a77b agp/via: fixup pci ids
add a new PCI ID and remove an old dodgy one, include the explaination
in the commented code so nobody readds later.

(davej also sent the pci id addition).

Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-06-19 09:52:26 +10:00
Linus Torvalds f9d1c6ca2b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
  RDMA/nes: Fix off-by-one in nes_reg_user_mr() error path
2008-06-18 16:08:59 -07:00
Jack Morgenstein fb77bcef9f IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code.  However, after the conversion, is_closed flag is checked
incorrectly in ib_uverbs_async_handler().  As a result, no async
events are ever passed to applications.

Found by: Ronni Zimmerman <ronniz@mellanox.co.il>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-18 15:36:38 -07:00
Linus Torvalds a8051fde6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  Revert "[WATCHDOG] hpwdt: Fix NMI handling."
  [WATCHDOG] hpwdt: Add CFLAGS to get driver working
  Revert "[WATCHDOG] make watchdog/hpwdt.c:asminline_call() static"
2008-06-18 12:55:00 -07:00
Linus Torvalds 5dfd06215b Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] dpt_i2o: Add PROC_IA64 define
  [SCSI] scsi_host regression: fix scsi host leak
  [SCSI] sr: fix corrupt CD data after media change and delay
2008-06-18 11:55:43 -07:00
Linus Torvalds f32c23f59a Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Clear sub-page HPTE present bits when demoting page size
  [POWERPC] 4xx: Clear new TLB cache attribute bits in Data Storage vector
2008-06-18 11:55:19 -07:00
Linus Torvalds e899536470 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: restore UDFFS_DEBUG to being undefined by default
2008-06-18 11:55:03 -07:00
Linus Torvalds d83b14c0db Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits)
  netlink: genl: fix circular locking
  Revert "mac80211: Use skb_header_cloned() on TX path."
  af_unix: fix 'poll for write'/ connected DGRAM sockets
  tun: Proper handling of IPv6 header in tun driver when TUN_NO_PI is set
  atl1: relax eeprom mac address error check
  net/enc28j60: low power mode
  net/enc28j60: section fix
  sky2: 88E8040T pci device id
  netxen: download firmware in pci probe
  netxen: cleanup debug messages
  netxen: remove global physical_port array
  netxen: fix portnum for hp mezz cards
  ibm_newemac: select CRC32 in Kconfig
  xfrm: fix fragmentation for ipv4 xfrm tunnel
  netfilter: nf_conntrack_h323: fix module unload crash
  netfilter: nf_conntrack_h323: fix memory leak in module initialization error path
  netfilter: nf_nat: fix RCU races
  atm: [he] send idle cells instead of unassigned when in SDH mode
  atm: [he] limit queries to the device's register space
  atm: [br2864] fix routed vcmux support
  ...
2008-06-18 11:48:40 -07:00
Wim Van Sebroeck fdf7be6f13 Revert "[WATCHDOG] hpwdt: Fix NMI handling."
The old setup works better.

Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2008-06-18 16:22:48 +00:00
Paul Mackerras 65ba6cdc83 [POWERPC] Clear sub-page HPTE present bits when demoting page size
When we demote a slice from 64k to 4k, and we are about to insert an
HPTE for a 4k subpage and we notice that there is an existing 64k
HPTE, we first invalidate that HPTE before inserting the new 4k
subpage HPTE.  Since the bits that encode which hash bucket the old
HPTE was in overlap with the bits that encode which of the 16 subpages
have HPTEs, we need to clear out the subpage HPTE-present bits before
starting to insert HPTEs for the 4k subpages.  If we don't do that, we
can erroneously think that a subpage already has an HPTE when it
doesn't.

That in itself wouldn't be such a problem except that when we go to
update the HPTE that we think is present on machines with a
hypervisor, the hypervisor can tell us that the HPTE we think is there
is actually there even though it isn't, which can lead to a process
getting stuck in a loop, continually faulting.  The reason for the
confusion is that the AVPN (abbreviated virtual page number) we are
looking for in the HPTE for a 4k subpage can actually match the AVPN
in a stale HPTE for another 64k page.  For example, the HPTE for
the 4k subpage at 0x84000f000 will be in the same hash bucket and have
the same AVPN as the HPTE for the 64k page at 0x8400f0000.

This fixes the code to clear out the subpage HPTE-present bits.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-18 21:40:43 +10:00