syslets (or other threads/processes that want io context sharing) can
set this to enforce sharing of io context.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
revert 19ef930927.
Kevin Winchester reported a lockup during X startup an bisected
it to this commit.
Reported-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
[SCSI] usbstorage: use last_sector_bug flag universally
[SCSI] libsas: abstract STP task status into a function
[SCSI] ultrastor: clean up inline asm warnings
[SCSI] aic7xxx: fix firmware build
[SCSI] aacraid: fib context lock for management ioctls
[SCSI] ch: remove forward declarations
[SCSI] ch: fix device minor number management bug
[SCSI] ch: handle class_device_create failure properly
[SCSI] NCR5380: fix section mismatch
[SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
[SCSI] IB/iSER: add logical unit reset support
[SCSI] don't use __GFP_DMA for sense buffers if not required
[SCSI] use dynamically allocated sense buffer
[SCSI] scsi.h: add macro for enclosure bit of inquiry data
[SCSI] sd: add fix for devices with last sector access problems
[SCSI] fix pcmcia compile problem
[SCSI] aacraid: add Voodoo Lite class of cards.
[SCSI] aacraid: add new driver features flags
[SCSI] qla2xxx: Update version number to 8.02.00-k7.
[SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
...
Right now, the linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process is waiting to be scheduled. While the maximum
is a very useful metric, tracking average and total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
print_cfs_stats is callable from interrupt context (sysrq), hence it should
not take mutexes. Change it to use RCU since the task group data is RCU
freed anyway.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The attached patch is something really simple that can sometimes help
in getting more info out of a hung system.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
printk timestamps: use ktime_get().
Some platforms have a functioning clocksource function only after
they are done with early bootup, so delay this until out of
SYSTEM_BOOTING state.
it's also inherently safe now, as any bugs in this area will be
caught by the printk recursion checks.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LatencyTOP kernel infrastructure; it measures latencies in the
scheduler and tracks it system wide and per process.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
looking at it one more time:
(1) it looks to me that there is no need to call
sched_rt_ratio_exceeded() from pick_next_rt_entity()
- [ for CONFIG_FAIR_GROUP_SCHED ] queues with rt_rq->rt_throttled are
not within this 'tree-like hierarchy' (or whatever we should call it
:-)
- there is also no need to re-check 'rt_rq->rt_time > ratio' at this
point as 'rt_rq->rt_time' couldn't have been increased since the last
call to update_curr_rt() (which obviously calls
sched_rt_ratio_esceeded())
well, it might be that 'ratio' for this rt_rq has been re-configured
(and the period over which this rt_rq was active has not yet been
finished)... but I don't think we should really take this into
account.
(2) now pick_next_rt_entity() must never return NULL, so let's change
pick_next_task_rt() accordingly.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched: fix rq->clock warps on frequency changes
Fix 2bacec8c31
(sched: touch softlockup watchdog after idling) that reintroduced warps
on frequency changes. touch_softlockup_watchdog() calls __update_rq_clock
that checks rq->clock for warps, so call it after adjusting rq->clock.
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ensure that the kernel threads are created with the usual nice level
and affinity even if kthreadd's properties were changed from the
default by root.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Based on a suggestion from Andi:
In various cases, the unload of a module may leave some bad state around
that causes a kernel crash AFTER a module is unloaded; and it's then hard
to find which module caused that.
This patch tracks the last unloaded module, and prints this as part of the
module list in the oops trace.
Right now, only the last 1 module is tracked; I expect that this is enough
for the vast majority of cases where this information matters; if it turns
out that tracking more is important, we can always extend it to that.
[ mingo@elte.hu: build fix ]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's rather common that an oops/WARN_ON/BUG happens during the load or
unload of a module. Unfortunatly, it's not always easy to see directly
which module is being loaded/unloaded from the oops itself. Worse,
it's not even always possible to ask the bug reporter, since there
are so many components (udev etc) that auto-load modules that there's
a good chance that even the reporter doesn't know which module this is.
This patch extends the existing "show if it's tainting" print code,
which is used as part of printing the modules in the oops/BUG/WARN_ON
to include a "+" for "being loaded" and a "-" for "being unloaded".
As a result this extension, the "taint_flags()" function gets renamed to
"module_flags()" (and takes a module struct as argument, not a taint
flags int).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the curious logic to set it_sched_expires in the future. It useless
because rt.timeout wouldn't be incremented anyway.
Explicity check for RLIM_INFINITY as a test programm that had a 1s soft limit
and a inf hard limit would SIGKILL at 1s. This is because RLIM_INFINITY+d-1
is d-2.
Signed-off-by: Peter Zijlsta <a.p.zijlstra@chello.nl>
CC: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only reschedule if the new group has a higher prio task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
hrtimer_wakeup creates a
base->lock
rq->lock
lock dependancy. Avoid this by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ
which doesn't hold base->lock.
This fully untangles hrtimer locks from the scheduler locks, and allows
hrtimer usage in the scheduler proper.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently all highres=off timers are run from softirq context, but
HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context.
Fix this up by splitting it similar to the highres=on case.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>