Commit Graph

67 Commits

Author SHA1 Message Date
Dmitry Adamushko
326587b840 sched: fix goto retry in pick_next_task_rt()
looking at it one more time:

(1) it looks to me that there is no need to call
sched_rt_ratio_exceeded() from pick_next_rt_entity()

- [ for CONFIG_FAIR_GROUP_SCHED ] queues with rt_rq->rt_throttled are
not within this 'tree-like hierarchy' (or whatever we should call it
:-)

- there is also no need to re-check 'rt_rq->rt_time > ratio' at this
point as 'rt_rq->rt_time' couldn't have been increased since the last
call to update_curr_rt() (which obviously calls
sched_rt_ratio_esceeded())
well, it might be that 'ratio' for this rt_rq has been re-configured
(and the period over which this rt_rq was active has not yet been
finished)... but I don't think we should really take this into
account.

(2) now pick_next_rt_entity() must never return NULL, so let's change
pick_next_task_rt() accordingly.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Peter Zijlstra
5a52dd5009 sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY
Remove the curious logic to set it_sched_expires in the future. It useless
because rt.timeout wouldn't be incremented anyway.

Explicity check for RLIM_INFINITY as a test programm that had a 1s soft limit
and a inf hard limit would SIGKILL at 1s. This is because RLIM_INFINITY+d-1
is d-2.

Signed-off-by: Peter Zijlsta <a.p.zijlstra@chello.nl>
CC: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:32 +01:00
Peter Zijlstra
1020387f5f sched: rt-group: reduce rescheduling
Only reschedule if the new group has a higher prio task.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:32 +01:00
Peter Zijlstra
48d5e25821 sched: rt throttling vs no_hz
We need to teach no_hz about the rt throttling because its tick driven.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:31 +01:00
Mike Galbraith
614ee1f61f sched: pull_rt_task() cleanup
"goto out" is an odd way to spell "skip".

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:30 +01:00
Peter Zijlstra
6f505b1642 sched: rt group scheduling
Extend group scheduling to also cover the realtime classes. It uses the time
limiting introduced by the previous patch to allow multiple realtime groups.

The hard time limit is required to keep behaviour deterministic.

The algorithms used make the realtime scheduler O(tg), linear scaling wrt the
number of task groups. This is the worst case behaviour I can't seem to get out
of, the avg. case of the algorithms can be improved, I focused on correctness
and worst case.

[ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:30 +01:00
Peter Zijlstra
fa85ae2418 sched: rt time limit
Very simple time limit on the realtime scheduling classes.
Allow the rq's realtime class to consume sched_rt_ratio of every
sched_rt_period slice. If the class exceeds this quota the fair class
will preempt the realtime class.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:29 +01:00
Peter Zijlstra
8f4d37ec07 sched: high-res preemption tick
Use HR-timers (when available) to deliver an accurate preemption tick.

The regular scheduler tick that runs at 1/HZ can be too coarse when nice
level are used. The fairness system will still keep the cpu utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time but try to
minimize this by delivering preemption points spot-on.

The average frequency of this extra interrupt is sched_latency / nr_latency.
Which need not be higher than 1/HZ, its just that the distribution within the
sched_latency period is important.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:29 +01:00
Peter Zijlstra
78f2c7db60 sched: SCHED_FIFO/SCHED_RR watchdog timer
Introduce a new rlimit that allows the user to set a runtime timeout on
real-time tasks their slice. Once this limit is exceeded the task will receive
SIGXCPU.

So it measures runtime since the last sleep.

Input and ideas by Thomas Gleixner and Lennart Poettering.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Lennart Poettering <mzxreary@0pointer.de>
CC: Michael Kerrisk <mtk.manpages@googlemail.com>
CC: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:27 +01:00
Peter Zijlstra
fa717060f1 sched: sched_rt_entity
Move the task_struct members specific to rt scheduling together.
A future optimization could be to put sched_entity and sched_rt_entity
into a union.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:27 +01:00
Gregory Haskins
c49443c538 sched: remove some old cpuset logic
We had support for overlapping cpuset based rto logic in early
prototypes that is no longer used, so remove it.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:23 +01:00
Gregory Haskins
cdc8eb984c sched: RT-balance, only adjust overload state when changing
The overload set/clears were originally idempotent when this logic was first
implemented.  But that is no longer true due to the addition of the atomic
counter and this logic was never updated to work properly with that change.
So only adjust the overload state if it is actually changing to avoid
getting out of sync.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:23 +01:00
Steven Rostedt
cb46984504 sched: RT-balance, add new methods to sched_class
Dmitry Adamushko found that the current implementation of the RT
balancing code left out changes to the sched_setscheduler and
rt_mutex_setprio.

This patch addresses this issue by adding methods to the schedule classes
to handle being switched out of (switched_from) and being switched into
(switched_to) a sched_class. Also a method for changing of priorities
is also added (prio_changed).

This patch also removes some duplicate logic between rt_mutex_setprio and
sched_setscheduler.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:22 +01:00
Steven Rostedt
9a897c5a67 sched: RT-balance, replace hooks with pre/post schedule and wakeup methods
To make the main sched.c code more agnostic to the schedule classes.
Instead of having specific hooks in the schedule code for the RT class
balancing. They are replaced with a pre_schedule, post_schedule
and task_wake_up methods. These methods may be used by any of the classes
but currently, only the sched_rt class implements them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:22 +01:00
Ingo Molnar
bdd7c81b49 sched: fix sched_rt.c:join/leave_domain
fix build bug in sched_rt.c:join/leave_domain and make them only
be included on SMP builds.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:18 +01:00
Gregory Haskins
637f50851b sched: only balance our RT tasks within our domain
We move the rt-overload data as the first global to per-domain
reclassification.  This limits the scope of overload related cache-line
bouncing to stay with a specified partition instead of affecting all
cpus in the system.

Finally, we limit the scope of find_lowest_cpu searches to the domain
instead of the entire system.  Note that we would always respect domain
boundaries even without this patch, but we first would scan potentially
all cpus before whittling the list down.  Now we can avoid looking at
RQs that are out of scope, again reducing cache-line hits.

Note: In some cases, task->cpus_allowed will effectively reduce our search
to within our domain.  However, I believe there are cases where the
cpus_allowed mask may be all ones and therefore we err on the side of
caution.  If it can be optimized later, so be it.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:18 +01:00
Ingo Molnar
7f51f29820 sched: clean up schedule_balance_rt()
clean up schedule_balance_rt().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:17 +01:00
Ingo Molnar
80bf3171dc sched: clean up pull_rt_task()
clean up pull_rt_task().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:17 +01:00
Ingo Molnar
00597c3ed7 sched: remove leftover debugging
remove leftover debugging.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:16 +01:00
Ingo Molnar
6e1938d3ad sched: remove rt_overload()
remove rt_overload() - it's an unnecessary indirection.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:16 +01:00
Ingo Molnar
84de427489 sched: clean up kernel/sched_rt.c
clean up whitespace damage and missing comments in kernel/sched_rt.c.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:15 +01:00
Ingo Molnar
deeeccd41b sched: clean up overlong line in kernel/sched_debug.c
clean up overlong line in kernel/sched_debug.c.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:15 +01:00
Ingo Molnar
4df64c0bfb sched: clean up find_lock_lowest_rq()
clean up find_lock_lowest_rq().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:15 +01:00
Ingo Molnar
79064fbf75 sched: clean up pick_next_highest_task_rt()
clean up pick_next_highest_task_rt().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:14 +01:00
Steven Rostedt
610bf05645 sched: RT-balance, optimize cpu search
This patch removes several cpumask operations by keeping track
of the first of the CPUS that is of the lowest priority. When
the search for the lowest priority runqueue is completed, all
the bits up to the first CPU with the lowest priority runqueue
is cleared.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:13 +01:00