Iulia Manda
44dba3d5d6
sched: Refactor task_struct to use numa_faults instead of numa_* pointers
...
This patch simplifies task_struct by removing the four numa_* pointers, which
all pointed into the same array, and replacing them with a single array
pointer. By doing this, the size of task_struct is reduced by 3 unsigned long
pointers (24 bytes on x86_64).
A new parameter is added to the task_faults_idx function so that it can return
an index to the correct offset, corresponding to the old precalculated
pointers.
All of the code in sched/ that depended on task_faults_idx and numa_* was
changed to match the new logic.
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: dave@stgolabs.net
Cc: riel@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:57 +01:00
Wanpeng Li
cad3bb32e1
sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
...
There are both UP and SMP versions of pull_dl_task(), so there is no need
to check CONFIG_SMP in switched_from_dl().
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-6-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:56 +01:00
Wanpeng Li
cd66091162
sched/deadline: Reschedule from switched_from_dl() after a successful pull
...
In switched_from_dl() we have to issue a resched if we successfully
pulled some task from other CPUs. This also aligns the behavior with
the -rt class.
Suggested-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-5-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:55 +01:00
Wanpeng Li
6b0a563f3a
sched/deadline: Push task away if the deadline is equal to curr during wakeup
...
This patch pushes the task away if its deadline is equal to that of the
current task during wakeup; the same behavior as the rt class.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-4-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:55 +01:00
Wanpeng Li
acb32132ec
sched/deadline: Add deadline rq status print
...
This patch adds printing of the deadline rq status.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-3-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:54 +01:00
Wanpeng Li
804968809c
sched/deadline: Fix artificial overrun introduced by yield_task_dl()
...
The yield semantic of the deadline class is to reduce the remaining runtime to
zero, after which update_curr_dl() stops the task. However, the consumed
bandwidth is subtracted from the yielding task's budget again, even though it
has already been set to zero, which leads to an artificial overrun. This patch
fixes it by making sure update_curr_dl() does not steal any more time from a
task that has yielded.
Suggested-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-2-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:53 +01:00
Wanpeng Li
308a623a40
sched/rt: Clean up check_preempt_equal_prio()
...
This patch checks in advance whether current can be pushed/pulled somewhere
else, to make the logic clear; the same behavior as the dl class:
- If current can't be migrated, rescheduling is useless; let's hope the
  waking task can move out.
- If the waking task is migratable, let's not reschedule and instead see if
  it can be pushed or pulled somewhere else.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:52 +01:00
Juri Lelli
75e23e49db
sched/core: Use dl_bw_of() under rcu_read_lock_sched()
...
As per commit f10e00f4bf ("sched/dl: Use dl_bw_of() under
rcu_read_lock_sched()"), dl_bw_of() has to be protected by
rcu_read_lock_sched().
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414497286-28824-1-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:52 +01:00
Yao Dongdong
9f96742a13
sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
...
An idle CPU is idler than a non-idle CPU, so we need not search for the
least_loaded_cpu once we have found an idle CPU.
Signed-off-by: Yao Dongdong <yaodongdong@huawei.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414469286-6023-1-git-send-email-yaodongdong@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:51 +01:00
Kirill Tkhai
67dfa1b756
sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()
...
Currently used hrtimer_try_to_cancel() is racy:

raw_spin_lock(&rq->lock)
...                            dl_task_timer                 raw_spin_lock(&rq->lock)
...                               raw_spin_lock(&rq->lock)      ...
switched_from_dl()                ...                           ...
    hrtimer_try_to_cancel()       ...                           ...
switched_to_fair()                ...                           ...
...                               ...                           ...
...                               ...                           ...
raw_spin_unlock(&rq->lock)        ...                           (acquired)
...                               ...                           ...
...                               ...                           ...
do_exit()                         ...                           ...
    schedule()                    ...                           ...
    raw_spin_lock(&rq->lock)      ...                           raw_spin_unlock(&rq->lock)
    ...                           ...                           ...
    raw_spin_unlock(&rq->lock)    ...                           raw_spin_lock(&rq->lock)
    ...                           ...                           (acquired)
    put_task_struct()             ...                           ...
        free_task_struct()        ...                           ...
    ...                           ...                           raw_spin_unlock(&rq->lock)
    ...                           (acquired)                    ...
    ...                           ...                           ...
    ...                           (use after free)              ...
So, let's implement a 100% guaranteed way to cancel the timer and be
sure we are safe even in very unlikely situations.
rq unlocking does not limit the area of switched_from_dl() use, because
this has already been possible in pull_dl_task() below.
Let's consider the safety of this unlocking. The new code in the patch
runs when hrtimer_try_to_cancel() fails. This means the callback is
running. In this case hrtimer_cancel() just waits until the callback
has finished. Two cases are possible:
1) Since we are in switched_from_dl(), the new class is not dl_sched_class
and the new prio is not less than MAX_DL_PRIO. So, the callback returns
early; it's right after the !dl_task() check. After that hrtimer_cancel()
returns back too.
The above is:
raw_spin_lock(rq->lock);           ...
...                                dl_task_timer()
...                                   raw_spin_lock(rq->lock);
switched_from_dl()                    ...
    hrtimer_try_to_cancel()           ...
raw_spin_unlock(rq->lock);            ...
hrtimer_cancel()                      ...
...                                   raw_spin_unlock(rq->lock);
...                                   return HRTIMER_NORESTART;
...                                ...
raw_spin_lock(rq->lock);           ...
2) But the below is also possible:
                                   dl_task_timer()
                                      raw_spin_lock(rq->lock);
                                      ...
                                      raw_spin_unlock(rq->lock);
raw_spin_lock(rq->lock);           ...
switched_from_dl()                 ...
    hrtimer_try_to_cancel()        ...
    ...                               return HRTIMER_NORESTART;
raw_spin_unlock(rq->lock);         ...
hrtimer_cancel();                  ...
raw_spin_lock(rq->lock);           ...
In this case hrtimer_cancel() returns immediately. A very unlikely case,
mentioned just for completeness.
Nobody can manipulate the task, because check_class_changed() is
always called with pi_lock locked. Nobody can force the task to
participate in (concurrent) priority inheritance schemes (for the same
reason). All concurrent task operations require pi_lock, which is held
by us. No deadlocks with dl_task_timer() are possible, because it
returns right after the !dl_task() check (it does nothing).
If we receive a new dl_task while the rq is unlocked, we just don't
have to do pull_dl_task() in switched_from_dl() any further.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
[ Added comments ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:50 +01:00
Peter Zijlstra
e7097e8bd0
sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
...
In some cases this can trigger a true flood of output.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:49 +01:00
Peter Zijlstra
ff960a7317
netdev, sched/wait: Fix sleeping inside wait event
...
rtnl_lock_unregistering*() take rtnl_lock() -- a mutex -- inside a
wait loop. The wait loop relies on current->state to function, but so
does mutex_lock(); nesting them makes the inner one destroy the
outer's state.
Fix this using the new wait_woken() bits.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: sfeldma@cumulusnetworks.com
Cc: stephen hemminger <stephen@networkplumber.org>
Cc: Tom Gundersen <teg@jklm.no>
Cc: Tom Herbert <therbert@google.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20141029173110.GE15602@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:48 +01:00
Peter Zijlstra
eedf7e47da
rfcomm, sched/wait: Fix broken wait construct
...
rfcomm_run() is a tad broken in that it has a nested wait loop. One
cannot rely on p->state for the outer wait because the inner wait will
overwrite it.
Fix this using the new wait_woken() facility.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Alexander Holler <holler@ahsoftware.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Joe Perches <joe@perches.com>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vignesh Raman <Vignesh_Raman@mentor.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:47 +01:00
Peter Zijlstra
6b55fc63f4
audit, sched/wait: Fixup kauditd_thread() wait loop
...
The kauditd_thread wait loop is a bit iffy; it has a number of problems:
- calls try_to_freeze() before schedule(); you typically want the
thread to re-evaluate the sleep condition when unfreezing, also
freeze_task() issues a wakeup.
- it unconditionally does the {add,remove}_wait_queue(), even when the
sleep condition is false.
Use wait_event_freezable(), which does the right thing.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: oleg@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:47 +01:00
Peter Zijlstra (Intel)
5d4d565824
sched/wait: Remove wait_event_freezekillable()
...
There is no user; make it go away.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:46 +01:00
Peter Zijlstra
36df04bc52
sched/wait: Reimplement wait_event_freezable()
...
Provide better implementations of the wait_event_freezable() APIs.
The problem with freezer_do_not_count() is that it hides the thread
from the freezer, even though this thread might not actually
freeze/sleep at all.
Cc: oleg@redhat.com
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-pm@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-d86fz1jmso9wjxa8jfpinp8o@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:45 +01:00
Peter Zijlstra
cb6538e740
sched/wait: Fix a kthread race with wait_woken()
...
There is a race between kthread_stop() and the new wait_woken() that
can result in a lack of progress.
CPU 0 | CPU 1
|
rfcomm_run() | kthread_stop()
... |
if (!test_bit(KTHREAD_SHOULD_STOP)) |
| set_bit(KTHREAD_SHOULD_STOP)
| wake_up_process()
wait_woken() | wait_for_completion()
set_current_state(INTERRUPTIBLE) |
if (!WQ_FLAG_WOKEN) |
schedule_timeout() |
|
After which both tasks will wait.. forever.
Fix this by having wait_woken() check for kthread_should_stop() but
only for kthreads (obviously).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:44 +01:00
Peter Zijlstra
3427445afd
sched: Exclude cond_resched() from nested sleep test
...
cond_resched() is a preemption point, not strictly a blocking
primitive, so exclude it from the ->state test.
In particular, preemption preserves task_struct::state.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Alex Elder <alex.elder@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Lin <axel.lin@ingics.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140924082242.656559952@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:57 +01:00
Peter Zijlstra
8eb23b9f35
sched: Debug nested sleeps
...
Validate we call might_sleep() with TASK_RUNNING, which catches places
where we nest blocking primitives, eg. mutex usage in a wait loop.
Since all blocking is arranged through task_struct::state, nesting
this will cause the inner primitive to set TASK_RUNNING and the outer
will thus not block.
Another observed problem is calling a blocking function from
schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
then destroy the task state for the actual __schedule() call that
comes after it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140924082242.591637616@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:52 +01:00
Peter Zijlstra
26cabd3125
sched, net: Clean up sk_wait_event() vs. might_sleep()
...
WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50/0xa0
[<ffffffff8105bc38>] __might_sleep+0x58/0x90
[<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
[<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
Which is a false positive because sk_wait_event() will already have
TASK_RUNNING at that point if it would've gone through
schedule_timeout().
So annotate with sched_annotate_sleep(); which goes away on !DEBUG builds.
Reported-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140924082242.524407432@infradead.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: netdev@vger.kernel.org
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:37 +01:00
Peter Zijlstra
3c9b2c3d64
sched, modules: Fix nested sleep in add_unformed_module()
...
This is a genuine bug in add_unformed_module(), we cannot use blocking
primitives inside a wait loop.
So rewrite the wait_event_interruptible() usage to use the fresh
wait_woken() stuff.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: oleg@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20140924082242.458562904@infradead.org
[ So this is probably complex to backport and the race wasn't reported AFAIK,
  so not marked for -stable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:30 +01:00
Peter Zijlstra
7d4d26966e
sched, smp: Correctly deal with nested sleeps
...
smp_hotplug_thread::{setup,unpark} functions can sleep too, so be
consistent and do the same for all callbacks.
__might_sleep+0x74/0x80
kmem_cache_alloc_trace+0x4e/0x1c0
perf_event_alloc+0x55/0x450
perf_event_create_kernel_counter+0x2f/0x100
watchdog_nmi_enable+0x8d/0x160
watchdog_enable+0x45/0x90
smpboot_thread_fn+0xec/0x2b0
kthread+0xe4/0x100
ret_from_fork+0x7c/0xb0
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140924082242.392279328@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:24 +01:00
Peter Zijlstra
97d9e28d1a
sched, tty: Deal with nested sleeps
...
n_tty_{read,write} are wait loops with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state.
Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20140924082242.323011233@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:10 +01:00
Peter Zijlstra
e23738a730
sched, inotify: Deal with nested sleeps
...
inotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).
Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140924082242.254858080@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:55:37 +01:00
Peter Zijlstra
1029a2b52c
sched, exit: Deal with nested sleeps
...
do_wait() is a big wait loop, but we set TASK_RUNNING too late; we end
up calling potential sleeps before we reset it.
Not strictly a bug since we're guaranteed to exit the loop and not
call schedule(); put in annotations to quiet might_sleep().
WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:7123 __might_sleep+0x7e/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109a788>] do_wait+0x88/0x270
Call Trace:
[<ffffffff81694991>] dump_stack+0x4e/0x7a
[<ffffffff8109877c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8109886c>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff810bca6e>] __might_sleep+0x7e/0x90
[<ffffffff811a1c15>] might_fault+0x55/0xb0
[<ffffffff8109a3fb>] wait_consider_task+0x90b/0xc10
[<ffffffff8109a804>] do_wait+0x104/0x270
[<ffffffff8109b837>] SyS_wait4+0x77/0x100
[<ffffffff8169d692>] system_call_fastpath+0x16/0x1b
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: umgwanakikbuti@gmail.com
Cc: ilya.dryomov@inktank.com
Cc: Alex Elder <alex.elder@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Lin <axel.lin@ingics.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Guillaume Morin <guillaume@morinfr.org>
Cc: Ionut Alexa <ionut.m.alexa@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140924082242.186408915@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:55:30 +01:00