Freezing tasks will wake up almost every userspace task from
where it is blocking and force it to run until it hits a
call to try_to_sleep(), generally on the exit path from the syscall
it is blocking in. On resume each task will run again, usually
restarting the syscall and running until it hits the same
blocking call as it was originally blocked in.
To allow tasks to avoid running on every suspend/resume cycle,
this patch adds additional freezable wrappers around blocking calls
that call freezer_do_not_count(). Combined with the previous patch,
these tasks will not run during suspend or resume unless they wake
up for another reason, in which case they will run until they hit
the try_to_freeze() in freezer_count(), and then continue processing
the wakeup after tasks are thawed.
Additional patches will convert the most common locations that
userspace blocks in to use freezable helpers.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some of the freezable helpers have to be macros because their
condition argument needs to get evaluated every time through
the wait loop. Convert the others to static inline to make
future changes easier.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Freezing tasks will wake up almost every userspace task from
where it is blocking and force it to run until it hits a
call to try_to_sleep(), generally on the exit path from the syscall
it is blocking in. On resume each task will run again, usually
restarting the syscall and running until it hits the same
blocking call as it was originally blocked in.
Convert the existing wait_event_freezable* wrappers to use
freezer_do_not_count(). Combined with a previous patch,
these tasks will not run during suspend or resume unless they wake
up for another reason, in which case they will run until they hit
the try_to_freeze() in freezer_count(), and then continue processing
the wakeup after tasks are thawed.
This results in a small change in behavior, previously a race
between freezing and a normal wakeup would be won by the wakeup,
now the task will freeze and then handle the wakeup after thawing.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.
History:
This patch was originally applied as 6aa9707099 and reverted in
dbf520a9d7 because NFS was freezing with locks held. It was
deemed better to keep the bad freeze point in NFS to allow laptops
to suspend consistently. The previous patch in this series converts
NFS to call _unsafe versions of the freezable helpers so that
lockdep doesn't complain about them until a more correct fix
can be applied.
[akpm@linux-foundation.org: export debug_check_no_locks_held]
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CIFS calls wait_event_freezekillable_unsafe with a VFS lock held,
which is unsafe and will cause lockdep warnings when 6aa9707
"lockdep: check that no locks held at freeze time" is reapplied
(it was reverted in dbf520a). CIFS shouldn't be doing this, but
it has long-running syscalls that must hold a lock but also
shouldn't block suspend. Until CIFS freeze handling is rewritten
to use a signal to exit out of the critical section, add a new
wait_event_freezekillable_unsafe helper that will not run the
lockdep test when 6aa9707 is reapplied, and call it from CIFS.
In practice the likley result of holding the lock while freezing
is that a second task blocked on the lock will never freeze,
aborting suspend, but it is possible to manufacture a case using
the cgroup freezer, the lock, and the suspend freezer to create
a deadlock. Silencing the lockdep warning here will allow
problems to be found in other drivers that may have a more
serious deadlock risk, and prevent new problems from being added.
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
NFS calls the freezable helpers with locks held, which is unsafe
and will cause lockdep warnings when 6aa9707 "lockdep: check
that no locks held at freeze time" is reapplied (it was reverted
in dbf520a). NFS shouldn't be doing this, but it has
long-running syscalls that must hold a lock but also shouldn't
block suspend. Until NFS freeze handling is rewritten to use a
signal to exit out of the critical section, add new *_unsafe
versions of the helpers that will not run the lockdep test when
6aa9707 is reapplied, and call them from NFS.
In practice the likley result of holding the lock while freezing
is that a second task blocked on the lock will never freeze,
aborting suspend, but it is possible to manufacture a case using
the cgroup freezer, the lock, and the suspend freezer to create
a deadlock. Silencing the lockdep warning here will allow
problems to be found in other drivers that may have a more
serious deadlock risk, and prevent new problems from being added.
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.
[akpm@linux-foundation.org: export debug_check_no_locks_held]
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ben Chan <benchan@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At present, the value of timeout for freezing is 20s, which is
meaningless in case that one thread is frozen with mutex locked
and another thread is trying to lock the mutex, as this time of
freezing will fail unavoidably.
And if there is no new wakeup event registered, the system will
waste at most 20s for such meaningless trying of freezing.
With this patch, the value of timeout can be configured to smaller
value, so such meaningless trying of freezing will be aborted in
earlier time, and later freezing can be also triggered in earlier
time. And more power will be saved.
In normal case on mobile phone, it costs real little time to freeze
processes. On some platform, it only costs about 20ms to freeze
user space processes and 10ms to freeze kernel freezable threads.
Signed-off-by: Liu Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Li Fei <fei.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull cgroup changes from Tejun Heo:
"A lot of activities on cgroup side. The big changes are focused on
making cgroup hierarchy handling saner.
- cgroup_rmdir() had peculiar semantics - it allowed cgroup
destruction to be vetoed by individual controllers and tried to
drain refcnt synchronously. The vetoing never worked properly and
caused good deal of contortions in cgroup. memcg was the last
reamining user. Michal Hocko removed the usage and cgroup_rmdir()
path has been simplified significantly. This was done in a
separate branch so that the memcg people can base further memcg
changes on top.
- The above allowed cleaning up cgroup lifecycle management and
implementation of generic cgroup iterators which are used to
improve hierarchy support.
- cgroup_freezer updated to allow migration in and out of a frozen
cgroup and handle hierarchy. If a cgroup is frozen, all descendant
cgroups are frozen.
- netcls_cgroup and netprio_cgroup updated to handle hierarchy
properly.
- Various fixes and cleanups.
- Two merge commits. One to pull in memcg and rmdir cleanups (needed
to build iterators). The other pulled in cgroup/for-3.7-fixes for
device_cgroup fixes so that further device_cgroup patches can be
stacked on top."
Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)
* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
cgroup: update Documentation/cgroups/00-INDEX
cgroup_rm_file: don't delete the uncreated files
cgroup: remove subsystem files when remounting cgroup
cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
cgroup: warn about broken hierarchies only after css_online
cgroup: list_del_init() on removed events
cgroup: fix lockdep warning for event_control
cgroup: move list add after list head initilization
netprio_cgroup: allow nesting and inherit config on cgroup creation
netprio_cgroup: implement netprio[_set]_prio() helpers
netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
netprio_cgroup: reimplement priomap expansion
netprio_cgroup: shorten variable names in extend_netdev_table()
netprio_cgroup: simplify write_priomap()
netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
cgroup: remove obsolete guarantee from cgroup_task_migrate.
cgroup: add cgroup->id
cgroup, cpuset: remove cgroup_subsys->post_clone()
cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
...
If FREEZER is not defined, the error as following will be throw
when compiled.
arch/arm/kernel/signal.c:645: error: implicit declaration of function
'try_to_freeze_nowarn'
Signed-off-by: Haifeng Li <omycle@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
to ensure that a task doing STOPPED/TRACED -> RUNNING transition
can't escape freezing. This mostly works, but ptrace_stop() does
not necessarily call schedule(), it can change task->state back to
RUNNING and check freezing() without any lock/barrier in between.
We could add the necessary barrier, but this patch changes
ptrace_stop() and do_signal_stop() to use freezable_schedule().
This fixes the race, freezer_count() and freezer_should_skip()
carefully avoid the race.
And this simplifies the code, try_to_freeze_tasks/update_if_frozen
no longer need to use task_is_stopped_or_traced() checks with the
non trivial assumptions. We can rely on the mechanism which was
specially designed to mark the sleeping task as "frozen enough".
v2: As Tejun pointed out, we can also change get_signal_to_deliver()
and move try_to_freeze() up before 'relock' label.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
A task is considered frozen enough between freezer_do_not_count() and
freezer_count() and freezers use freezer_should_skip() to test this
condition. This supposedly works because freezer_count() always calls
try_to_freezer() after clearing %PF_FREEZER_SKIP.
However, there currently is nothing which guarantees that
freezer_count() sees %true freezing() after clearing %PF_FREEZER_SKIP
when freezing is in progress, and vice-versa. A task can escape the
freezing condition in effect by freezer_count() seeing !freezing() and
freezer_should_skip() seeing %PF_FREEZER_SKIP.
This patch adds smp_mb()'s to freezer_count() and
freezer_should_skip() such that either %true freezing() is visible to
freezer_count() or !PF_FREEZER_SKIP is visible to
freezer_should_skip().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
Commit 2aede851dd
PM / Hibernate: Freeze kernel threads after preallocating memory
introduced a mechanism by which kernel threads were frozen after
the preallocation of hibernate image memory to avoid problems with
frozen kernel threads not responding to memory freeing requests.
However, it overlooked the s2disk code path in which the
SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
which caused freeze_workqueues_begin() to BUG(), because it saw
that worqueues had been already frozen.
Although in principle this issue might be addressed by removing
the relevant BUG_ON() from freeze_workqueues_begin(), that would
reintroduce the very problem that commit 2aede851dd
attempted to avoid into that particular code path. For this reason,
to fix the issue at hand, introduce thaw_kernel_threads() and make
the SNAPSHOT_FREE ioctl execute it.
Special thanks to Srivatsa S. Bhat for detailed analysis of the
problem.
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: stable@kernel.org
...it should return the return code from schedule_timeout_killable(),
not the one from freezer_count().
All of the current callers ignore the return code so the bug is
harmless but it's worth fixing.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
At present, the functions freezer_count() and freezer_do_not_count()
impose the restriction that they are effective only for userspace processes.
However, now, these functions have found more utility than originally
intended by the commit which introduced it: ba96a0c8 (freezer:
fix vfork problem). And moreover, even the vfork issue actually does not
need the above restriction in these functions.
So, modify these functions to make them work even for kernel threads, so
that they can be used at other places in the kernel, where the userspace
restriction doesn't apply.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc
layer. This should allow suspend and hibernate events to proceed, even
when there are RPC's pending on the wire.
Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count
and freezer_count calls. This allows the freezer to skip tasks that are
sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Commit 944e192db53c "freezer: rename thaw_process() to __thaw_task()
and simplify the implementation" did not create a !CONFIG_FREEZER version
of __thaw_task().
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
wait_event_freezable() and friends stop the waiting if try_to_freeze()
fails. This is not right, we can race with __thaw_task() and in this
case
- wait_event_freezable() returns the wrong ERESTARTSYS
- wait_event_freezable_timeout() can return the positive
value while condition == F
Change the code to always check __retval/condition before return.
Note: with or without this patch the timeout logic looks strange,
probably we should recalc timeout if try_to_freeze() returns T.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
There's no in-kernel user of set_freezable_with_signal() left. Mixing
TIF_SIGPENDING with kernel threads can lead to nasty corner cases as
kernel threads never travel signal delivery path on their own.
e.g. the current implementation is buggy in the cancelation path of
__thaw_task(). It calls recalc_sigpending_and_wake() in an attempt to
clear TIF_SIGPENDING but the function never clears it regardless of
sigpending state. This means that signallable freezable kthreads may
continue executing with !freezing() && stuck TIF_SIGPENDING, which can
be troublesome.
This patch removes set_freezable_with_signal() along with
PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer. User
tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious
sigpending is dealt with in the usual signal delivery path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
After "freezer: make freezing() test freeze conditions in effect
instead of TIF_FREEZE", freezing() returns authoritative answer on
whether the current task should freeze or not and freeze_task()
doesn't need or use @sig_only. Remove it.
While at it, rewrite function comment for freeze_task() and rename
@sig_only to @user_only in try_to_freeze_tasks().
This patch doesn't cause any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
A kthread doing set_freezable*() may race with on-going PM freeze and
the freezer might think all tasks are frozen while the new freezable
kthread is merrily proceeding to execute code paths which aren't
supposed to be executing during PM freeze.
Reimplement set_freezable[_with_signal]() using __set_freezable() such
that freezable PF flags are modified under freezer_lock and
try_to_freeze() is called afterwards. This eliminates race condition
against freezing.
Note: Separated out from larger patch to resolve fix order dependency
Oleg pointed out.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
should_send_signal() is only used in freezer.c. Exporting them only
increases chance of abuse. Open code the two users and remove it.
Update frozen() to return bool.
Signed-off-by: Tejun Heo <tj@kernel.org>
Using TIF_FREEZE for freezing worked when there was only single
freezing condition (the PM one); however, now there is also the
cgroup_freezer and single bit flag is getting clumsy.
thaw_processes() is already testing whether cgroup freezing in in
effect to avoid thawing tasks which were frozen by both PM and cgroup
freezers.
This is racy (nothing prevents race against cgroup freezing) and
fragile. A much simpler way is to test actual freeze conditions from
freezing() - ie. directly test whether PM or cgroup freezing is in
effect.
This patch adds variables to indicate whether and what type of
freezing conditions are in effect and reimplements freezing() such
that it directly tests whether any of the two freezing conditions is
active and the task should freeze. On fast path, freezing() is still
very cheap - it only tests system_freezing_cnt.
This makes the clumsy dancing aroung TIF_FREEZE unnecessary and
freeze/thaw operations more usual - updating state variables for the
new state and nudging target tasks so that they notice the new state
and comply. As long as the nudging happens after state update, it's
race-free.
* This allows use of freezing() in freeze_task(). Replace the open
coded tests with freezing().
* p != current test is added to warning printing conditions in
try_to_freeze_tasks() failure path. This is necessary as freezing()
is now true for the task which initiated freezing too.
-v2: Oleg pointed out that re-freezing FROZEN cgroup could increment
system_freezing_cnt. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org> (for the cgroup portions)
TIF_FREEZE will be removed soon and freezing() will directly test
whether any freezing condition is in effect. Make the following
changes in preparation.
* Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
return bool.
* Make cgroup_freezing() access task_freezer() under rcu read lock
instead of task_lock(). This makes the state dereferencing racy
against task moving to another cgroup; however, it was already racy
without this change as ->state dereference wasn't synchronized.
This will be later dealt with using attach hooks.
* freezer->state is now set before trying to push tasks into the
target state.
-v2: Oleg pointed out that freeze_change_state() was setting
freeze->state incorrectly to CGROUP_FROZEN instead of
CGROUP_FREEZING. Fixed.
-v3: Matt pointed out that setting CGROUP_FROZEN used to always invoke
try_to_freeze_cgroup() regardless of the current state. Patch
updated such that the actual freeze/thaw operations are always
performed on invocation. This shouldn't make any difference
unless something is broken.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>