Commit Graph

16050 Commits

Author SHA1 Message Date
Michael Scott
8dbaea2b3d PM / QoS: remove duplicate call to pm_qos_update_target
In 3.10.y backport patch 1dba303727,
the logic to call pm_qos_update_target was moved to __pm_qos_update_request.
However, the original code was left in function pm_qos_update_request.

Currently, if pm_qos_update_request is called where new_value !=
req->node.prio then pm_qos_update_target will be called twice in a row.
Once in pm_qos_update_request and then again in the following call to
_pm_qos_update_request.

Removing the left over code from pm_qos_update_request stops this second
call to pm_qos_update_target where the work of removing / re-adding the
new_value in the constraints list would be duplicated.

Signed-off-by: Michael Scott <michael.scott@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-18 13:22:28 +01:00
John Stultz
72e2d60926 ntp: Fixup adjtimex freq validation on 32-bit systems
commit 29183a70b0 upstream.

Additional validation of adjtimex freq values to avoid
potential multiplication overflows were added in commit
5e5aeb4367 (time: adjtimex: Validate the ADJ_FREQUENCY values)

Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems, but being
much smaller on 32-bit systems caused false positives
resulting in most direct frequency adjustments to fail w/
EINVAL.

ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more effected by
the bug.

See bugs:

  https://bugzilla.kernel.org/show_bug.cgi?id=92481
  https://bugzilla.redhat.com/show_bug.cgi?id=1188074

This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger then the 32-bit long freq value, so multiplication
overflows aren't possible there.

Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: George Joseph <george.joseph@fairview5.com>
Tested-by: George Joseph <george.joseph@fairview5.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06 14:40:52 -08:00
Jay Lan
ab92b84e8d kdb: fix incorrect counts in KDB summary command output
commit 1467559232 upstream.

The output of KDB 'summary' command should report MemTotal, MemFree
and Buffers output in kB. Current codes report in unit of pages.

A define of K(x) as
is defined in the code, but not used.

This patch would apply the define to convert the values to kB.
Please include me on Cc on replies. I do not subscribe to linux-kernel.

Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06 14:40:52 -08:00
Vikram Mulukutla
0d998434ce tracing: Fix unmapping loop in tracing_mark_write
commit 7215853e98 upstream.

Commit 6edb2a8a38 introduced
an array map_pages that contains the addresses returned by
kmap_atomic. However, when unmapping those pages, map_pages[0]
is unmapped before map_pages[1], breaking the nesting requirement
as specified in the documentation for kmap_atomic/kunmap_atomic.

This was caught by the highmem debug code present in kunmap_atomic.
Fix the loop to do the unmapping properly.

Link: http://lkml.kernel.org/r/1418871056-6614-1-git-send-email-markivx@codeaurora.org

Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Lime Yang <limey@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06 14:40:50 -08:00
Lai Jiangshan
677616e3ec smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()
commit 4bee96860a upstream.

The following race exists in the smpboot percpu threads management:

CPU0	      	   	     CPU1
cpu_up(2)
  get_online_cpus();
  smpboot_create_threads(2);
			     smpboot_register_percpu_thread();
			     for_each_online_cpu();
			       __smpboot_create_thread();
  __cpu_up(2);

This results in a missing per cpu thread for the newly onlined cpu2 and
in a NULL pointer dereference on a consecutive offline of that cpu.

Proctect smpboot_register_percpu_thread() with get_online_cpus() to
prevent that.

[ tglx: Massaged changelog and removed the change in
        smpboot_unregister_percpu_thread() because that's an
        optimization and therefor not stable material. ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11 14:48:17 +08:00
Tejun Heo
85be16bad7 workqueue: fix subtle pool management issue which can stall whole worker_pool
commit 29187a9eea upstream.

A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function.  need_to_create_worker() tests the following
conditions.

	pending work items && !nr_running && !nr_idle

The first and third conditions are protected by pool->lock and thus
won't change while holding pool->lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.

We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers.  Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.

Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:40 -08:00
Sasha Levin
2497402c9a time: adjtimex: Validate the ADJ_FREQUENCY values
commit 5e5aeb4367 upstream.

Verify that the frequency value from userspace is valid and makes sense.

Unverified values can cause overflows later on.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: Fix up bug for negative values and drop redunent cap check]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29 17:40:56 -08:00
Sasha Levin
7336dcc213 time: settimeofday: Validate the values of tv from user
commit 6ada1fc0e1 upstream.

An unvalidated user input is multiplied by a constant, which can result in
an undefined behaviour for large values. While this is validated later,
we should avoid triggering undefined behaviour.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: include trivial milisecond->microsecond correction noticed
by Andy]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29 17:40:56 -08:00
Jiri Olsa
d525563b50 perf: Fix events installation during moving group
commit 9fc81d8742 upstream.

We allow PMU driver to change the cpu on which the event
should be installed to. This happened in patch:

  e2d37cd213 ("perf: Allow the PMU driver to choose the CPU on which to install events")

This patch also forces all the group members to follow
the currently opened events cpu if the group happened
to be moved.

This and the change of event->cpu in perf_install_in_context()
function introduced in:

  0cda4c0231 ("perf: Introduce perf_pmu_migrate_context()")

forces group members to change their event->cpu,
if the currently-opened-event's PMU changed the cpu
and there is a group move.

Above behaviour causes problem for breakpoint events,
which uses event->cpu to touch cpu specific data for
breakpoints accounting. By changing event->cpu, some
breakpoints slots were wrongly accounted for given
cpu.

Vinces's perf fuzzer hit this issue and caused following
WARN on my setup:

   WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150()
   Can't find any breakpoint slot
   [...]

This patch changes the group moving code to keep the event's
original cpu.

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16 06:59:03 -08:00
Oleg Nesterov
12279351a4 exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
commit 24c037ebf5 upstream.

alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
fails because disable_pid_allocation() was called by the exiting
child_reaper.

We could simply move get_pid_ns() down to successful return, but this fix
tries to be as trivial as possible.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Sterling Alexander <stalexan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:17 -08:00
Eric W. Biederman
afb5285388 userns: Allow setting gid_maps without privilege when setgroups is disabled
commit 66d2f338ee upstream.

Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed.  Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:17 -08:00
Eric W. Biederman
1c587ee50e userns: Add a knob to disable setgroups on a per user namespace basis
commit 9cc46516dd upstream.

- Expose the knob to user space through a proc file /proc/<pid>/setgroups

  A value of "deny" means the setgroups system call is disabled in the
  current processes user namespace and can not be enabled in the
  future in this user namespace.

  A value of "allow" means the segtoups system call is enabled.

- Descendant user namespaces inherit the value of setgroups from
  their parents.

- A proc file is used (instead of a sysctl) as sysctls currently do
  not allow checking the permissions at open time.

- Writing to the proc file is restricted to before the gid_map
  for the user namespace is set.

  This ensures that disabling setgroups at a user namespace
  level will never remove the ability to call setgroups
  from a process that already has that ability.

  A process may opt in to the setgroups disable for itself by
  creating, entering and configuring a user namespace or by calling
  setns on an existing user namespace with setgroups disabled.
  Processes without privileges already can not call setgroups so this
  is a noop.  Prodcess with privilege become processes without
  privilege when entering a user namespace and as with any other path
  to dropping privilege they would not have the ability to call
  setgroups.  So this remains within the bounds of what is possible
  without a knob to disable setgroups permanently in a user namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman
6eaab78729 userns: Rename id_map_mutex to userns_state_mutex
commit f0d62aec93 upstream.

Generalize id_map_mutex so it can be used for more state of a user namespace.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman
a1821391e5 userns: Only allow the creator of the userns unprivileged mappings
commit f95d7918bd upstream.

If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman
f028f2d732 userns: Check euid no fsuid when establishing an unprivileged uid mapping
commit 80dd00a237 upstream.

setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid.  Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.

I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman
ba0922adbd userns: Don't allow unprivileged creation of gid mappings
commit be7c6dba23 upstream.

As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality.  This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups.  Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman
fc9b65e3d7 userns: Don't allow setgroups until a gid mapping has been setablished
commit 273d2c67c3 upstream.

setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established.  Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman
b8a0441b54 userns: Document what the invariant required for safe unprivileged mappings.
commit 0542f17bf2 upstream.

The rule is simple.  Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it.  So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated.  Violated expectations about the
behavior of the OS is a long way to say a security issue.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman
4accc8c8e2 groups: Consolidate the setgroups permission checks
commit 7ff4d90b4c upstream.

Today there are 3 instances of setgroups and due to an oversight their
permission checking has diverged.  Add a common function so that
they may all share the same permission checking code.

This corrects the current oversight in the current permission checks
and adds a helper to avoid this in the future.

A user namespace security fix will update this new helper, shortly.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Andy Lutomirski
0061b518b1 uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
commit 82975bc6a6 upstream.

x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06 15:05:46 -08:00
Pawel Moll
858879737b perf: Handle compat ioctl
commit b3f207855f upstream.

When running a 32-bit userspace on a 64-bit kernel (eg. i386
application on x86_64 kernel or 32-bit arm userspace on arm64
kernel) some of the perf ioctls must be treated with special
care, as they have a pointer size encoded in the command.

For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded
as 0x80042407, but 64-bit kernel will expect 0x80082407. In
result the ioctl will fail returning -ENOTTY.

This patch solves the problem by adding code fixing up the
size as compat_ioctl file operation.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21 09:22:55 -08:00
Miklos Szeredi
bd501a2eb2 audit: keep inode pinned
commit 799b601451 upstream.

Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a578 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21 09:22:52 -08:00
Mathias Krause
f9b6264a0f posix-timers: Fix stack info leak in timer_create()
commit 6891c4509c upstream.

If userland creates a timer without specifying a sigevent info, we'll
create one ourself, using a stack local variable. Particularly will we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where the size of a pointer is bigger than the
size of an int. On such systems we'll copy the uninitialized stack bytes
from the timer_create() call to userland when the timer actually fires
and we're going to deliver the signal.

Initialize sigev_value with 0 to plug the stack info leak.

Found in the PaX patch, written by the PaX Team.

Fixes: 5a9fa73072 ("posix-timers: kill ->it_sigev_signo and...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14 08:48:00 -08:00
Imre Deak
1eaaef74ad PM / Sleep: fix recovery during resuming from hibernation
commit 94fb823fcb upstream.

If a device's dev_pm_ops::freeze callback fails during the QUIESCE
phase, we don't rollback things correctly calling the thaw and complete
callbacks. This could leave some devices in a suspended state in case of
an error during resuming from hibernation.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14 08:48:00 -08:00
Michal Hocko
e033782a26 OOM, PM: OOM killed task shouldn't escape PM suspend
commit 5695be142e upstream.

PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.

Reduce the race window by checking all tasks after OOM killer has been
disabled. This is still not race free completely unfortunately because
oom_killer_disable cannot stop an already ongoing OOM killer so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.

Introduce and check oom_kills counter which gets incremented early when
the allocator enters __alloc_pages_may_oom path and only check all the
tasks if the counter changes during the freezing attempt. The counter
is updated so early to reduce the race window since allocator checked
oom_killer_disabled which is set by PM-freezing code. A false positive
will push the PM-freezer into a slow path but that is not a big deal.

Changes since v1
- push the re-check loop out of freeze_processes into
  check_frozen_processes and invert the condition to make the code more
  readable as per Rafael

Fixes: f660daac47 (oom: thaw threads if oom killed thread is frozen before deferring)
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14 08:47:58 -08:00