linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Yasunori Goto ca465bac8c sched: Fix ancient race in do_exit()

commit b5740f4b2c upstream.

try_to_wake_up() has a race that can change a task's state from TASK_DEAD
back to TASK_RUNNING when an SMI intervenes or when the kernel runs as a
guest in a virtual machine. As a result, the exited task is scheduled again
and a panic occurs.

Here is the sequence in which it occurs:

 ----------------------------------+-----------------------------
                                   |
            CPU A                  |             CPU B
 ----------------------------------+-----------------------------

TASK A calls exit()....

do_exit()

  exit_mm()
    down_read(mm->mmap_sem);

    rwsem_down_failed_common()

      set TASK_UNINTERRUPTIBLE
      set waiter.task <= task A
      list_add to sem->wait_list
           :
      raw_spin_unlock_irq()
      (I/O interruption occurred)

                                      __rwsem_do_wake(mmap_sem)

                                        list_del(&waiter->list);
                                        waiter->task = NULL
                                        wake_up_process(task A)
                                          try_to_wake_up()
                                             (task is still
                                               TASK_UNINTERRUPTIBLE,
                                              p->on_rq is still 1.)

                                              ttwu_do_wakeup()
                                                 (*A)
                                                   :
     (I/O interruption handler finished)

      if (!waiter.task)
          schedule() is not called
          because waiter.task is NULL.

      tsk->state = TASK_RUNNING

          :
                                              check_preempt_curr();
                                                  :
  task->state = TASK_DEAD
                                              (*B)
                                        <---    set TASK_RUNNING (*C)

     schedule()
     (exit task is running again)
     BUG_ON() is called!
 --------------------------------------------------------

The execution time between (*A) and (*B) is usually very short because
interrupts are disabled, so setting TASK_RUNNING at (*C) normally
completes before TASK_DEAD is set.

HOWEVER, if an SMI occurs between (*A) and (*B),
(*C) can execute AFTER TASK_DEAD is set!
Then the exited task is scheduled again, and BUG_ON() is called....

If the system runs as a guest in a virtual machine, the time
between (*A) and (*B) may also be long due to hypervisor scheduling,
and the same phenomenon can occur.

With this patch, do_exit() waits for task->pi_lock, which is held in
try_to_wake_up(), to be released. This guarantees that the task becomes
TASK_DEAD only after any in-flight wakeup has finished.
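
The ordering guarantee described above can be illustrated with a small
user-space analogue. This is only a sketch under stated assumptions, not
the kernel patch itself: struct task, delayed_waker() and run_exit_path()
are hypothetical names, a pthread mutex stands in for task->pi_lock, and a
50 ms sleep stands in for the SMI or hypervisor stall between (*A) and (*C).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE, TASK_DEAD };

/* Hypothetical user-space stand-in for the few task fields involved. */
struct task {
    pthread_mutex_t pi_lock;   /* plays the role of task->pi_lock */
    atomic_int state;
    atomic_bool waker_started; /* true once the waker holds pi_lock, i.e. (*A) */
};

/* "CPU B": a wakeup that stalls between (*A) and (*C) while holding pi_lock. */
static void *delayed_waker(void *arg)
{
    struct task *t = arg;
    struct timespec stall = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

    pthread_mutex_lock(&t->pi_lock);
    atomic_store(&t->waker_started, true);
    nanosleep(&stall, NULL);               /* simulated SMI or hypervisor stall */
    atomic_store(&t->state, TASK_RUNNING); /* (*C) */
    pthread_mutex_unlock(&t->pi_lock);
    return NULL;
}

/* "CPU A": the exit path. Returns the task's final state. */
static int run_exit_path(bool wait_for_pi_lock)
{
    struct task t;
    pthread_t waker;
    int rc, final_state;

    pthread_mutex_init(&t.pi_lock, NULL);
    atomic_init(&t.state, TASK_UNINTERRUPTIBLE);
    atomic_init(&t.waker_started, false);

    rc = pthread_create(&waker, NULL, delayed_waker, &t);
    assert(rc == 0);

    while (!atomic_load(&t.waker_started))
        ; /* wait until the waker is inside its critical section */

    if (wait_for_pi_lock) {
        /* The fix: do not proceed until the in-flight wakeup drops pi_lock. */
        pthread_mutex_lock(&t.pi_lock);
        pthread_mutex_unlock(&t.pi_lock);
    }
    atomic_store(&t.state, TASK_DEAD);

    pthread_join(waker, NULL);
    final_state = atomic_load(&t.state);
    pthread_mutex_destroy(&t.pi_lock);
    return final_state;
}
```

run_exit_path(true) returns TASK_DEAD, because the store of TASK_DEAD
cannot happen until the waker has released the lock. With
run_exit_path(false) the late store at (*C) will typically overwrite
TASK_DEAD, which mirrors the failure described in this changelog.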

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2012-10-02 09:47:55 -07:00

debug

kgdb,debug_core: pass the breakpoint struct instead of address and memory

2012-04-13 08:14:07 -07:00

events

perf_event: Switch to internal refcount, fix race with close()

2012-10-02 09:47:24 -07:00

gcov

gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL

2011-06-15 20:04:01 -07:00

irq

random: remove rand_initialize_irq()

2012-08-15 12:04:28 -07:00

power

ftrace: Disable function tracing during suspend/resume and hibernation, again

2012-08-09 08:27:35 -07:00

time

time: Move ktime_t overflow checking into timespec_valid_strict

2012-10-02 09:47:52 -07:00

trace

tracing: change CPU ring buffer state from tracing_cpumask

2012-07-16 08:47:51 -07:00

.gitignore

…

acct.c

…

async.c

Fix a dead loop in async_synchronize_full()

2012-10-02 09:47:41 -07:00

audit_tree.c

audit: fix refcounting in audit-tree

2012-09-14 10:00:38 -07:00

audit_watch.c

kill path_lookup()

2011-03-14 09:15:23 -04:00

audit.c

netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms

2011-03-03 10:55:40 -08:00

audit.h

audit: make functions static

2010-10-30 01:42:19 -04:00

auditfilter.c

netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms

2011-03-03 10:55:40 -08:00

auditsc.c

audit: acquire creds selectively to reduce atomic op overhead

2011-04-27 15:11:03 +02:00

backtracetest.c

…

bounds.c

memcg: remove direct page_cgroup-to-page pointer

2011-03-23 19:46:28 -07:00

capability.c

Merge branch 'master' into next

2011-05-19 18:51:57 +10:00

cgroup_freezer.c

cgroup_freezer: fix freezing groups with stopped tasks

2011-12-09 08:52:27 -08:00

cgroup.c

cgroup: fix to allow mounting a hierarchy by name

2012-01-12 11:35:08 -08:00

compat.c

compat: Fix RT signal mask corruption via sigprocmask

2012-05-21 09:40:04 -07:00

configs.c

…

cpu.c

PM / Sleep: Fix race between CPU hotplug and freezer

2012-01-12 11:35:46 -08:00

cpuset.c

cpuset: mm: reduce large amounts of memory barrier related damage v3

2012-08-01 12:27:20 -07:00

crash_dump.c

crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn

2011-03-23 19:47:19 -07:00

cred.c

cred: copy_process() should clear child->replacement_session_keyring

2012-04-13 08:14:08 -07:00

delayacct.c

…

dma.c

…

elfcore.c

…

exec_domain.c

…

exit.c

sched: Fix ancient race in do_exit()

2012-10-02 09:47:55 -07:00

extable.c

extable, core_kernel_data(): Make sure all archs define _sdata

2011-05-20 08:56:56 +02:00

fork.c

cpuset: mm: reduce large amounts of memory barrier related damage v3

2012-08-01 12:27:20 -07:00

freezer.c

Freezer: Use SMP barriers

2011-05-17 23:19:17 +02:00

futex_compat.c

futex: Do not leak robust list to unprivileged process

2012-04-22 16:21:45 -07:00

futex.c

futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()

2012-08-09 08:27:54 -07:00

groups.c

userns: user namespaces: convert several capable() calls

2011-03-23 19:47:08 -07:00

hrtimer.c

hrtimer: Update hrtimer base offsets each hrtimer_interrupt

2012-07-19 08:58:46 -07:00

hung_task.c

hung_task: fix false positive during vfork

2012-01-06 14:14:13 -08:00

irq_work.c

irq_work: Use per cpu atomics instead of regular atomics

2010-12-18 15:54:48 +01:00

itimer.c

…

jump_label.c

jump_label: jump_label_inc may return before the code is patched

2011-12-09 08:52:50 -08:00

kallsyms.c

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

2011-03-25 17:52:22 -07:00

Kconfig.freezer

…

Kconfig.hz

…

Kconfig.locks

arch:Kconfig.locks Remove unused config option.

2011-04-10 17:01:05 +02:00

Kconfig.preempt

…

kexec.c

PM: Remove sysdev suspend, resume and shutdown operations

2011-05-11 21:37:15 +02:00

kfifo.c

…

kmod.c

kmod: prevent kmod_loop_msg overflow in __request_module()

2011-11-11 09:35:48 -08:00

kprobes.c

kprobes: adjust "fix a memory leak in function pre_handler_kretprobe()"

2012-03-12 10:32:57 -07:00

ksysfs.c

kernel/ksysfs.c: expose file_caps_enabled in sysfs

2011-04-19 16:45:51 -07:00

kthread.c

cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed

2011-05-28 17:02:57 +02:00

latencytop.c

Fix common misspellings

2011-03-31 11:26:23 -03:00

lockdep_internals.h

…

lockdep_proc.c

lockdep: Remove unused 'factor' variable from lockdep_stats_show()

2011-03-23 13:54:47 +01:00

lockdep_states.h

…

lockdep.c

lockdep: Fix lock_is_held() on recursion

2011-06-07 12:25:50 +02:00

Makefile

cgroup: remove the ns_cgroup

2011-05-26 17:12:34 -07:00

module.c

module: Remove module size limit

2012-04-02 09:27:20 -07:00

mutex-debug.c

mutex: Use p->on_cpu for the adaptive spin

2011-04-14 08:52:33 +02:00

mutex-debug.h

mutex: Use p->on_cpu for the adaptive spin

2011-04-14 08:52:33 +02:00

mutex.c

lockdep, mutex: provide mutex_lock_nest_lock

2011-05-25 08:39:17 -07:00

mutex.h

mutex: Use p->on_cpu for the adaptive spin

2011-04-14 08:52:33 +02:00

notifier.c

…

nsproxy.c

cgroup: remove the ns_cgroup

2011-05-26 17:12:34 -07:00

padata.c

Fix common misspellings

2011-03-31 11:26:23 -03:00

panic.c

lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdep

2012-02-13 11:06:10 -08:00

params.c

params.c: Use new strtobool function to process boolean inputs

2011-05-19 16:55:28 +09:30

pid_namespace.c

pidns: call pid_ns_prepare_proc() from create_pid_namespace()

2011-03-23 19:46:58 -07:00

pid.c

next_pidmap: fix overflow condition

2011-04-18 10:35:30 -07:00

pm_qos_params.c

Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6

2011-05-29 11:18:09 -07:00

posix-cpu-timers.c

cputimer: Cure lock inversion

2011-10-25 07:10:14 +02:00

posix-timers.c

posix-timers: RCU conversion

2011-05-24 12:10:51 +02:00

printk.c

cap_syslog: don't use WARN_ONCE for CAP_SYS_ADMIN deprecation warning

2012-02-03 09:18:57 -08:00

profile.c

kernel/profile.c: remove some duplicate code from profile_hits()

2011-05-26 17:12:37 -07:00

ptrace.c

ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread

2011-05-25 19:20:21 +02:00

range.c

kernel/range.c: fix clean_sort_range() for the case of full array

2010-11-12 07:55:31 -08:00

rcupdate.c

rcu: Use WARN_ON_ONCE for DEBUG_OBJECTS_RCU_HEAD warnings

2011-05-05 23:16:57 -07:00

rcutiny_plugin.h

rcu: Converge TINY_RCU expedited and normal boosting

2011-05-05 23:16:58 -07:00

rcutiny.c

sanitize <linux/prefetch.h> usage

2011-05-20 12:50:29 -07:00

rcutorture.c

rcu: mark rcutorture boosting callback as being on-stack

2011-05-05 23:16:57 -07:00

rcutree_plugin.h

softirq,rcu: Inform RCU of irq_exit() activity

2011-07-20 10:50:12 -07:00

rcutree_trace.c

rcu: use softirq instead of kthreads except when RCU_BOOST=y

2011-06-15 23:07:21 -07:00

rcutree.c

rcu: Prevent RCU callbacks from executing before scheduler initialized

2011-07-13 08:17:56 -07:00

rcutree.h

rcu: Move RCU_BOOST #ifdefs to header file

2011-06-16 16:12:05 -07:00

relay.c

relay: prevent integer overflow in relay_open()

2012-02-20 12:48:10 -08:00

res_counter.c

memcg: res_counter_read_u64(): fix potential races on 32-bit machines

2011-03-23 19:46:22 -07:00

resource.c

resource: ability to resize an allocated resource

2011-07-06 10:54:08 -07:00

rtmutex_common.h

rtmutex: Simplify PI algorithm and make highest prio task get lock

2011-01-27 21:13:51 -05:00

rtmutex-debug.c

rtmutex: Simplify PI algorithm and make highest prio task get lock

2011-01-27 21:13:51 -05:00

rtmutex-debug.h

…

rtmutex-tester.c

rtmutex: tester: Remove the remaining BKL leftovers

2011-02-22 22:07:22 +01:00

rtmutex.c

rtmutex: Simplify PI algorithm and make highest prio task get lock

2011-01-27 21:13:51 -05:00

rtmutex.h

…

rwsem.c

…

sched_autogroup.c

Fix common misspellings

2011-03-31 11:26:23 -03:00

sched_autogroup.h

sched, autogroup: Stop going ahead if autogroup is disabled

2011-02-23 11:33:59 +01:00

sched_clock.c

sched: Add some clock info to sched_debug

2010-11-23 10:29:08 +01:00

sched_cpupri.c

…

sched_cpupri.h

…

sched_debug.c

sched: Get rid of lock_depth

2011-04-24 13:18:38 +02:00

sched_fair.c

sched: Break out cpu_power from the sched_group structure

2011-07-20 18:32:40 +02:00

sched_features.h

sched: Allow for overlapping sched_domain spans

2011-07-20 18:32:41 +02:00

sched_idletask.c

sched: Drop the rq argument to sched_class::select_task_rq()

2011-04-14 08:52:36 +02:00

sched_rt.c

sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW

2012-02-13 11:06:08 -08:00

sched_stats.h

sched: More sched_domain iterations fixes

2011-05-28 17:02:54 +02:00

sched_stoptask.c

sched: Drop the rq argument to sched_class::select_task_rq()

2011-04-14 08:52:36 +02:00

sched.c

sched: Fix race in task_group()

2012-10-02 09:47:42 -07:00

seccomp.c

…

semaphore.c

…

signal.c

ptrace: don't clear GROUP_STOP_SIGMASK on double-stop

2011-11-11 09:36:23 -08:00

smp.c

generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

2011-06-17 10:17:12 +02:00

softirq.c

softirq,rcu: Inform RCU of irq_exit() activity

2011-07-20 10:50:12 -07:00

spinlock.c

…

srcu.c

rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status

2011-01-14 04:56:49 -08:00

stacktrace.c

…

stop_machine.c

x86, mtrr: lock stop machine during MTRR rendezvous sequence

2011-08-29 13:29:08 -07:00

sys_ni.c

ipc: Add missing sys_ni entries for ipc/compat.c functions

2011-05-20 13:53:02 -07:00

sys.c

Avoid using variable-length arrays in kernel/sys.c

2011-10-25 07:10:14 +02:00

sysctl_binary.c

binary_sysctl(): fix memory leak

2012-01-06 14:13:50 -08:00

sysctl_check.c

sysctl_check: drop dead code

2011-03-23 19:46:51 -07:00

sysctl.c

sysctl: fix write access to dmesg_restrict/kptr_restrict

2012-04-13 08:14:07 -07:00

taskstats.c

Make TASKSTATS require root access

2011-12-21 12:57:40 -08:00

test_kprobes.c

…

time.c

time: Change jiffies_to_clock_t() argument type to unsigned long

2011-11-11 09:35:52 -08:00

timeconst.pl

…

timer.c

timers: Consider slack value in mod_timer()

2011-06-03 15:02:32 +02:00

tracepoint.c

jump label: Introduce static_branch() interface

2011-04-04 12:48:08 -04:00

tsacct.c

taskstats: use real microsecond granularity for CPU times

2010-10-27 18:03:17 -07:00

uid16.c

userns: user namespaces: convert several capable() calls

2011-03-23 19:47:08 -07:00

up.c

…

user_namespace.c

user_ns: improve the user_ns on-the-slab packaging

2011-01-13 08:03:18 -08:00

user-return-notifier.c

Fix common misspellings

2011-03-31 11:26:23 -03:00

user.c

userns: add a user_namespace as creator/owner of uts_namespace

2011-03-23 19:46:59 -07:00

utsname_sysctl.c

…

utsname.c

ns proc: Add support for the uts namespace

2011-05-10 14:35:35 -07:00

wait.c

Fix common misspellings

2011-03-31 11:26:23 -03:00

watchdog.c

kernel/watchdog.c: Use proper ANSI C prototypes

2011-05-23 21:07:40 -07:00

workqueue_sched.h

…

workqueue.c

workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic

2012-10-02 09:47:40 -07:00