Commit Graph

113 Commits

Author SHA1 Message Date
Vincent Donnefort
d9f0cedbaf ANDROID: stop_machine: stop_one_cpu_async
This new interface allows to trigger a stopper on a given CPU and wait
for the end of the work in a separated function cpu_stop_work_wait().

This differs from stop_one_cpu_nowait() by allowing the usage of the
cpu_stop completion mechanism.

Bug: 161210528
Change-Id: Ida51371e32897d008ece0639190fc21feabb0f28
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
2020-12-08 19:07:21 +00:00
Greg Kroah-Hartman
acd4caf0df Merge tag 'v5.10-rc2' into android-mainline
Linux 5.10-rc2

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib7738b2fe5c513b7eb2dc7b475f4dc848df931d2
2020-11-02 12:03:50 +01:00
Zong Li
4230e2deaa stop_machine, rcu: Mark functions as notrace
Some architectures assume that the stopped CPUs don't make function calls
to traceable functions when they are in the stopped state. See also commit
cb9d7fd51d ("watchdog: Mark watchdog touch functions as notrace").

Violating this assumption causes kernel crashes when switching tracer on
RISC-V.

Mark rcu_momentary_dyntick_idle() and stop_machine_yield() notrace to
prevent this.

Fixes: 4ecf0a43e7 ("processor: get rid of cpu_relax_yield")
Fixes: 366237e7b0 ("stop_machine: Provide RCU quiescent state in multi_cpu_stop()")
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Atish Patra <atish.patra@wdc.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201021073839.43935-1-zong.li@sifive.com
2020-10-26 12:12:27 +01:00
Choonghoon Park
a085424328 ANDROID: Export some scheduler APIs for vendor modules
Make some scheduler APIs exports to allow vendor
modules to use them. It is necessary for the modules
to migrate tasks as they want.

  activate_task:
    To make an inactive and migrated task runnable.

  deactivate_task:
    To make an active task migratible.

  check_preempt_curr:
    To check whether a migrated task needs to preempt
    current task and if so, to do it.

  set_task_cpu:
    To set a cpu for a migratible task and force
    the task to be migrated.

  stop_one_cpu_nowait:
    To move a queued task, stopper should be used.

Bug: 155241766

Signed-off-by: Choonghoon Park <choong.park@samsung.com>
Change-Id: Ied940640525101efbbcef6eca0c39f15eb580007
2020-07-27 22:37:02 +00:00
Yangtao Li
35f4cd96f5 stop_machine: Make stop_cpus() static
The function stop_cpus() is only used internally by the
stop_machine for stop multiple cpus.

Make it static.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191228161912.24082-1-tiny.windzz@gmail.com
2020-01-17 10:19:21 +01:00
Yangtao Li
a5e37de90e stop_machine: remove try_stop_cpus helper
try_stop_cpus is not used after this:

commit c190c3b16c ("rcu: Switch synchronize_sched_expedited() to
stop_one_cpu()")

So remove it.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191214195107.26480-1-tiny.windzz@gmail.com
2019-12-17 13:32:51 +01:00
Ingo Molnar
43e0ae7ae0 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU and LKMM changes from Paul E. McKenney:

  - Documentation updates.

  - Miscellaneous fixes.

  - Dynamic tick (nohz) updates, perhaps most notably changes to
    force the tick on when needed due to lengthy in-kernel execution
    on CPUs on which RCU is waiting.

  - Replace rcu_swap_protected() with rcu_prepace_pointer().

  - Torture-test updates.

  - Linux-kernel memory consistency model updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31 09:33:19 +01:00
Mark Rutland
b1fc583335 stop_machine: Avoid potential race behaviour
Both multi_cpu_stop() and set_state() access multi_stop_data::state
racily using plain accesses. These are subject to compiler
transformations which could break the intended behaviour of the code,
and this situation is detected by KCSAN on both arm64 and x86 (splats
below).

Improve matters by using READ_ONCE() and WRITE_ONCE() to ensure that the
compiler cannot elide, replay, or tear loads and stores.

In multi_cpu_stop() the two loads of multi_stop_data::state are expected to
be a consistent value, so snapshot the value into a temporary variable to
ensure this.

The state transitions are serialized by atomic manipulation of
multi_stop_data::num_threads, and other fields in multi_stop_data are not
modified while subject to concurrent reads.

KCSAN splat on arm64:

| BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
|
| write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
|  set_state+0x80/0xb0
|  multi_cpu_stop+0x16c/0x198
|  cpu_stopper_thread+0x170/0x298
|  smpboot_thread_fn+0x40c/0x560
|  kthread+0x1a8/0x1b0
|  ret_from_fork+0x10/0x18
|
| read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
|  multi_cpu_stop+0xa8/0x198
|  cpu_stopper_thread+0x170/0x298
|  smpboot_thread_fn+0x40c/0x560
|  kthread+0x1a8/0x1b0
|  ret_from_fork+0x10/0x18
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
| Hardware name: linux,dummy-virt (DT)

KCSAN splat on x86:

| write to 0xffffb0bac0013e18 of 4 bytes by task 19 on cpu 2:
|  set_state kernel/stop_machine.c:170 [inline]
|  ack_state kernel/stop_machine.c:177 [inline]
|  multi_cpu_stop+0x1a4/0x220 kernel/stop_machine.c:227
|  cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
|  smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
|  kthread+0x1b5/0x200 kernel/kthread.c:255
|  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| read to 0xffffb0bac0013e18 of 4 bytes by task 44 on cpu 7:
|  multi_cpu_stop+0xb4/0x220 kernel/stop_machine.c:213
|  cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
|  smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
|  kthread+0x1b5/0x200 kernel/kthread.c:255
|  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 7 PID: 44 Comm: migration/7 Not tainted 5.3.0+ #1
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20191007104536.27276-1-mark.rutland@arm.com
2019-10-17 12:47:12 +02:00
Paul E. McKenney
366237e7b0 stop_machine: Provide RCU quiescent state in multi_cpu_stop()
When multi_cpu_stop() loops waiting for other tasks, it can trigger an RCU
CPU stall warning.  This can be misleading because what is instead needed
is information on whatever task is blocking multi_cpu_stop().  This commit
therefore inserts an RCU quiescent state into the multi_cpu_stop()
function's waitloop.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:05 -07:00
Peter Zijlstra
99d84bf8c6 stop_machine: Fix stop_cpus_in_progress ordering
Make sure the entire for loop has stop_cpus_in_progress set.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Aaron Lu <aaron.lwe@gmail.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: mingo@kernel.org
Cc: Phil Auld <pauld@redhat.com>
Cc: Julien Desfossez <jdesfossez@digitalocean.com>
Cc: Nishanth Aravamudan <naravamudan@digitalocean.com>
Link: https://lkml.kernel.org/r/0fd8fd4b99b9b9aa88d8b2dff897f7fd0d88f72c.1559129225.git.vpillai@digitalocean.com
2019-08-08 09:09:30 +02:00
Heiko Carstens
4ecf0a43e7 processor: get rid of cpu_relax_yield
stop_machine is the only user left of cpu_relax_yield. Given that it
now has special semantics which are tied to stop_machine introduce a
weak stop_machine_yield function which architectures can override, and
get rid of the generic cpu_relax_yield implementation.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-06-15 12:25:55 +02:00
Martin Schwidefsky
38f2c691a4 s390: improve wait logic of stop_machine
The stop_machine loop to advance the state machine and to wait for all
affected CPUs to check-in calls cpu_relax_yield in a tight loop until
the last missing CPUs acknowledged the state transition.

On a virtual system where not all logical CPUs are backed by real CPUs
all the time it can take a while for all CPUs to check-in. With the
current definition of cpu_relax_yield a diagnose 0x44 is done which
tells the hypervisor to schedule *some* other CPU. That can be any
CPU and not necessarily one of the CPUs that need to run in order to
advance the state machine. This can lead to a pretty bad diagnose 0x44
storm until the last missing CPU finally checked-in.

Replace the undirected cpu_relax_yield based on diagnose 0x44 with a
directed yield. Each CPU in the wait loop will pick up the next CPU
in the cpumask of stop_machine. The diagnose 0x9c is used to tell the
hypervisor to run this next CPU instead of the current one. If there
is only a limited number of real CPUs backing the virtual CPUs we
end up with the real CPUs passed around in a round-robin fashion.

[heiko.carstens@de.ibm.com]:
    Use cpumask_next_wrap as suggested by Peter Zijlstra.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-06-15 12:25:52 +02:00
Thomas Gleixner
6ff3f917e0 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 38
Based on 1 normalized pattern(s):

  this file is released under the gplv2 and any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170857.732920462@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24 17:27:11 +02:00
Sakari Ailus
d75f773c86 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-09 14:19:06 +02:00
Linus Torvalds
f7951c33f0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:

 - Cleanup and improvement of NUMA balancing

 - Refactoring and improvements to the PELT (Per Entity Load Tracking)
   code

 - Watchdog simplification and related cleanups

 - The usual pile of small incremental fixes and improvements

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  watchdog: Reduce message verbosity
  stop_machine: Reflow cpu_stop_queue_two_works()
  sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
  sched/numa: Use group_weights to identify if migration degrades locality
  sched/numa: Update the scan period without holding the numa_group lock
  sched/numa: Remove numa_has_capacity()
  sched/numa: Modify migrate_swap() to accept additional parameters
  sched/numa: Remove unused task_capacity from 'struct numa_stats'
  sched/numa: Skip nodes that are at 'hoplimit'
  sched/debug: Reverse the order of printing faults
  sched/numa: Use task faults only if numa_group is not yet set up
  sched/numa: Set preferred_node based on best_cpu
  sched/numa: Simplify load_too_imbalanced()
  sched/numa: Evaluate move once per node
  sched/numa: Remove redundant field
  sched/debug: Show the sum wait time of a task group
  sched/fair: Remove #ifdefs from scale_rt_capacity()
  sched/core: Remove get_cpu() from sched_fork()
  sched/cpufreq: Clarify sugov_get_util()
  sched/sysctl: Remove unused sched_time_avg_ms sysctl
  ...
2018-08-13 11:25:07 -07:00
Prasad Sodagudi
cfd355145c stop_machine: Atomically queue and wake stopper threads
When cpu_stop_queue_work() releases the lock for the stopper
thread that was queued into its wake queue, preemption is
enabled, which leads to the following deadlock:

CPU0                              CPU1
sched_setaffinity(0, ...)
__set_cpus_allowed_ptr()
stop_one_cpu(0, ...)              stop_two_cpus(0, 1, ...)
cpu_stop_queue_work(0, ...)       cpu_stop_queue_two_works(0, ..., 1, ...)

-grabs lock for migration/0-
                                  -spins with preemption disabled,
                                   waiting for migration/0's lock to be
                                   released-

-adds work items for migration/0
and queues migration/0 to its
wake_q-

-releases lock for migration/0
 and preemption is enabled-

-current thread is preempted,
and __set_cpus_allowed_ptr
has changed the thread's
cpu allowed mask to CPU1 only-

                                  -acquires migration/0 and migration/1's
                                   locks-

                                  -adds work for migration/0 but does not
                                   add migration/0 to wake_q, since it is
                                   already in a wake_q-

                                  -adds work for migration/1 and adds
                                   migration/1 to its wake_q-

                                  -releases migration/0 and migration/1's
                                   locks, wakes migration/1, and enables
                                   preemption-

                                  -since migration/1 is requested to run,
                                   migration/1 begins to run and waits on
                                   migration/0, but migration/0 will never
                                   be able to run, since the thread that
                                   can wake it is affine to CPU1-

Disable preemption in cpu_stop_queue_work() before queueing works for
stopper threads, and queueing the stopper thread in the wake queue, to
ensure that the operation of queueing the works and waking the stopper
threads is atomic.

Fixes: 0b26351b91 ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: matt@codeblueprint.co.uk
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org

Co-Developed-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-08-05 21:58:31 +02:00
Peter Zijlstra
b80a2bfce8 stop_machine: Reflow cpu_stop_queue_two_works()
The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
lifting the preempt_disable() to the top to create more natural nesting wrt
the spinlocks and make the wake_up_q() and preempt_enable() unconditional
at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
with preemption enabled.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: isaacm@codeaurora.org
Cc: matt@codeblueprint.co.uk
Cc: psodagud@codeaurora.org
Cc: gregkh@linuxfoundation.org
Cc: pkondeti@codeaurora.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
2018-08-02 15:25:20 +02:00
Isaac J. Manjarres
2610e88946 stop_machine: Disable preemption after queueing stopper threads
This commit:

  9fb8d5dc4b ("stop_machine, Disable preemption when waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, On CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: matt@codeblueprint.co.uk
Fixes: 9fb8d5dc4b ("stop_machine, Disable preemption when waking two stopper threads")
Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:25:08 +02:00
Isaac J. Manjarres
9fb8d5dc4b stop_machine: Disable preemption when waking two stopper threads
When cpu_stop_queue_two_works() begins to wake the stopper threads, it does
so without preemption disabled, which leads to the following race
condition:

The source CPU calls cpu_stop_queue_two_works(), with cpu1 as the source
CPU, and cpu2 as the destination CPU. When adding the stopper threads to
the wake queue used in this function, the source CPU stopper thread is
added first, and the destination CPU stopper thread is added last.

When wake_up_q() is invoked to wake the stopper threads, the threads are
woken up in the order that they are queued in, so the source CPU's stopper
thread is woken up first, and it preempts the thread running on the source
CPU.

The stopper thread will then execute on the source CPU, disable preemption,
and begin executing multi_cpu_stop(), and wait for an ack from the
destination CPU's stopper thread, with preemption still disabled. Since the
worker thread that woke up the stopper thread on the source CPU is affine
to the source CPU, and preemption is disabled on the source CPU, that
thread will never run to dequeue the destination CPU's stopper thread from
the wake queue, and thus, the destination CPU's stopper thread will never
run, causing the source CPU's stopper thread to wait forever, and stall.

Disable preemption when waking the stopper threads in
cpu_stop_queue_two_works().

Fixes: 0b26351b91 ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: matt@codeblueprint.co.uk
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1530655334-4601-1-git-send-email-isaacm@codeaurora.org
2018-07-15 12:12:45 +02:00
Ingo Molnar
371b326908 Merge tag 'v4.17-rc5' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-15 08:10:50 +02:00
Peter Zijlstra
0b26351b91 stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
Matt reported the following deadlock:

CPU0					CPU1

schedule(.prev=migrate/0)		<fault>
  pick_next_task()			  ...
    idle_balance()			    migrate_swap()
      active_balance()			      stop_two_cpus()
						spin_lock(stopper0->lock)
						spin_lock(stopper1->lock)
						ttwu(migrate/0)
						  smp_cond_load_acquire() -- waits for schedule()
        stop_one_cpu(1)
	  spin_lock(stopper1->lock) -- waits for stopper lock

Fix this deadlock by taking the wakeups out from under stopper->lock.
This allows the active_balance() to queue the stop work and finish the
context switch, which in turn allows the wakeup from migrate_swap() to
observe the context and complete the wakeup.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 07:38:03 +02:00
Thomas Gleixner
de5b55c1d4 stop_machine: Use raw spinlocks
Use raw-locks in stop_machine() to allow locking in irq-off and
preempt-disabled regions on -RT. This also documents the possible locking
context in general.

[bigeasy: update patch description.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20180423191635.6014-1-bigeasy@linutronix.de
2018-04-27 14:34:51 +02:00
Sebastian Andrzej Siewior
fe5595c074 stop_machine: Provide stop_machine_cpuslocked()
Some call sites of stop_machine() are within a get_online_cpus() protected
region.

stop_machine() calls get_online_cpus() as well, which is possible in the
current implementation but prevents converting the hotplug locking to a
percpu rwsem.

Provide stop_machine_cpuslocked() to avoid nested calls to get_online_cpus().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170524081547.400700852@linutronix.de
2017-05-26 10:10:36 +02:00
Christian Borntraeger
bf0d31c054 locking/core, stop_machine: Yield the CPU during stop machine()
Some time ago the following commit:

  57f2ffe14f ("s390: remove diag 44 calls from cpu_relax()")

... stopped cpu_relax() on s390 yielding to the hypervisor.

As it turns out this made stop_machine() run really slow on virtualized
overcommited systems. For example the kprobes test during bootup took
several seconds instead of just running unnoticed with large guests.

Therefore, yielding was reintroduced with commit:

  4d92f50249 ("s390: reintroduce diag 44 calls for cpu_relax()")

... but in fact the stop machine code seems to be the only place where
this yielding was really necessary. This place is probably the most
important one as it makes all but one guest CPUs wait for one guest CPU.

As we now have cpu_relax_yield(), we can use this in multi_cpu_stop().
For now lets only add it here. We can add it later in other places
when necessary.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-3-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16 10:15:09 +01:00
Linus Torvalds
af79ad2b1f Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes are:

   - irqtime accounting cleanups and enhancements. (Frederic Weisbecker)

   - schedstat debugging enhancements, make it more broadly runtime
     available. (Josh Poimboeuf)

   - More work on asymmetric topology/capacity scheduling. (Morten
     Rasmussen)

   - sched/wait fixes and cleanups. (Oleg Nesterov)

   - PELT (per entity load tracking) improvements. (Peter Zijlstra)

   - Rewrite and enhance select_idle_siblings(). (Peter Zijlstra)

   - sched/numa enhancements/fixes (Rik van Riel)

   - sched/cputime scalability improvements (Stanislaw Gruszka)

   - Load calculation arithmetics fixes. (Dietmar Eggemann)

   - sched/deadline enhancements (Tommaso Cucinotta)

   - Fix utilization accounting when switching to the SCHED_NORMAL
     policy. (Vincent Guittot)

   - ... plus misc cleanups and enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched/irqtime: Consolidate irqtime flushing code
  sched/irqtime: Consolidate accounting synchronization with u64_stats API
  u64_stats: Introduce IRQs disabled helpers
  sched/irqtime: Remove needless IRQs disablement on kcpustat update
  sched/irqtime: No need for preempt-safe accessors
  sched/fair: Fix min_vruntime tracking
  sched/debug: Add SCHED_WARN_ON()
  sched/core: Fix set_user_nice()
  sched/fair: Introduce set_curr_task() helper
  sched/core, ia64: Rename set_curr_task()
  sched/core: Fix incorrect utilization accounting when switching to fair class
  sched/core: Optimize SCHED_SMT
  sched/core: Rewrite and improve select_idle_siblings()
  sched/core: Replace sd_busy/nr_busy_cpus with sched_domain_shared
  sched/core: Introduce 'struct sched_domain_shared'
  sched/core: Restructure destroy_sched_domain()
  sched/core: Remove unused @cpu argument from destroy_sched_domain*()
  sched/wait: Introduce init_wait_entry()
  sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()
  sched/wait: Avoid abort_exclusive_wait() in ___wait_event()
  ...
2016-10-03 13:39:00 -07:00