Commit Graph

23394 Commits

Author SHA1 Message Date
Tobias Klauser f5d6d2da0d sched/fair: Remove unused but set variable 'rq'
Since commit:

  8663e24d56 ("sched/fair: Reorder cgroup creation code")

... the variable 'rq' in alloc_fair_sched_group() is set but no longer used.
Remove it to fix the following GCC warning when building with 'W=1':

  kernel/sched/fair.c:8842:13: warning: variable ‘rq’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161026113704.8981-1-tklauser@distanz.ch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-27 08:33:52 +02:00
Linus Torvalds bfb7bfef6f Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "Mostly irqchip driver fixes, plus a symbol export"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kernel/irq: Export irq_set_parent()
  irqchip/gic: Add missing \n to CPU IF adjustment message
  irqchip/jcore: Don't show Kconfig menu item for driver
  irqchip/eznps: Drop pointless static qualifier in nps400_of_init()
  irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
  irqchip/gic-v3-its: Fix 64bit GIC{R,ITS}_TYPER accesses
2016-10-22 09:33:51 -07:00
Sudip Mukherjee 3118dac501 kernel/irq: Export irq_set_parent()
The TPS65217 driver grew interrupt support which uses
irq_set_parent(). While it's not yet clear why this is used in the first
place, building the driver as a module fails with:

 ERROR: ".irq_set_parent" [drivers/mfd/tps65217.ko] undefined!

The correctness of the driver change is still investigated, but for now
it's less trouble to export irq_set_parent() than dealing with the build
wreckage.

[ tglx: Rewrote changelog and made the export GPL ]

Fixes: 6556bdacf6 ("mfd: tps65217: Add support for IRQs")
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Marcin Niestroj <m.niestroj@grinn-global.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Lee Jones <lee.jones@linaro.org>
Link: http://lkml.kernel.org/r/1475775403-27207-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-21 10:21:38 +02:00
Linus Torvalds 893e2c5c9f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "This fixes a group scheduling related performance/interactivity
  regression introduced in v4.8, which affects certain hardware
  environments where cpu_possible_mask != cpu_present_mask"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix incorrect task group ->load_avg
2016-10-19 10:03:55 -07:00
Linus Torvalds 8835ca59da printk: suppress empty continuation lines
We have a fairly common pattern where you print several things as
continuations on one single line in a loop, and then at the end you do

	printk(KERN_CONT "\n");

to flush the buffered output.

But if the output was flushed by something else (concurrent printk
activity, or just system logging), we don't want that final flushing to
just print an empty line.

So just suppress empty continuation lines when they couldn't be merged
into the line they are a continuation of.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 09:11:24 -07:00
Linus Torvalds 63ae602cea Merge branch 'gup_flag-cleanups'
Merge the gup_flags cleanups from Lorenzo Stoakes:
 "This patch series adjusts functions in the get_user_pages* family such
  that desired FOLL_* flags are passed as an argument rather than
  implied by flags.

  The purpose of this change is to make the use of FOLL_FORCE explicit
  so it is easier to grep for and clearer to callers that this flag is
  being used.  The use of FOLL_FORCE is an issue as it overrides missing
  VM_READ/VM_WRITE flags for the VMA whose pages we are reading
  from/writing to, which can result in surprising behaviour.

  The patch series came out of the discussion around commit 38e0885465
  ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
  which addressed a BUG_ON() being triggered when a page was faulted in
  with PROT_NONE set but having been overridden by FOLL_FORCE.
  do_numa_page() was run on the assumption the page _must_ be one marked
  for NUMA node migration as an actual PROT_NONE page would have been
  dealt with prior to this code path, however FOLL_FORCE introduced a
  situation where this assumption did not hold.

  See

      https://marc.info/?l=linux-mm&m=147585445805166

  for the patch proposal"

Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.

[ This branch was rebased recently to add a few more acked-by's and
  reviewed-by's ]

* gup_flag-cleanups:
  mm: replace access_process_vm() write parameter with gup_flags
  mm: replace access_remote_vm() write parameter with gup_flags
  mm: replace __access_remote_vm() write parameter with gup_flags
  mm: replace get_user_pages_remote() write/force parameters with gup_flags
  mm: replace get_user_pages() write/force parameters with gup_flags
  mm: replace get_vaddr_frames() write/force parameters with gup_flags
  mm: replace get_user_pages_locked() write/force parameters with gup_flags
  mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
  mm: remove write/force parameters from __get_user_pages_unlocked()
  mm: remove write/force parameters from __get_user_pages_locked()
  mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
2016-10-19 08:39:47 -07:00
Lorenzo Stoakes f307ab6dce mm: replace access_process_vm() write parameter with gup_flags
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:31:25 -07:00
Lorenzo Stoakes 9beae1ea89 mm: replace get_user_pages_remote() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:02 -07:00
Vincent Guittot b5a9b34078 sched/fair: Fix incorrect task group ->load_avg
A scheduler performance regression has been reported by Joseph Salisbury,
which he bisected back to:

  3d30544f02 ("sched/fair: Apply more PELT fixes)

The regression triggers when several levels of task groups are involved
(read: SystemD) and cpu_possible_mask != cpu_present_mask.

The root cause is that group entity's load (tg_child->se[i]->avg.load_avg)
is initialized to scale_load_down(se->load.weight). During the creation of
a child task group, its group entities on possible CPUs are attached to
parent's cfs_rq (tg_parent) and their loads are added to the parent's load
(tg_parent->load_avg) with update_tg_load_avg().

But only the load on online CPUs will then be updated to reflect real load,
whereas load on other CPUs will stay at the initial value.

The result is a tg_parent->load_avg that is higher than the real load, the
weight of group entities (tg_parent->se[i]->load.weight) on online CPUs is
smaller than it should be, and the task group gets a less running time than
what it could expect.

( This situation can be detected with /proc/sched_debug. The ".tg_load_avg"
  of the task group will be much higher than sum of ".tg_load_avg_contrib"
  of online cfs_rqs of the task group. )

The load of group entities don't have to be intialized to something else
than 0 because their load will increase when an entity is attached.

Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org> # 4.8.x
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joonwoop@codeaurora.org
Fixes: 3d30544f02 ("sched/fair: Apply more PELT fixes)
Link: http://lkml.kernel.org/r/1476881123-10159-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-19 15:04:47 +02:00
Linus Torvalds 9d71bdfbcb Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixlet from Ingo Molnar:
 "Remove an unused variable"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Remove unused but set variable
2016-10-18 09:57:12 -07:00
Linus Torvalds 2c11fc87ca Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix a crash that can trigger when racing with CPU hotplug: we didn't
  use sched-domains data structures carefully enough in select_idle_cpu()"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix sched domains NULL dereference in select_idle_sibling()
2016-10-18 09:53:59 -07:00
Linus Torvalds 351267d941 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc fixes from Ingo Molnar:
 "A CPU hotplug debuggability fix and three objtool false positive
  warnings fixes for new GCC6 code generation patterns"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Use distinct name for cpu_hotplug.dep_map
  objtool: Skip all "unreachable instruction" warnings for gcov kernels
  objtool: Improve rare switch jump table pattern detection
  objtool: Support '-mtune=atom' stack frame setup instruction
2016-10-18 08:35:07 -07:00
Tobias Klauser 54e23845e9 alarmtimer: Remove unused but set variable
Remove the set but unused variable base in alarm_clock_get to fix the
following warning when building with 'W=1':

  kernel/time/alarmtimer.c: In function ‘alarm_timer_create’:
  kernel/time/alarmtimer.c:545:21: warning: variable ‘base’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161017094702.10873-1-tklauser@distanz.ch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-17 11:59:37 +02:00
Joonas Lahtinen a705e07b9c cpu/hotplug: Use distinct name for cpu_hotplug.dep_map
Use distinctive name for cpu_hotplug.dep_map to avoid the actual
cpu_hotplug.lock appearing as cpu_hotplug.lock#2 in lockdep splats.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Gautham R . Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16 11:09:32 +02:00
Linus Torvalds 9ffc66941d Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc plugins update from Kees Cook:
 "This adds a new gcc plugin named "latent_entropy". It is designed to
  extract as much possible uncertainty from a running system at boot
  time as possible, hoping to capitalize on any possible variation in
  CPU operation (due to runtime data differences, hardware differences,
  SMP ordering, thermal timing variation, cache behavior, etc).

  At the very least, this plugin is a much more comprehensive example
  for how to manipulate kernel code using the gcc plugin internals"

* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  latent_entropy: Mark functions with __latent_entropy
  gcc-plugins: Add latent_entropy plugin
2016-10-15 10:03:15 -07:00
Linus Torvalds f34d3606f7 Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - tracepoints for basic cgroup management operations added

 - kernfs and cgroup path formatting functions updated to behave in the
   style of strlcpy()

 - non-critical bug fixes

* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
  cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
  cpuset: fix error handling regression in proc_cpuset_show()
  cgroup: add tracepoints for basic operations
  cgroup: make cgroup_path() and friends behave in the style of strlcpy()
  kernfs: remove kernfs_path_len()
  kernfs: make kernfs_path*() behave in the style of strlcpy()
  kernfs: add dummy implementation of kernfs_path_from_node()
2016-10-14 12:18:50 -07:00
Linus Torvalds ef6000b4c6 Disable the __builtin_return_address() warning globally after all
This affectively reverts commit 377ccbb483 ("Makefile: Mute warning
for __builtin_return_address(>0) for tracing only") because it turns out
that it really isn't tracing only - it's all over the tree.

We already also had the warning disabled separately for mm/usercopy.c
(which this commit also removes), and it turns out that we will also
want to disable it for get_lock_parent_ip(), that is used for at least
TRACE_IRQFLAGS.  Which (when enabled) ends up being all over the tree.

Steven Rostedt had a patch that tried to limit it to just the config
options that actually triggered this, but quite frankly, the extra
complexity and abstraction just isn't worth it.  We have never actually
had a case where the warning is actually useful, so let's just disable
it globally and not worry about it.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-12 10:23:41 -07:00
John Siddle 48a6d64eda hung_task: allow hung_task_panic when hung_task_warnings is 0
Previously hung_task_panic would not be respected if enabled after
hung_task_warnings had already been decremented to 0.

Permit the kernel to panic if hung_task_panic is enabled after
hung_task_warnings has already been decremented to 0 and another task
hangs for hung_task_timeout_secs seconds.

Check if hung_task_panic is enabled so we don't return prematurely, and
check if hung_task_warnings is non-zero so we don't print the warning
unnecessarily.

[akpm@linux-foundation.org: fix off-by-one]
Link: http://lkml.kernel.org/r/1473450214-4049-1-git-send-email-jsiddle@redhat.com
Signed-off-by: John Siddle <jsiddle@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek dbf52682cb kthread: better support freezable kthread workers
This patch allows to make kthread worker freezable via a new @flags
parameter. It will allow to avoid an init work in some kthreads.

It currently does not affect the function of kthread_worker_fn()
but it might help to do some optimization or fixes eventually.

I currently do not know about any other use for the @flags
parameter but I believe that we will want more flags
in the future.

Finally, I hope that it will not cause confusion with @flags member
in struct kthread. Well, I guess that we will want to rework the
basic kthreads implementation once all kthreads are converted into
kthread workers or workqueues. It is possible that we will merge
the two structures.

Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek 9a6b06c8d9 kthread: allow to modify delayed kthread work
There are situations when we need to modify the delay of a delayed kthread
work. For example, when the work depends on an event and the initial delay
means a timeout. Then we want to queue the work immediately when the event
happens.

This patch implements kthread_mod_delayed_work() as inspired workqueues.
It cancels the timer, removes the work from any worker list and queues it
again with the given timeout.

A very special case is when the work is being canceled at the same time.
It might happen because of the regular kthread_cancel_delayed_work_sync()
or by another kthread_mod_delayed_work(). In this case, we do nothing and
let the other operation win. This should not normally happen as the caller
is supposed to synchronize these operations a reasonable way.

Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek 37be45d49d kthread: allow to cancel kthread work
We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.

This patch implements cancel_*_sync() operations as inspired by
workqueues.  Well, we are synchronized against the other operations via
the worker lock, we use del_timer_sync() and a counter to count parallel
cancel operations.  Therefore the implementation might be easier.

First, we check if a worker is assigned.  If not, the work has newer been
queued after it was initialized.

Second, we take the worker lock.  It must be the right one.  The work must
not be assigned to another worker unless it is initialized in between.

Third, we try to cancel the timer when it exists.  The timer is deleted
synchronously to make sure that the timer call back is not running.  We
need to temporary release the worker->lock to avoid a possible deadlock
with the callback.  In the meantime, we set work->canceling counter to
avoid any queuing.

Fourth, we try to remove the work from a worker list. It might be
the list of either normal or delayed works.

Fifth, if the work is running, we call kthread_flush_work().  It might
take an arbitrary time.  We need to release the worker-lock again.  In the
meantime, we again block any queuing by the canceling counter.

As already mentioned, the check for a pending kthread work is done under a
lock.  In compare with workqueues, we do not need to fight for a single
PENDING bit to block other operations.  Therefore we do not suffer from
the thundering storm problem and all parallel canceling jobs might use
kthread_flush_work().  Any queuing is blocked until the counter gets zero.

Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek 22597dc3d9 kthread: initial support for delayed kthread work
We are going to use kthread_worker more widely and delayed works
will be pretty useful.

The implementation is inspired by workqueues.  It uses a timer to queue
the work after the requested delay.  If the delay is zero, the work is
queued immediately.

In compare with workqueues, each work is associated with a single worker
(kthread).  Therefore the implementation could be much easier.  In
particular, we use the worker->lock to synchronize all the operations with
the work.  We do not need any atomic operation with a flags variable.

In fact, we do not need any state variable at all.  Instead, we add a list
of delayed works into the worker.  Then the pending work is listed either
in the list of queued or delayed works.  And the existing check of pending
works is the same even for the delayed ones.

A work must not be assigned to another worker unless reinitialized.
Therefore the timer handler might expect that dwork->work->worker is valid
and it could simply take the lock.  We just add some sanity checks to help
with debugging a potential misuse.

Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek 8197b3d43b kthread: detect when a kthread work is used by more workers
Nothing currently prevents a work from queuing for a kthread worker when
it is already running on another one.  This means that the work might run
in parallel on more than one worker.  Also some operations are not
reliable, e.g.  flush.

This problem will be even more visible after we add kthread_cancel_work()
function.  It will only have "work" as the parameter and will use
worker->lock to synchronize with others.

Well, normally this is not a problem because the API users are sane.
But bugs might happen and users also might be crazy.

This patch adds a warning when we try to insert the work for another
worker.  It does not fully prevent the misuse because it would make the
code much more complicated without a big benefit.

It adds the same warning also into kthread_flush_work() instead of the
repeated attempts to get the right lock.

A side effect is that one needs to explicitly reinitialize the work if it
must be queued into another worker.  This is needed, for example, when the
worker is stopped and started again.  It is a bit inconvenient.  But it
looks like a good compromise between the stability and complexity.

I have double checked all existing users of the kthread worker API and
they all seems to initialize the work after the worker gets started.

Just for completeness, the patch adds a check that the work is not already
in a queue.

The patch also puts all the checks into a separate function.  It will be
reused when implementing delayed works.

Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek 35033fe9cb kthread: add kthread_destroy_worker()
The current kthread worker users call flush() and stop() explicitly.
This function does the same plus it frees the kthread_worker struct
in one call.

It is supposed to be used together with kthread_create_worker*() that
allocates struct kthread_worker.

Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek fbae2d44aa kthread: add kthread_create_worker*()
Kthread workers are currently created using the classic kthread API,
namely kthread_run().  kthread_worker_fn() is passed as the @threadfn
parameter.

This patch defines kthread_create_worker() and
kthread_create_worker_on_cpu() functions that hide implementation details.

They enforce using kthread_worker_fn() for the main thread.  But I doubt
that there are any plans to create any alternative.  In fact, I think that
we do not want any alternative main thread because it would be hard to
support consistency with the rest of the kthread worker API.

The naming and function of kthread_create_worker() is inspired by the
workqueues API like the rest of the kthread worker API.

The kthread_create_worker_on_cpu() variant is motivated by the original
kthread_create_on_cpu().  Note that we need to bind per-CPU kthread
workers already when they are created.  It makes the life easier.
kthread_bind() could not be used later for an already running worker.

This patch does _not_ convert existing kthread workers.  The kthread
worker API need more improvements first, e.g.  a function to destroy the
worker.

IMPORTANT:

kthread_create_worker_on_cpu() allows to use any format of the worker
name, in compare with kthread_create_on_cpu().  The good thing is that it
is more generic.  The bad thing is that most users will need to pass the
cpu number in two parameters, e.g.  kthread_create_worker_on_cpu(cpu,
"helper/%d", cpu).

To be honest, the main motivation was to avoid the need for an empty
va_list.  The only legal way was to create a helper function that would be
called with an empty list.  Other attempts caused compilation warnings or
even errors on different architectures.

There were also other alternatives, for example, using #define or
splitting __kthread_create_worker().  The used solution looked like the
least ugly.

Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00