Commit Graph

638 Commits

Author SHA1 Message Date
Linus Torvalds
e431f2d74e Merge tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is the big driver core patchset for 5.1-rc1

  More patches than "normal" here this merge window, due to some work in
  the driver core by Alexander Duyck to rework the async probe
  functionality to work better for a number of devices, and independant
  work from Rafael for the device link functionality to make it work
  "correctly".

  Also in here is:

   - lots of BUS_ATTR() removals, the macro is about to go away

   - firmware test fixups

   - ihex fixups and simplification

   - component additions (also includes i915 patches)

   - lots of minor coding style fixups and cleanups.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
  driver core: platform: remove misleading err_alloc label
  platform: set of_node in platform_device_register_full()
  firmware: hardcode the debug message for -ENOENT
  driver core: Add missing description of new struct device_link field
  driver core: Fix PM-runtime for links added during consumer probe
  drivers/component: kerneldoc polish
  async: Add cmdline option to specify drivers to be async probed
  driver core: Fix possible supplier PM-usage counter imbalance
  PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
  driver: platform: Support parsing GpioInt 0 in platform_get_irq()
  selftests: firmware: fix verify_reqs() return value
  Revert "selftests: firmware: remove use of non-standard diff -Z option"
  Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
  device: Fix comment for driver_data in struct device
  kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
  sysfs: remove unused include of kernfs-internal.h
  driver core: Postpone DMA tear-down until after devres release
  driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
  PM-runtime: Take suppliers into account in __pm_runtime_set_status()
  device.h: Add __cold to dev_<level> logging functions
  ...
2019-03-06 14:52:48 -08:00
Bart Van Assche
669de8bda8 kernel/workqueue: Use dynamic lockdep keys for workqueues
The following commit:

  87915adc3f ("workqueue: re-add lockdep dependencies for flushing")

improved deadlock checking in the workqueue implementation. Unfortunately
that patch also introduced a few false positive lockdep complaints.

This patch suppresses these false positives by allocating the workqueue mutex
lockdep key dynamically.

An example of a false positive lockdep complaint suppressed by this patch
can be found below. The root cause of the lockdep complaint shown below
is that the direct I/O code can call alloc_workqueue() from inside a work
item created by another alloc_workqueue() call and that both workqueues
share the same lockdep key. This patch avoids that that lockdep complaint
is triggered by allocating the work queue lockdep keys dynamically.

In other words, this patch guarantees that a unique lockdep key is
associated with each work queue mutex.

  ======================================================
  WARNING: possible circular locking dependency detected
  4.19.0-dbg+ #1 Not tainted
  fio/4129 is trying to acquire lock:
  00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970

  but task is already holding lock:
  00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (&sb->s_type->i_mutex_key#14){+.+.}:
         down_write+0x3d/0x80
         __generic_file_fsync+0x77/0xf0
         ext4_sync_file+0x3c9/0x780
         vfs_fsync_range+0x66/0x100
         dio_complete+0x2f5/0x360
         dio_aio_complete_work+0x1c/0x20
         process_one_work+0x481/0x9f0
         worker_thread+0x63/0x5a0
         kthread+0x1cf/0x1f0
         ret_from_fork+0x24/0x30

  -> #1 ((work_completion)(&dio->complete_work)){+.+.}:
         process_one_work+0x447/0x9f0
         worker_thread+0x63/0x5a0
         kthread+0x1cf/0x1f0
         ret_from_fork+0x24/0x30

  -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
         lock_acquire+0xc5/0x200
         flush_workqueue+0xf3/0x970
         drain_workqueue+0xec/0x220
         destroy_workqueue+0x23/0x350
         sb_init_dio_done_wq+0x6a/0x80
         do_blockdev_direct_IO+0x1f33/0x4be0
         __blockdev_direct_IO+0x79/0x86
         ext4_direct_IO+0x5df/0xbb0
         generic_file_direct_write+0x119/0x220
         __generic_file_write_iter+0x131/0x2d0
         ext4_file_write_iter+0x3fa/0x710
         aio_write+0x235/0x330
         io_submit_one+0x510/0xeb0
         __x64_sys_io_submit+0x122/0x340
         do_syscall_64+0x71/0x220
         entry_SYSCALL_64_after_hwframe+0x49/0xbe

  other info that might help us debug this:

  Chain exists of:
    (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&sb->s_type->i_mutex_key#14);
                                 lock((work_completion)(&dio->complete_work));
                                 lock(&sb->s_type->i_mutex_key#14);
    lock((wq_completion)"dio/%s"sb->s_id);

   *** DEADLOCK ***

  1 lock held by fio/4129:
   #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710

  stack backtrace:
  CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
  Call Trace:
   dump_stack+0x86/0xc5
   print_circular_bug.isra.32+0x20a/0x218
   __lock_acquire+0x1c68/0x1cf0
   lock_acquire+0xc5/0x200
   flush_workqueue+0xf3/0x970
   drain_workqueue+0xec/0x220
   destroy_workqueue+0x23/0x350
   sb_init_dio_done_wq+0x6a/0x80
   do_blockdev_direct_IO+0x1f33/0x4be0
   __blockdev_direct_IO+0x79/0x86
   ext4_direct_IO+0x5df/0xbb0
   generic_file_direct_write+0x119/0x220
   __generic_file_write_iter+0x131/0x2d0
   ext4_file_write_iter+0x3fa/0x710
   aio_write+0x235/0x330
   io_submit_one+0x510/0xeb0
   __x64_sys_io_submit+0x122/0x340
   do_syscall_64+0x71/0x220
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org
[ Reworked the changelog a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28 07:55:47 +01:00
Greg Kroah-Hartman
9481caf39b Merge 5.0-rc6 into driver-core-next
We need the debugfs fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-11 09:09:02 +01:00
Johannes Weiner
1b69ac6b40 psi: fix aggregation idle shut-off
psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating.  However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.

Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again.  This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)

Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep.  To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.

What if the worker is also executing other items before or after?

Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself.  If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.

If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity.  But that
should not be a problem.  The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation.  If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.

Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes: eb414681d5 ("psi: pressure stall information for CPU, memory, and IO")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Alexander Duyck
8204e0c111 workqueue: Provide queue_work_node to queue work near a given NUMA node
Provide a new function, queue_work_node, which is meant to schedule work on
a "random" CPU of the requested NUMA node. The main motivation for this is
to help assist asynchronous init to better improve boot times for devices
that are local to a specific node.

For now we just default to the first CPU that is in the intersection of the
cpumask of the node and the online cpumask. The only exception is if the
CPU is local to the node we will just use the current CPU. This should work
for our purposes as we are currently only using this for unbound work so
the CPU will be translated to a node anyway instead of being directly used.

As we are only using the first CPU to represent the NUMA node for now I am
limiting the scope of the function so that it can only be used with unbound
workqueues.

Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 14:20:54 +01:00
Paul E. McKenney
25b0077511 workqueue: Replace call_rcu_sched() with call_rcu()
Now that call_rcu()'s callback is not invoked until after all
preempt-disable regions of code have completed (in addition to explicitly
marked RCU read-side critical sections), call_rcu() can be used in place
of call_rcu_sched().  This commit therefore makes that change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
2018-11-27 09:21:44 -08:00
Vincent Whitchurch
cb9d7fd51d watchdog: Mark watchdog touch functions as notrace
Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.

Commit ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations.  This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.

Fix it by adding the necessary notrace annotations.

Fixes: ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: tj@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
2018-08-30 12:56:40 +02:00
Linus Torvalds
9022ada8ab Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Over the lockdep cross-release churn, workqueue lost some of the
  existing annotations. Johannes Berg restored it and also improved
  them"

* 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: re-add lockdep dependencies for flushing
  workqueue: skip lockdep wq dependency in cancel_work_sync()
2018-08-24 13:16:36 -07:00
Johannes Berg
87915adc3f workqueue: re-add lockdep dependencies for flushing
In flush_work(), we need to create a lockdep dependency so that
the following scenario is appropriately tagged as a problem:

  work_function()
  {
    mutex_lock(&mutex);
    ...
  }

  other_function()
  {
    mutex_lock(&mutex);
    flush_work(&work); // or cancel_work_sync(&work);
  }

This is a problem since the work might be running and be blocked
on trying to acquire the mutex.

Similarly, in flush_workqueue().

These were removed after cross-release partially caught these
problems, but now cross-release was reverted anyway. IMHO the
removal was erroneous anyway though, since lockdep should be
able to catch potential problems, not just actual ones, and
cross-release would only have caught the problem when actually
invoking wait_for_completion().

Fixes: fd1a5b04df ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-08-22 08:31:38 -07:00
Johannes Berg
d6e89786be workqueue: skip lockdep wq dependency in cancel_work_sync()
In cancel_work_sync(), we can only have one of two cases, even
with an ordered workqueue:
 * the work isn't running, just cancelled before it started
 * the work is running, but then nothing else can be on the
   workqueue before it

Thus, we need to skip the lockdep workqueue dependency handling,
otherwise we get false positive reports from lockdep saying that
we have a potential deadlock when the workqueue also has other
work items with locking, e.g.

  work1_function() { mutex_lock(&mutex); ... }
  work2_function() { /* nothing */ }

  other_function() {
    queue_work(ordered_wq, &work1);
    queue_work(ordered_wq, &work2);
    mutex_lock(&mutex);
    cancel_work_sync(&work2);
  }

As described above, this isn't a problem, but lockdep will
currently flag it as if cancel_work_sync() was flush_work(),
which *is* a problem.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-08-22 08:31:37 -07:00
Kees Cook
6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Linus Torvalds
5f85942c2e Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
  xfcp, hisi_sas, cxlflash, qla2xxx.

  In the absence of Nic, we're also taking target updates which are
  mostly minor except for the tcmu refactor.

  The only real core change to worry about is the removal of high page
  bouncing (in sas, storvsc and iscsi). This has been well tested and no
  problems have shown up so far"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits)
  scsi: lpfc: update driver version to 12.0.0.4
  scsi: lpfc: Fix port initialization failure.
  scsi: lpfc: Fix 16gb hbas failing cq create.
  scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
  scsi: lpfc: correct oversubscription of nvme io requests for an adapter
  scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx)
  scsi: hisi_sas: Mark PHY as in reset for nexus reset
  scsi: hisi_sas: Fix return value when get_free_slot() failed
  scsi: hisi_sas: Terminate STP reject quickly for v2 hw
  scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command
  scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
  scsi: hisi_sas: Try wait commands before before controller reset
  scsi: hisi_sas: Init disks after controller reset
  scsi: hisi_sas: Create a scsi_host_template per HW module
  scsi: hisi_sas: Reset disks when discovered
  scsi: hisi_sas: Add LED feature for v3 hw
  scsi: hisi_sas: Change common allocation mode of device id
  scsi: hisi_sas: change slot index allocation mode
  scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
  scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
  ...
2018-06-10 13:01:12 -07:00
Linus Torvalds
2857676045 Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
 "This adds the new overflow checking helpers and adds them to the
  2-factor argument allocators. And this adds the saturating size
  helpers and does a treewide replacement for the struct_size() usage.
  Additionally this adds the overflow testing modules to make sure
  everything works.

  I'm still working on the treewide replacements for allocators with
  "simple" multiplied arguments:

     *alloc(a * b, ...) -> *alloc_array(a, b, ...)

  and

     *zalloc(a * b, ...) -> *calloc(a, b, ...)

  as well as the more complex cases, but that's separable from this
  portion of the series. I expect to have the rest sent before -rc1
  closes; there are a lot of messy cases to clean up.

  Summary:

   - Introduce arithmetic overflow test helper functions (Rasmus)

   - Use overflow helpers in 2-factor allocators (Kees, Rasmus)

   - Introduce overflow test module (Rasmus, Kees)

   - Introduce saturating size helper functions (Matthew, Kees)

   - Treewide use of struct_size() for allocators (Kees)"

* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  treewide: Use struct_size() for devm_kmalloc() and friends
  treewide: Use struct_size() for vmalloc()-family
  treewide: Use struct_size() for kmalloc()-family
  device: Use overflow helpers for devm_kmalloc()
  mm: Use overflow helpers in kvmalloc()
  mm: Use overflow helpers in kmalloc_array*()
  test_overflow: Add memory allocation overflow tests
  overflow.h: Add allocation size calculation helpers
  test_overflow: Report test failures
  test_overflow: macrofy some more, do more tests for free
  lib: add runtime test of check_*_overflow functions
  compiler.h: enable builtin overflow checkers and add fallback code
2018-06-06 17:27:14 -07:00
Kees Cook
acafe7e302 treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:

// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
//                      sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@

- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-06 11:15:43 -07:00
Mathieu Malaterre
66448bc274 workqueue: move function definitions within CONFIG_SMP block
In commit 7ee681b252 ("workqueue: Convert to state machine callbacks"),
three new function definitions were added: ‘workqueue_prepare_cpu’,
‘workqueue_online_cpu’ and ‘workqueue_offline_cpu’.

Move these function definitions within a CONFIG_SMP block since they are
not used outside of it. This will match function declarations in header
<include/linux/workqueue.h>, and silence the following gcc warning (W=1):

  kernel/workqueue.c:4743:5: warning: no previous prototype for ‘workqueue_prepare_cpu’ [-Wmissing-prototypes]
  kernel/workqueue.c:4756:5: warning: no previous prototype for ‘workqueue_online_cpu’ [-Wmissing-prototypes]
  kernel/workqueue.c:4783:5: warning: no previous prototype for ‘workqueue_offline_cpu’ [-Wmissing-prototypes]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-23 11:16:58 -07:00
Tejun Heo
197f6accac workqueue: Make sure struct worker is accessible for wq_worker_comm()
The worker struct could already be freed when wq_worker_comm() tries
to access it for reporting.  This patch protects PF_WQ_WORKER
modifications with wq_pool_attach_mutex and makes wq_worker_comm()
test the flag before dereferencing worker from kthread_data(), which
ensures that it only dereferences when the worker struct is valid.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Lai Jiangshan <jiangshanlai@gmail.com>
Fixes: 6b59808bfe ("workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}")
2018-05-21 08:04:35 -07:00
Tejun Heo
6b59808bfe workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}
There can be a lot of workqueue workers and they all show up with the
cryptic kworker/* names making it difficult to understand which is
doing what and how they came to be.

  # ps -ef | grep kworker
  root           4       2  0 Feb25 ?        00:00:00 [kworker/0:0H]
  root           6       2  0 Feb25 ?        00:00:00 [kworker/u112:0]
  root          19       2  0 Feb25 ?        00:00:00 [kworker/1:0H]
  root          25       2  0 Feb25 ?        00:00:00 [kworker/2:0H]
  root          31       2  0 Feb25 ?        00:00:00 [kworker/3:0H]
  ...

This patch makes workqueue workers report the latest workqueue it was
executing for through /proc/PID/{comm,stat,status}.  The extra
information is appended to the kthread name with intervening '+' if
currently executing, otherwise '-'.

  # cat /proc/25/comm
  kworker/2:0-events_power_efficient
  # cat /proc/25/stat
  25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
  # grep Name /proc/25/status
  Name:   kworker/2:0-events_power_efficient

Unfortunately, ps(1) truncates comm to 15 characters,

  # ps 25
    PID TTY      STAT   TIME COMMAND
     25 ?        I      0:00 [kworker/2:0-eve]

making it a lot less useful; however, this should be an easy fix from
ps(1) side.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Craig Small <csmall@enc.com.au>
2018-05-18 08:47:13 -07:00
Tejun Heo
8bf895931e workqueue: Set worker->desc to workqueue name by default
Work functions can use set_worker_desc() to improve the visibility of
what the worker task is doing.  Currently, the desc field is unset at
the beginning of each execution and there is a separate field to track
the field is set during the current execution.

Instead of leaving empty till desc is set, worker->desc can be used to
remember the last workqueue the worker worked on by default and users
that use set_worker_desc() can override it to something more
informative as necessary.

This simplifies desc handling and helps tracking the last workqueue
that the worker exected on to improve visibility.

Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-18 08:47:13 -07:00
Tejun Heo
a2d812a27a workqueue: Make worker_attach/detach_pool() update worker->pool
For historical reasons, the worker attach/detach functions don't
currently manage worker->pool and the callers are manually and
inconsistently updating it.

This patch moves worker->pool updates into the worker attach/detach
functions.  This makes worker->pool consistent and clearly defines how
worker->pool updates are synchronized.

This will help later workqueue visibility improvements by allowing
safe access to workqueue information from worker->task.

Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-18 08:47:13 -07:00
Tejun Heo
1258fae73c workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex
To improve workqueue visibility, we want to be able to access
workqueue information from worker tasks.  The per-pool attach mutex
makes that difficult because there's no way of stabilizing task ->
worker pool association without knowing the pool first.

Worker attach/detach is a slow path and there's no need for different
pools to be able to perform them concurrently.  This patch replaces
the per-pool attach_mutex with global wq_pool_attach_mutex to prepare
for visibility improvement changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-18 08:47:13 -07:00
Steffen Maier
5c750d58e9 scsi: zfcp: workqueue: set description for port work items with their WWPN as context
As a prerequisite, complement commit 3d1cb2059d ("workqueue: include
workqueue info when printing debug dump of a worker task") to be usable with
kernel modules by exporting the symbol set_worker_desc().  Current built-in
user was introduced with commit ef3b101925 ("writeback: set worker desc to
identify writeback workers in task dumps").

Can help distinguishing work items which do not have adapter scope.
Description is printed out with task dump for debugging on WARN, BUG, panic,
or magic-sysrq [show-task-states(t)].

Example:
$ echo 0 >| /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/failed &
$ echo 't' >| /proc/sysrq-trigger
$ dmesg
sysrq: SysRq : Show State
  task                        PC stack   pid father
...
zfcp_q_0.0.1880 S14640  2165      2 0x02000000
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
 [<00000000009df57c>] schedule+0x94/0xc0
 [<0000000000168654>] rescuer_thread+0x33c/0x3a0
 [<000000000016f8be>] kthread+0x166/0x178
 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
 [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
no locks held by zfcp_q_0.0.1880/2165.
...
kworker/u512:2  D11280  2193      2 0x02000000
Workqueue: zfcp_q_0.0.1880 zfcp_scsi_rport_work [zfcp] (zrpd-50050763031bd327)
                                                        ^^^^^^^^^^^^^^^^^^^^^
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
 [<00000000009df57c>] schedule+0x94/0xc0
 [<00000000009e50c0>] schedule_timeout+0x488/0x4d0
 [<00000000001e425c>] msleep+0x5c/0x78                  >>test code only<<
 [<000003ff8008a21e>] zfcp_scsi_rport_work+0xbe/0x100 [zfcp]
 [<0000000000167154>] process_one_work+0x3b4/0x718
 [<000000000016771c>] worker_thread+0x264/0x408
 [<000000000016f8be>] kthread+0x166/0x178
 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
 [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
2 locks held by kworker/u512:2/2193:
 #0:  (name){++++.+}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
 #1:  ((&(&port->rport_work)->work)){+.+.+.}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
...

=============================================
Showing busy workqueues and worker pools:
workqueue zfcp_q_0.0.1880: flags=0x2000a
  pwq 512: cpus=0-255 flags=0x4 nice=0 active=1/1
    in-flight: 2193:zfcp_scsi_rport_work [zfcp]
pool 512: cpus=0-255 flags=0x4 nice=0 hung=0s workers=4 idle: 5 2354 2311

Work items with adapter scope are already identified by the workqueue name
"zfcp_q_<devbusid>" and the work item function name.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:27:21 -04:00
Linus Torvalds
d92cd810e6 Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "rcu_work addition and a couple trivial changes"

* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove the comment about the old manager_arb mutex
  workqueue: fix the comments of nr_idle
  fs/aio: Use rcu_work instead of explicit rcu and work item
  cgroup: Use rcu_work instead of explicit rcu and work item
  RCU, workqueue: Implement rcu_work
2018-04-03 18:00:13 -07:00
Lai Jiangshan
f75da8a8a9 workqueue: remove the comment about the old manager_arb mutex
The manager_arb mutex doesn't exist any more.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-20 13:01:45 -07:00
Lai Jiangshan
5826cc8f5a workqueue: fix the comments of nr_idle
Since the worker rebinding behavior was refactored, there is
no idle worker off the idle_list now. The comment is outdated
and can be just removed.

It also groups nr_workers and nr_idle together.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-20 13:01:36 -07:00
Ingo Molnar
10c18c44a6 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20 08:08:02 +01:00