Commit Graph

753373 Commits

Author SHA1 Message Date
Mathieu Malaterre 66448bc274 workqueue: move function definitions within CONFIG_SMP block
In commit 7ee681b252 ("workqueue: Convert to state machine callbacks"),
three new function definitions were added: ‘workqueue_prepare_cpu’,
‘workqueue_online_cpu’ and ‘workqueue_offline_cpu’.

Move these function definitions within a CONFIG_SMP block since they are
not used outside of it. This will match function declarations in header
<include/linux/workqueue.h>, and silence the following gcc warning (W=1):

  kernel/workqueue.c:4743:5: warning: no previous prototype for ‘workqueue_prepare_cpu’ [-Wmissing-prototypes]
  kernel/workqueue.c:4756:5: warning: no previous prototype for ‘workqueue_online_cpu’ [-Wmissing-prototypes]
  kernel/workqueue.c:4783:5: warning: no previous prototype for ‘workqueue_offline_cpu’ [-Wmissing-prototypes]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-23 11:16:58 -07:00
Tejun Heo 197f6accac workqueue: Make sure struct worker is accessible for wq_worker_comm()
The worker struct could already be freed when wq_worker_comm() tries
to access it for reporting.  This patch protects PF_WQ_WORKER
modifications with wq_pool_attach_mutex and makes wq_worker_comm()
test the flag before dereferencing worker from kthread_data(), which
ensures that it only dereferences when the worker struct is valid.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Lai Jiangshan <jiangshanlai@gmail.com>
Fixes: 6b59808bfe ("workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}")
2018-05-21 08:04:35 -07:00
Tejun Heo 6b59808bfe workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}
There can be a lot of workqueue workers and they all show up with the
cryptic kworker/* names making it difficult to understand which is
doing what and how they came to be.

  # ps -ef | grep kworker
  root           4       2  0 Feb25 ?        00:00:00 [kworker/0:0H]
  root           6       2  0 Feb25 ?        00:00:00 [kworker/u112:0]
  root          19       2  0 Feb25 ?        00:00:00 [kworker/1:0H]
  root          25       2  0 Feb25 ?        00:00:00 [kworker/2:0H]
  root          31       2  0 Feb25 ?        00:00:00 [kworker/3:0H]
  ...

This patch makes workqueue workers report the latest workqueue it was
executing for through /proc/PID/{comm,stat,status}.  The extra
information is appended to the kthread name with intervening '+' if
currently executing, otherwise '-'.

  # cat /proc/25/comm
  kworker/2:0-events_power_efficient
  # cat /proc/25/stat
  25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
  # grep Name /proc/25/status
  Name:   kworker/2:0-events_power_efficient

Unfortunately, ps(1) truncates comm to 15 characters,

  # ps 25
    PID TTY      STAT   TIME COMMAND
     25 ?        I      0:00 [kworker/2:0-eve]

making it a lot less useful; however, this should be an easy fix from
ps(1) side.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Craig Small <csmall@enc.com.au>
2018-05-18 08:47:13 -07:00
Tejun Heo 88b72b31e1 proc: Consolidate task->comm formatting into proc_task_name()
proc shows task->comm in three places - comm, stat, status - and each
is fetching and formatting task->comm slighly differently.  This patch
renames task_name() to proc_task_name(), makes it more generic, and
updates all three paths to use it.

This will enable expanding comm reporting for workqueue workers.

Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-18 08:47:13 -07:00
Tejun Heo 8bf895931e workqueue: Set worker->desc to workqueue name by default
Work functions can use set_worker_desc() to improve the visibility of
what the worker task is doing.  Currently, the desc field is unset at
the beginning of each execution and there is a separate field to track
the field is set during the current execution.

Instead of leaving empty till desc is set, worker->desc can be used to
remember the last workqueue the worker worked on by default and users
that use set_worker_desc() can override it to something more
informative as necessary.

This simplifies desc handling and helps tracking the last workqueue
that the worker exected on to improve visibility.

Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-18 08:47:13 -07:00
Tejun Heo a2d812a27a workqueue: Make worker_attach/detach_pool() update worker->pool
For historical reasons, the worker attach/detach functions don't
currently manage worker->pool and the callers are manually and
inconsistently updating it.

This patch moves worker->pool updates into the worker attach/detach
functions.  This makes worker->pool consistent and clearly defines how
worker->pool updates are synchronized.

This will help later workqueue visibility improvements by allowing
safe access to workqueue information from worker->task.

Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-18 08:47:13 -07:00
Tejun Heo 1258fae73c workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex
To improve workqueue visibility, we want to be able to access
workqueue information from worker tasks.  The per-pool attach mutex
makes that difficult because there's no way of stabilizing task ->
worker pool association without knowing the pool first.

Worker attach/detach is a slow path and there's no need for different
pools to be able to perform them concurrently.  This patch replaces
the per-pool attach_mutex with global wq_pool_attach_mutex to prepare
for visibility improvement changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-18 08:47:13 -07:00
Linus Torvalds e6506eb241 Merge tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
 "Some of the ftrace internal events use a zero for a data size of a
  field event. This is increasingly important for the histogram trigger
  work that is being extended.

  While auditing trace events, I found that a couple of the xen events
  were used as just marking that a function was called, by creating a
  static array of size zero. This can play havoc with the tracing
  features if these events are used, because a zero size of a static
  array is denoted as a special nul terminated dynamic array (this is
  what the trace_marker code uses). But since the xen events have no
  size, they are not nul terminated, and unexpected results may occur.

  As trace events were never intended on being a marker to denote that a
  function was hit or not, especially since function tracing and kprobes
  can trivially do the same, the best course of action is to simply
  remove these events"

* tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
2018-05-16 16:45:23 -07:00
Linus Torvalds 9d38cd06c3 Merge tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull memory barrier for from Steven Rostedt:
 "The memory barrier usage in updating the random ptr hash for %p in
  vsprintf is incorrect.

  Instead of adding the read memory barrier into vsprintf() which will
  cause a slight degradation to a commonly used function in the kernel
  just to solve a very unlikely race condition that can only happen at
  boot up, change the code from using a variable branch to a
  static_branch.

  Not only does this solve the race condition, it actually will improve
  the performance of vsprintf() by removing the conditional branch that
  is only needed at boot"

* tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  vsprintf: Replace memory barrier with static_key for random_ptr_key update
2018-05-16 11:02:54 -07:00
Steven Rostedt (VMware) 85f4f12d51 vsprintf: Replace memory barrier with static_key for random_ptr_key update
Reviewing Tobin's patches for getting pointers out early before
entropy has been established, I noticed that there's a lone smp_mb() in
the code. As with most lone memory barriers, this one appears to be
incorrectly used.

We currently basically have this:

	get_random_bytes(&ptr_key, sizeof(ptr_key));
	/*
	 * have_filled_random_ptr_key==true is dependent on get_random_bytes().
	 * ptr_to_id() needs to see have_filled_random_ptr_key==true
	 * after get_random_bytes() returns.
	 */
	smp_mb();
	WRITE_ONCE(have_filled_random_ptr_key, true);

And later we have:

	if (unlikely(!have_filled_random_ptr_key))
		return string(buf, end, "(ptrval)", spec);

/* Missing memory barrier here. */

	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);

As the CPU can perform speculative loads, we could have a situation
with the following:

	CPU0				CPU1
	----				----
				   load ptr_key = 0
   store ptr_key = random
   smp_mb()
   store have_filled_random_ptr_key

				   load have_filled_random_ptr_key = true

				    BAD BAD BAD! (you're so bad!)

Because nothing prevents CPU1 from loading ptr_key before loading
have_filled_random_ptr_key.

But this race is very unlikely, but we can't keep an incorrect smp_mb() in
place. Instead, replace the have_filled_random_ptr_key with a static_branch
not_filled_random_ptr_key, that is initialized to true and changed to false
when we get enough entropy. If the update happens in early boot, the
static_key is updated immediately, otherwise it will have to wait till
entropy is filled and this happens in an interrupt handler which can't
enable a static_key, as that requires a preemptible context. In that case, a
work_queue is used to enable it, as entropy already took too long to
establish in the first place waiting a little more shouldn't hurt anything.

The benefit of using the static key is that the unlikely branch in
vsprintf() now becomes a nop.

Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: ad67b74d24 ("printk: hash addresses printed with %p")
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-16 09:01:41 -04:00
Linus Torvalds 21b9f1c7e3 Merge tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
 "Here's a set of patches that fix a number of bugs in the in-kernel AFS
  client, including:

   - Fix directory locking to not use individual page locks for
     directory reading/scanning but rather to use a semaphore on the
     afs_vnode struct as the directory contents must be read in a single
     blob and data from different reads must not be mixed as the entire
     contents may be shuffled about between reads.

   - Fix address list parsing to handle port specifiers correctly.

   - Only give up callback records on a server if we actually talked to
     that server (we might not be able to access a server).

   - Fix some callback handling bugs, including refcounting,
     whole-volume callbacks and when callbacks actually get broken in
     response to a CB.CallBack op.

   - Fix some server/address rotation bugs, including giving up if we
     can't probe a server; giving up if a server says it doesn't have a
     volume, but there are more servers to try.

   - Fix the decoding of fetched statuses to be OpenAFS compatible.

   - Fix the handling of server lookups in Cache Manager ops (such as
     CB.InitCallBackState3) to use a UUID if possible and to handle no
     server being found.

   - Fix a bug in server lookup where not all addresses are compared.

   - Fix the non-encryption of calls that prevents some servers from
     being accessed (this also requires an AF_RXRPC patch that has
     already gone in through the net tree).

  There's also a patch that adds tracepoints to log Cache Manager ops
  that don't find a matching server, either by UUID or by address"

* tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix the non-encryption of calls
  afs: Fix CB.CallBack handling
  afs: Fix whole-volume callback handling
  afs: Fix afs_find_server search loop
  afs: Fix the handling of an unfound server in CM operations
  afs: Add a tracepoint to record callbacks from unlisted servers
  afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
  afs: Fix VNOVOL handling in address rotation
  afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
  afs: Fix server rotation's handling of fileserver probe failure
  afs: Fix refcounting in callback registration
  afs: Fix giving up callbacks on server destruction
  afs: Fix address list parsing
  afs: Fix directory page locking
2018-05-15 10:48:36 -07:00
Linus Torvalds eeba2dfa6a Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Two small driver fixes: aacraid to fix an unknown IU type on task
  management functions which causes a firmware fault and vmw_pvscsi to
  change a return code to retry the operation instead of causing an
  immediate error"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: aacraid: Correct hba_send to include iu_type
  scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts
2018-05-15 10:15:48 -07:00
Linus Torvalds ee4b65c2e8 Merge tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux
Pull drm fix from Dave Airlie:
 "This fixes the mmap regression reported to me on irc by an i686 kernel
  user today, he's tested the fix works, and I've audited all the drm
  drivers for the bad mmap usage and since we use the mmap offset as a
  lookup in a table we aren't inclined to have anything bad in there"

[ See commit be83bbf806 ("mmap: introduce sane default mmap limits")
  for details and the note on why the GPU drivers were expected to be a
  special case.    - Linus ]

* tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux:
  drm: set FMODE_UNSIGNED_OFFSET for drm files
2018-05-15 09:58:01 -07:00
Dave Airlie 76ef6b28ea drm: set FMODE_UNSIGNED_OFFSET for drm files
Since we have the ttm and gem vma managers using a subset
of the file address space for objects, and these start at
0x100000000 they will overflow the new mmap checks.

I've checked all the mmap routines I could see for any
bad behaviour but overall most people use GEM/TTM VMA
managers even the legacy drivers have a hashtable.

Reported-and-Tested-by: Arthur Marsh (amarsh04 on #radeon)
Fixes: be83bbf806 (mmap: introduce sane default mmap limits)
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-05-15 14:46:04 +10:00
Steven Rostedt (VMware) 45dd9b0666 tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
Doing an audit of trace events, I discovered two trace events in the xen
subsystem that use a hack to create zero data size trace events. This is not
what trace events are for. Trace events add memory footprint overhead, and
if all you need to do is see if a function is hit or not, simply make that
function noinline and use function tracer filtering.

Worse yet, the hack used was:

 __array(char, x, 0)

Which creates a static string of zero in length. There's assumptions about
such constructs in ftrace that this is a dynamic string that is nul
terminated. This is not the case with these tracepoints and can cause
problems in various parts of ftrace.

Nuke the trace events!

Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 95a7d76897 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-14 17:02:30 -04:00
David Howells 4776cab43f afs: Fix the non-encryption of calls
Some AFS servers refuse to accept unencrypted traffic, so can't be accessed
with kAFS.  Set the AF_RXRPC security level to encrypt client calls to deal
with this.

Note that incoming service calls are set by the remote client and so aren't
affected by this.

This requires an AF_RXRPC patch to pass the value set by setsockopt to calls
begun by the kernel.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:19 +01:00
David Howells 428edade4e afs: Fix CB.CallBack handling
The handling of CB.CallBack messages sent by the fileserver to the client
is broken in that they are currently being processed after the reply has
been transmitted.

This is not what the fileserver expects, however.  It holds up change
visibility until the reply comes so as to maintain cache coherency, and so
expects the client to have to refetch the state on the affected files.

Fix CB.CallBack handling to perform the callback break before sending the
reply.

The fileserver is free to hold up status fetches issued by other threads on
the same client that occur in reponse to the callback until any pending
changes have been committed.

Fixes: d001648ec7 ("rxrpc: Don't expose skbs to in-kernel users [ver #2]")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:19 +01:00
David Howells 68251f0a68 afs: Fix whole-volume callback handling
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
Marc Dionne f9c1bba3d3 afs: Fix afs_find_server search loop
The code that looks up servers by addresses makes the assumption
that the list of addresses for a server is sorted.  It exits the
loop if it finds that the target address is larger than the
current candidate.  As the list is not currently sorted, this
can lead to a failure to find a matching server, which can cause
callbacks from that server to be ignored.

Remove the early exit case so that the complete list is searched.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells a86b06d1cc afs: Fix the handling of an unfound server in CM operations
If the client cache manager operations that need the server record
(CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find
the server record, they abort the call from the file server with
RX_CALL_DEAD when they should return okay.

Fixes: c35eccb1f6 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells 3709a399c1 afs: Add a tracepoint to record callbacks from unlisted servers
Add a tracepoint to record callbacks from servers for which we don't have a
record.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells 001ab5a67e afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
Fix the handling of the CB.InitCallBackState3 service call to find the
record of a server that we're using by looking it up by the UUID passed as
the parameter rather than by its address (of which it might have many, and
which may change).

Fixes: c35eccb1f6 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells 3d9fa91161 afs: Fix VNOVOL handling in address rotation
If a volume location record lists multiple file servers for a volume, then
it's possible that due to a misconfiguration or a changing configuration
that one of the file servers doesn't know about it yet and will abort
VNOVOL.  Currently, the rotation algorithm will stop with EREMOTEIO.

Fix this by moving on to try the next server if VNOVOL is returned.  Once
all the servers have been tried and the record rechecked, the algorithm
will stop with EREMOTEIO or ENOMEDIUM.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells 684b0f68cf afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug
whereby if an error occurs on one of the vnodes being queried, then the
errorCode field is set correctly in the corresponding status, but the
interfaceVersion field is left unset.

Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against
the following cases when called from FS.InlineBulkStatus delivery:

 (1) If InterfaceVersion == 0 then:

     (a) If errorCode != 0 then it indicates the abort code for the
         corresponding vnode.

     (b) If errorCode == 0 then the status record is invalid.

 (2) If InterfaceVersion == 1 then:

     (a) If errorCode != 0 then it indicates the abort code for the
         corresponding vnode.

     (b) If errorCode == 0 then the status record is valid and can be
     	 parsed.

 (3) If InterfaceVersion is anything else then the status record is
     invalid.

Fixes: dd9fbcb8e1 ("afs: Rearrange status mapping")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells ec5a3b4b50 afs: Fix server rotation's handling of fileserver probe failure
The server rotation algorithm just gives up if it fails to probe a
fileserver.  Fix this by rotating to the next fileserver instead.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:26:44 +01:00