Commit Graph

635 Commits

Author SHA1 Message Date
Linus Torvalds 1827adb11a Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
  <linux/sched.h> header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new <linux/sched.h>'s typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up <linux/sched.h>
  sched/headers: Remove #ifdefs from <linux/sched.h>
  sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
  sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
  sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
  sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
  sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
  sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
  sched/headers: Remove <linux/signal.h> from <linux/sched.h>
  sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
  sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
  sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
  ...
2017-03-03 10:16:38 -08:00
Linus Torvalds 94e877d0fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile two from Al Viro:

 - orangefs fix

 - series of fs/namei.c cleanups from me

 - VFS stuff coming from overlayfs tree

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: Use RCU for destroy_inode
  vfs: use helper for calling f_op->fsync()
  mm: use helper for calling f_op->mmap()
  vfs: use helpers for calling f_op->{read,write}_iter()
  vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
  vfs: extract common parts of {compat_,}do_readv_writev()
  vfs: wrap write f_ops with file_{start,end}_write()
  vfs: deny copy_file_range() for non regular files
  vfs: deny fallocate() on directory
  vfs: create vfs helper vfs_tmpfile()
  namei.c: split unlazy_walk()
  namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
  lookup_fast(): clean up the logics around the fallback to non-rcu mode
  namei: fold unlazy_link() into its sole caller
2017-03-02 15:20:00 -08:00
Al Viro 653a7746fa Merge remote-tracking branch 'ovl/for-viro' into for-linus
Overlayfs-related series from Miklos and Amir
2017-03-02 06:41:22 -05:00
Ingo Molnar eb61baf698 sched/headers: Move the wake-queue types and interfaces from sched.h into <linux/sched/wake_q.h>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar f719ff9bce sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>
But first update the code that uses these facilities with the
new header.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:38 +01:00
Ingo Molnar 5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00
Ingo Molnar 8703e8a465 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h>
We are going to split <linux/sched/user.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/user.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar 3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar 84f001e157 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/wake_q.h>
We are going to split <linux/sched/wake_q.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/wake_q.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:26 +01:00
Davidlohr Bueso 95e91b831f ipc/shm: Fix shmat mmap nil-page protection
The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0.  For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address.  So by the time we do security_mmap_addr(0) things
get funky for shmat().

The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page.  There are two possible fixes
to this.  The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(...  |MAP_FIXED).  While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags.  This makes the
behavior of shmat() identical to the mmap() case.  The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.

Passes shm related ltp tests.

Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Luc Van Oostenryck eac0b1c368 ipc/mqueue: add missing sparse annotation
Link: http://lkml.kernel.org/r/20170128235704.45302-1-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Manfred Spraul 9de5ab8a2e ipc/sem: add hysteresis
sysv sem has two lock modes: One with per-semaphore locks, one lock mode
with a single global lock for the whole array.  When switching from the
per-semaphore locks to the global lock, all per-semaphore locks must be
scanned for ongoing operations.

The patch adds a hysteresis for switching from the global lock to the
per semaphore locks.  This reduces how often the per-semaphore locks
must be scanned.

Compared to the initial patch, this is a simplified solution: Setting
USE_GLOBAL_LOCK_HYSTERESIS to 1 restores the current behavior.

In theory, a workload with exactly 10 simple sops and then one complex
op now scales a bit worse, but this is pure theory: If there is
concurrency, the it won't be exactly 10:1:10:1:10:1:...  If there is no
concurrency, then there is no need for scalability.

Link: http://lkml.kernel.org/r/1476851896-3590-3-git-send-email-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: <1vier1@web.de>
Cc: kernel test robot <xiaolong.ye@intel.com>
Cc: <felixh@informatik.uni-bremen.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Manfred Spraul 27d7be1801 ipc/sem.c: avoid using spin_unlock_wait()
a) The ACQUIRE in spin_lock() applies to the read, not to the store, at
   least for powerpc.  This forces to add a smp_mb() into the fast path.

b) The memory barrier provided by spin_unlock_wait() is right now arch
   dependent.

Therefore: Use spin_lock()/spin_unlock() instead of spin_unlock_wait().

Advantage: faster single op semop calls(), observed +8.9% on x86.  (the
other solution would be arch dependencies in ipc/sem).

Disadvantage: slower complex op semop calls, if (and only if) there are
no sleeping operations.

The next patch adds hysteresis, this further reduces the probability
that the slow path is used.

Link: http://lkml.kernel.org/r/1476851896-3590-2-git-send-email-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: <1vier1@web.de>
Cc: kernel test robot <xiaolong.ye@intel.com>
Cc: <felixh@informatik.uni-bremen.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Mike Rapoport 897ab3e0c4 userfaultfd: non-cooperative: add event for memory unmaps
When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.

Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.

The event notification occurs after the mmap_sem has been released.

[arnd@arndb.de: fix nommu build]
  Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
  Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Dave Jiang 11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Miklos Szeredi 0eb8af4916 vfs: use helper for calling f_op->fsync()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-20 16:51:23 +01:00
Miklos Szeredi f74ac01520 mm: use helper for calling f_op->mmap()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-20 16:51:23 +01:00
Manfred Spraul c626bc46ed ipc/sem.c: fix incorrect sem_lock pairing
Based on the syzcaller test case from dvyukov:

  https://gist.githubusercontent.com/dvyukov/d0e5efefe4d7d6daed829f5c3ca26a40/raw/08d0a261fe3c987bed04fbf267e08ba04bd533ea/gistfile1.txt

The slow (i.e.: failure to acquire) syscall exit from semtimedop()
incorrectly assumed that the the same lock is acquired as it was at the
initial syscall entry.

This is wrong:
 - thread A: single semop semop(), sleeps
 - thread B: multi semop semop(), sleeps
 - thread A: woken up by signal/timeout

With this sequence, the initial sem_lock() call locks the per-semaphore
spinlock, and it is unlocked with sem_unlock().  The call at the syscall
return locks the global spinlock.  Because locknum is not updated, the
following sem_unlock() call unlocks the per-semaphore spinlock, which is
actually not locked.

The fix is trivial: Use the return value from sem_lock.

Fixes: 370b262c89 ("ipc/sem: avoid idr tree lookup for interrupted semop")
Link: http://lkml.kernel.org/r/1482215645-22328-1-git-send-email-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Johanna Abrahamsson <johanna@mjao.org>
Tested-by: Johanna Abrahamsson <johanna@mjao.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Davidlohr Bueso 370b262c89 ipc/sem: avoid idr tree lookup for interrupted semop
We can avoid the idr tree lookup (albeit possibly avoiding
idr_find_fast()) when being awoken in EINTR, as the semid will not
change in this context while blocked.  Use the sma pointer directly and
take the sem_lock, then re-check for RMID races.  We continue to
re-check the queue.status with the lock held such that we can detect
situations where we where are dealing with a spurious wakeup but another
task that holds the sem_lock updated the queue.status while we were
spinning for it.  Once we take the lock it obviously won't change again.

Being the only caller, get rid of sem_obtain_lock() altogether.

Link: http://lkml.kernel.org/r/1478708774-28826-3-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Davidlohr Bueso b5fa01a22e ipc/sem: simplify wait-wake loop
Instead of using the reverse goto, we can simplify the flow and make it
more language natural by just doing do-while instead.  One would hope
this is the standard way (or obviously just with a while bucle) that we
do wait/wakeup handling in the kernel.  The exact same logic is kept,
just more indented.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1478708774-28826-2-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Davidlohr Bueso f150f02cfb ipc/sem: use proper list api for pending_list wakeups
... saves some LoC and looks cleaner than re-implementing the calls.

Link: http://lkml.kernel.org/r/1474225896-10066-6-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Davidlohr Bueso 4663d3e8f2 ipc/sem: explicitly inline check_restart
The compiler already does this, but make it explicit.  This helper is
really small and also used in update_queue's main loop, which is O(N^2)
scanning.  Inline and avoid the function overhead.

Link: http://lkml.kernel.org/r/1474225896-10066-5-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Davidlohr Bueso 4ce33ec2e4 ipc/sem: optimize perform_atomic_semop()
This is the main workhorse that deals with semop user calls such that
the waitforzero or semval update operations, on the set, can complete on
not as the sma currently stands.  Currently, the set is iterated twice
(setting semval, then backwards for the sempid value).  Slowpaths, and
particularly SEM_UNDO calls, must undo any altered sem when it is
detected that the caller must block or has errored-out.

With larger sets, there can occur situations where this involves a lot
of cycles and can obviously be a suboptimal use of cached resources in
shared memory.  Ie, discarding CPU caches that are also calling semop
and have the sembuf cached (and can complete), while the current lock
holder doing the semop will block, error, or does a waitforzero
operation.

This patch proposes still iterating the set twice, but the first scan is
read-only, and we perform the actual updates afterward, once we know
that the call will succeed.  In order to not suffer from the overhead of
dealing with sops that act on the same sem_num, such (rare) cases use
perform_atomic_semop_slow(), which is exactly what we have now.
Duplicates are detected before grabbing sem_lock, and uses simple a
32/64-bit hash array variable to based on the sem_num we are working on.

In addition add some comments to when we expect to the caller to block.

[akpm@linux-foundation.org: coding-style fixes]
[colin.king@canonical.com: ensure we left shift a ULL rather than a 32 bit integer]
  Link: http://lkml.kernel.org/r/20161028181129.7311-1-colin.king@canonical.com
Link: http://lkml.kernel.org/r/20160921194603.GB21438@linux-80c1.suse
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Davidlohr Bueso 9ae949fa38 ipc/sem: rework task wakeups
Our sysv sems have been using the notion of lockless wakeups for a
while, ever since commit 0a2b9d4c79 ("ipc/sem.c: move wake_up_process
out of the spinlock section"), in order to reduce the sem_lock hold
times.  This in-house pending queue can be replaced by wake_q (just like
all the rest of ipc now), in that it provides the following advantages:

 o Simplifies and gets rid of unnecessary code.

 o We get rid of the IN_WAKEUP complexities. Given that wake_q_add()
   grabs reference to the task, if awoken due to an unrelated event,
   between the wake_q_add() and wake_up_q() window, we cannot race with
   sys_exit and the imminent call to wake_up_process().

 o By not spinning IN_WAKEUP, we no longer need to disable preemption.

In consequence, the wakeup paths (after schedule(), that is) must
acknowledge an external signal/event, as well spurious wakeup occurring
during the pending wakeup window.  Obviously no changes in semantics
that could be visible to the user.  The fastpath is _only_ for when we
know for sure that we were awoken due to a the waker's successful semop
call (queue.status is not -EINTR).

On a 48-core Haswell, running the ipcscale 'waitforzero' test, the
following is seen with increasing thread counts:

                               v4.8-rc5                v4.8-rc5
                                                        semopv2
Hmean    sembench-sem-2      574733.00 (  0.00%)   578322.00 (  0.62%)
Hmean    sembench-sem-8      811708.00 (  0.00%)   824689.00 (  1.59%)
Hmean    sembench-sem-12     842448.00 (  0.00%)   845409.00 (  0.35%)
Hmean    sembench-sem-21     933003.00 (  0.00%)   977748.00 (  4.80%)
Hmean    sembench-sem-48     935910.00 (  0.00%)  1004759.00 (  7.36%)
Hmean    sembench-sem-79     937186.00 (  0.00%)   983976.00 (  4.99%)
Hmean    sembench-sem-234    974256.00 (  0.00%)  1060294.00 (  8.83%)
Hmean    sembench-sem-265    975468.00 (  0.00%)  1016243.00 (  4.18%)
Hmean    sembench-sem-296    991280.00 (  0.00%)  1042659.00 (  5.18%)
Hmean    sembench-sem-327    975415.00 (  0.00%)  1029977.00 (  5.59%)
Hmean    sembench-sem-358   1014286.00 (  0.00%)  1049624.00 (  3.48%)
Hmean    sembench-sem-389    972939.00 (  0.00%)  1043127.00 (  7.21%)
Hmean    sembench-sem-420    981909.00 (  0.00%)  1056747.00 (  7.62%)
Hmean    sembench-sem-451    990139.00 (  0.00%)  1051609.00 (  6.21%)
Hmean    sembench-sem-482    965735.00 (  0.00%)  1040313.00 (  7.72%)

[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: merge fix for WAKE_Q to DEFINE_WAKE_Q rename]
  Link: http://lkml.kernel.org/r/20161122210410.5eca9fc2@canb.auug.org.au
Link: http://lkml.kernel.org/r/1474225896-10066-3-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Davidlohr Bueso 248e7357cf ipc/sem: do not call wake_sem_queue_do() prematurely ... as this call should obviously be paired with its _prepare()
counterpart.  At least whenever possible, as there is no harm in calling
it bogusly as we do now in a few places.  Immediate error semop(2) paths
that are far from ever having the task block can be simplified and avoid
a few unnecessary loads on their way out of the call as it is not deeply
nested.

Link: http://lkml.kernel.org/r/1474225896-10066-2-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00