Commit Graph

765 Commits

Author SHA1 Message Date
Ingo Molnar 6bfbaa51ed sched/headers, RCU: Move rcu_copy_process() from <linux/sched/task.h> to kernel/fork.c
Move rcu_copy_process() into kernel/fork.c, which is the only
user of this inline function.

This simplifies <linux/sched/task.h> to the level that <linux/sched.h>
does not have to be included in it anymore - which change is done
in a subsequent patch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-03 01:43:46 +01:00
Ingo Molnar 32ef5517c2 sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>
Introduce a trivial, mostly empty <linux/sched/cputime.h> header
to prepare for the moving of cputime functionality out of sched.h.

Update all code that relies on these facilities.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:39 +01:00
Ingo Molnar 68db0cf106 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:36 +01:00
Ingo Molnar 299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar 03441a3482 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h>
We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/stat.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Ingo Molnar 037741a6d4 sched/headers: Prepare for the removal of <linux/rtmutex.h> from <linux/sched.h>
Fix up missing #includes in other places that rely on sched.h doing that for them.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar 6a3827d750 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/numa_balancing.h>
We are going to split <linux/sched/numa_balancing.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/numa_balancing.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:30 +01:00
Ingo Molnar 8703e8a465 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h>
We are going to split <linux/sched/user.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/user.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar f7ccbae45c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h>
We are going to split <linux/sched/coredump.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/coredump.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar 6e84f31522 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar 4eb5aaa3af sched/headers: Prepare for new header dependencies before moving code to <linux/sched/autogroup.h>
We are going to split <linux/sched/autogroup.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/autogroup.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar 780de9dd27 sched/headers, cgroups: Remove the threadgroup_change_*() wrappery
threadgroup_change_begin()/end() is a pointless wrapper around
cgroup_threadgroup_change_begin()/end(), minus a might_sleep()
in the !CONFIG_CGROUPS=y case.

Remove the wrappery, move the might_sleep() (the down_read()
already has a might_sleep() check).

This debloats <linux/sched.h> a bit and simplifies this API.

Update all call sites.

No change in functionality.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:25 +01:00
Vegard Nossum 3fce371bfa mm: add new mmget() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_users);/mmget\(\1\);/'
  git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_users);/mmget\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-2-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Linus Torvalds f1ef09fde1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "There is a lot here. A lot of these changes result in subtle user
  visible differences in kernel behavior. I don't expect anything will
  care but I will revert/fix things immediately if any regressions show
  up.

  From Seth Forshee there is a continuation of the work to make the vfs
  ready for unpriviled mounts. We had thought the previous changes
  prevented the creation of files outside of s_user_ns of a filesystem,
  but it turns we missed the O_CREAT path. Ooops.

  Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
  standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
  children that are forked after the prctl are considered and not
  children forked before the prctl. The only known user of this prctl
  systemd forks all children after the prctl. So no userspace
  regressions will occur. Holding earlier forked children to the same
  rules as later forked children creates a semantic that is sane enough
  to allow checkpoing of processes that use this feature.

  There is a long delayed change by Nikolay Borisov to limit inotify
  instances inside a user namespace.

  Michael Kerrisk extends the API for files used to maniuplate
  namespaces with two new trivial ioctls to allow discovery of the
  hierachy and properties of namespaces.

  Konstantin Khlebnikov with the help of Al Viro adds code that when a
  network namespace exits purges it's sysctl entries from the dcache. As
  in some circumstances this could use a lot of memory.

  Vivek Goyal fixed a bug with stacked filesystems where the permissions
  on the wrong inode were being checked.

  I continue previous work on ptracing across exec. Allowing a file to
  be setuid across exec while being ptraced if the tracer has enough
  credentials in the user namespace, and if the process has CAP_SETUID
  in it's own namespace. Proc files for setuid or otherwise undumpable
  executables are now owned by the root in the user namespace of their
  mm. Allowing debugging of setuid applications in containers to work
  better.

  A bug I introduced with permission checking and automount is now
  fixed. The big change is to mark the mounts that the kernel initiates
  as a result of an automount. This allows the permission checks in sget
  to be safely suppressed for this kind of mount. As the permission
  check happened when the original filesystem was mounted.

  Finally a special case in the mount namespace is removed preventing
  unbounded chains in the mount hash table, and making the semantics
  simpler which benefits CRIU.

  The vfs fix along with related work in ima and evm I believe makes us
  ready to finish developing and merge fully unprivileged mounts of the
  fuse filesystem. The cleanups of the mount namespace makes discussing
  how to fix the worst case complexity of umount. The stacked filesystem
  fixes pave the way for adding multiple mappings for the filesystem
  uids so that efficient and safer containers can be implemented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc/sysctl: Don't grab i_lock under sysctl_lock.
  vfs: Use upper filesystem inode in bprm_fill_uid()
  proc/sysctl: prune stale dentries during unregistering
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  prctl: propagate has_child_subreaper flag to every descendant
  introduce the walk_process_tree() helper
  nsfs: Add an ioctl() to return owner UID of a userns
  fs: Better permission checking for submounts
  exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
  vfs: open() with O_CREAT should not create inodes with unknown ids
  nsfs: Add an ioctl() to return the namespace type
  proc: Better ownership of files for non-dumpable tasks in user namespaces
  exec: Remove LSM_UNSAFE_PTRACE_CAP
  exec: Test the ptracer's saved cred to see if the tracee can gain caps
  exec: Don't reset euid and egid when the tracee has CAP_SETUID
  inotify: Convert to using per-namespace limits
2017-02-23 20:33:51 -08:00
Pavel Emelyanov 893e26e61d userfaultfd: non-cooperative: Add fork() event
When the mm with uffd-ed vmas fork()-s the respective vmas notify their
uffds with the event which contains a descriptor with new uffd.  This
new descriptor can then be used to get events from the child and
populate its mm with data.  Note, that there can be different uffd-s
controlling different vmas within one mm, so first we should collect all
those uffds (and ctx-s) in a list and then notify them all one by one
but only once per fork().

The context is created at fork() time but the descriptor, file struct
and anon inode object is created at event read time.  So some trickery
is added to the userfaultfd_ctx_read() to handle the ctx queues' locking
vs file creation.

Another thing worth noticing is that the task that fork()-s waits for
the uffd event to get processed WITHOUT the mmap sem.

[aarcange@redhat.com: build warning fix]
  Link: http://lkml.kernel.org/r/20161216144821.5183-10-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20161216144821.5183-9-aarcange@redhat.com
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:28 -08:00
Linus Torvalds 42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
Linus Torvalds 828cad8ea0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this (fairly busy) cycle were:

   - There was a class of scheduler bugs related to forgetting to update
     the rq-clock timestamp which can cause weird and hard to debug
     problems, so there's a new debug facility for this: which uncovered
     a whole lot of bugs which convinced us that we want to keep the
     debug facility.

     (Peter Zijlstra, Matt Fleming)

   - Various cputime related updates: eliminate cputime and use u64
     nanoseconds directly, simplify and improve the arch interfaces,
     implement delayed accounting more widely, etc. - (Frederic
     Weisbecker)

   - Move code around for better structure plus cleanups (Ingo Molnar)

   - Move IO schedule accounting deeper into the scheduler plus related
     changes to improve the situation (Tejun Heo)

   - ... plus a round of sched/rt and sched/deadline fixes, plus other
     fixes, updats and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
  sched/core: Remove unlikely() annotation from sched_move_task()
  sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
  sched/topology: Split out scheduler topology code from core.c into topology.c
  sched/core: Remove unnecessary #include headers
  sched/rq_clock: Consolidate the ordering of the rq_clock methods
  delayacct: Include <uapi/linux/taskstats.h>
  sched/core: Clean up comments
  sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
  sched/clock: Add dummy clear_sched_clock_stable() stub function
  sched/cputime: Remove generic asm headers
  sched/cputime: Remove unused nsec_to_cputime()
  s390, sched/cputime: Remove unused cputime definitions
  powerpc, sched/cputime: Remove unused cputime definitions
  s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
  ia64, sched/cputime: Remove unused cputime definitions
  ia64: Convert vtime to use nsec units directly
  ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
  sched/cputime: Remove jiffies based cputime
  sched/cputime, vtime: Return nsecs instead of cputime_t to account
  sched/cputime: Complete nsec conversion of tick based accounting
  ...
2017-02-20 12:52:55 -08:00
Peter Zijlstra 95cb64c1fe fork: Fix task_struct alignment
Stupid bug that wrecked the alignment of task_struct and causes WARN()s
in the x86 FPU code on some platforms.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: e274795ea7 ("locking/mutex: Fix mutex handoff")
Link: http://lkml.kernel.org/r/20170218142645.GH6500@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-20 11:22:37 +01:00
Pavel Tikhomirov 749860ce24 prctl: propagate has_child_subreaper flag to every descendant
If process forks some children when it has is_child_subreaper
flag enabled they will inherit has_child_subreaper flag - first
group, when is_child_subreaper is disabled forked children will
not inherit it - second group. So child-subreaper does not reparent
all his descendants when their parents die. Having these two
differently behaving groups can lead to confusion. Also it is
a problem for CRIU, as when we restore process tree we need to
somehow determine which descendants belong to which group and
much harder - to put them exactly to these group.

To simplify these we can add a propagation of has_child_subreaper
flag on PR_SET_CHILD_SUBREAPER, walking all descendants of child-
subreaper to setup has_child_subreaper flag.

In common cases when process like systemd first sets itself to
be a child-subreaper and only after that forks its services, we will
have zero-length list of descendants to walk. Testing with binary
subtree of 2^15 processes prctl took < 0.007 sec and has shown close
to linear dependency(~0.2 * n * usec) on lower numbers of processes.

Moreover, I doubt someone intentionaly pre-forks the children whitch
should reparent to init before becoming subreaper, because some our
ancestor migh have had is_child_subreaper flag while forking our
sub-tree and our childs will all inherit has_child_subreaper flag,
and we have no way to influence it. And only way to check if we have
no has_child_subreaper flag is to create some childs, kill them and
see where they will reparent to.

Using walk_process_tree helper to walk subtree, thanks to Oleg! Timing
seems to be the same.

Optimize:

a) When descendant already has has_child_subreaper flag all his subtree
has it too already.

* for a) to be true need to move has_child_subreaper inheritance under
the same tasklist_lock with adding task to its ->real_parent->children
as without it process can inherit zero has_child_subreaper, then we
set 1 to it's parent flag, check that parent has no more children, and
only after child with wrong flag is added to the tree.

* Also make these inheritance more clear by using real_parent instead of
current, as on clone(CLONE_PARENT) if current has is_child_subreaper
and real_parent has no is_child_subreaper or has_child_subreaper, child
will have has_child_subreaper flag set without actually having a
subreaper in it's ancestors.

b) When some descendant is child_reaper, it's subtree is in different
pidns from us(original child-subreaper) and processes from other pidns
will never reparent to us.

So we can skip their(a,b) subtree from walk.

v2: switch to walk_process_tree() general helper, move
has_child_subreaper inheritance
v3: remove csr_descendant leftover, change current to real_parent
in has_child_subreaper inheritance
v4: small commit message fix

Fixes: ebec18a6d3 ("prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-02-03 16:55:15 +13:00
Oleg Nesterov 0f1b92cbdd introduce the walk_process_tree() helper
Add the new helper to walk the process tree, the next patch adds a user.
Note that it visits the group leaders only, proc_visitor can do
for_each_thread itself or we can trivially extend walk_process_tree() to
do this.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-02-03 16:34:41 +13:00
Frederic Weisbecker ebd7e7fc4b timers/posix-timers: Convert internals to use nsecs
Use the new nsec based cputime accessors as part of the whole cputime
conversion from cputime_t to nsecs.

Also convert posix-cpu-timers to use nsec based internal counters to
simplify it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-19-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:13:54 +01:00
Nicolas Pitre b18b6a9cef timers: Omit POSIX timer stuff from task_struct when disabled
When CONFIG_POSIX_TIMERS is disabled, it is preferable to remove related
structures from struct task_struct and struct signal_struct as they
won't contain anything useful and shouldn't be relied upon by mistake.
Code still referencing those structures is also disabled here.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-01-27 13:05:26 -08:00
Peter Zijlstra e274795ea7 locking/mutex: Fix mutex handoff
While reviewing the ww_mutex patches, I noticed that it was still
possible to (incorrectly) succeed for (incorrect) code like:

	mutex_lock(&a);
	mutex_lock(&a);

This was possible if the second mutex_lock() would block (as expected)
but then receive a spurious wakeup. At that point it would find itself
at the front of the queue, request a handoff and instantly claim
ownership and continue, since owner would point to itself.

Avoid this scenario and simplify the code by introducing a third low
bit to signal handoff pickup. So once we request handoff, unlock
clears the handoff bit and sets the pickup bit along with the new
owner.

This also removes the need for the .handoff argument to
__mutex_trylock(), since that becomes superfluous with PICKUP.

In order to guarantee enough low bits, ensure task_struct alignment is
at least L1_CACHE_BYTES (which seems a good ideal regardless).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9d659ae14b ("locking/mutex: Add lock handoff to avoid starvation")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:14:38 +01:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds 412ac77a9d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "After a lot of discussion and work we have finally reachanged a basic
  understanding of what is necessary to make unprivileged mounts safe in
  the presence of EVM and IMA xattrs which the last commit in this
  series reflects. While technically it is a revert the comments it adds
  are important for people not getting confused in the future. Clearing
  up that confusion allows us to seriously work on unprivileged mounts
  of fuse in the next development cycle.

  The rest of the fixes in this set are in the intersection of user
  namespaces, ptrace, and exec. I started with the first fix which
  started a feedback cycle of finding additional issues during review
  and fixing them. Culiminating in a fix for a bug that has been present
  since at least Linux v1.0.

  Potentially these fixes were candidates for being merged during the rc
  cycle, and are certainly backport candidates but enough little things
  turned up during review and testing that I decided they should be
  handled as part of the normal development process just to be certain
  there were not any great surprises when it came time to backport some
  of these fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Revert "evm: Translate user/group ids relative to s_user_ns when computing HMAC"
  exec: Ensure mm->user_ns contains the execed files
  ptrace: Don't allow accessing an undumpable mm
  ptrace: Capture the ptracer's creds not PT_PTRACE_CAP
  mm: Add a user_ns owner to mm_struct and fix ptrace permission checks
2016-12-14 14:09:48 -08:00