Commit Graph

400 Commits

Author SHA1 Message Date
Andrew Morton
8f68fa2d19 ipc/util.c: use register_hotmemory_notifier()
Squishes a statement-with-no-effect warning, removes some ifdefs and
shrinks .text by one byte!

Note that this code fails to check for blocking_notifier_chain_register()
failures.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:36 -07:00
Stanislav Kinsbursky
2dc958fa2f ipc: set msg back to -EAGAIN if copy wasn't performed
Make sure that msg pointer is set back to error value in case of
MSG_COPY flag is set and desired message to copy wasn't found.  This
garantees that msg is either a error pointer or a copy address.

Otherwise the last message in queue will be freed without unlinking from
the queue (which leads to memory corruption) and the dummy allocated
copy won't be released.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-02 10:09:01 -07:00
Linus Torvalds
2c3de1c2d7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fixes from Eric W Biederman:
 "The bulk of the changes are fixing the worst consequences of the user
  namespace design oversight in not considering what happens when one
  namespace starts off as a clone of another namespace, as happens with
  the mount namespace.

  The rest of the changes are just plain bug fixes.

  Many thanks to Andy Lutomirski for pointing out many of these issues."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Restrict when proc and sysfs can be mounted
  ipc: Restrict mounting the mqueue filesystem
  vfs: Carefully propogate mounts across user namespaces
  vfs: Add a mount flag to lock read only bind mounts
  userns:  Don't allow creation if the user is chrooted
  yama:  Better permission check for ptraceme
  pid: Handle the exit of a multi-threaded init.
  scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
2013-03-28 13:43:46 -07:00
Eric W. Biederman
a636b702ed ipc: Restrict mounting the mqueue filesystem
Only allow mounting the mqueue filesystem if the caller has CAP_SYS_ADMIN
rights over the ipc namespace.   The principle here is if you create
or have capabilities over it you can mount it, otherwise you get to live
with what other people have mounted.

This information is not particularly sensitive and mqueue essentially
only reports which posix messages queues exist.  Still when creating a
restricted environment for an application to live any extra
information may be of use to someone with sufficient creativity.  The
historical if imperfect way this information has been restricted has
been not to allow mounts and restricting this to ipc namespace
creators maintains the spirit of the historical restriction.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27 07:50:06 -07:00
Vladimir Davydov
38d78e587d mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
mnt_drop_write() must be called only if mnt_want_write() succeeded,
otherwise the mnt_writers counter will diverge.

mnt_writers counters are used to check if remounting FS as read-only is
OK, so after an extra mnt_drop_write() call, it would be impossible to
remount mqueue FS as read-only.  Besides, on umount a warning would be
printed like this one:

  =====================================
  [ BUG: bad unlock balance detected! ]
  3.9.0-rc3 #5 Not tainted
  -------------------------------------
  a.out/12486 is trying to release lock (sb_writers) at:
  mnt_drop_write+0x1f/0x30
  but there are no more locks to release!

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-22 16:41:21 -07:00
Peter Hurley
88b9e456b1 ipc: don't allocate a copy larger than max
When MSG_COPY is set, a duplicate message must be allocated for the copy
before locking the queue.  However, the copy could not be larger than was
sent which is limited to msg_ctlmax.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-08 15:05:33 -08:00
Peter Hurley
e1082f45f1 ipc: fix potential oops when src msg > 4k w/ MSG_COPY
If the src msg is > 4k, then dest->next points to the
next allocated segment; resetting it just prior to dereferencing
is bad.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-08 15:05:33 -08:00
Tejun Heo
54924ea33f ipc: convert to idr_alloc()
Convert to the much saner new idr interface.

The new interface doesn't directly translate to the way idr_pre_get()
was used around ipc_addid() as preloading disables preemption.  From
my cursory reading, it seems like we should be able to do all
allocation from ipc_addid(), so I moved it there.  Can you please
check whether this would be okay?  If this is wrong and ipc_addid()
should be allowed to be called from non-sleepable context, I'd suggest
allocating id itself in the outer functions and later install the
pointer using idr_replace().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:19 -08:00
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Linus Torvalds
94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Michel Lespinasse
41badc15cb mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool
do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE
multiple, however there was no equivalent code in mm_populate(), which
caused issues.

This could be fixed by introduced the same rounding in mm_populate(),
however I think it's preferable to make do_mmap_pgoff() return populate
as a size rather than as a boolean, so we don't have to duplicate the
size rounding logic in mm_populate().

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:11 -08:00
Michel Lespinasse
bebeb3d68b mm: introduce mm_populate() for populating new vmas
When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
with MCL_FUTURE in effect), we want to populate the pages within the
newly created vmas.  This may take a while as we may have to read pages
from disk, so ideally we want to do this outside of the write-locked
mmap_sem region.

This change introduces mm_populate(), which is used to defer populating
such mappings until after the mmap_sem write lock has been released.
This is implemented as a generalization of the former do_mlock_pages(),
which accomplished the same task but was using during mlock() /
mlockall().

Signed-off-by: Michel Lespinasse <walken@google.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:10 -08:00
Anatol Pomozov
39b6525274 fs: Preserve error code in get_empty_filp(), part 2
Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
 - not enough memory for file structures
 - operation is not allowed
 - user is over its limit

Currently the function returns NULL in all cases and we loose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.

Return error through pointer. Change all callers to preserve this error code.

[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:32 -05:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Gao feng
bc1b69ed22 userns: Allow the unprivileged users to mount mqueue fs
This patch allow the unprivileged user to mount mqueuefs in
user ns.

If two userns share the same ipcns,the files in mqueue fs
should be seen in both these two userns.

If the userns has its own ipcns,it has its own mqueue fs too.
ipcns has already done this job well.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-01-27 19:25:50 -08:00
Stanislav Kinsbursky
3fcfe78658 ipc: add more comments to message copying related code
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:46 -08:00
Stanislav Kinsbursky
51eeacaa07 ipc: simplify message copying
Remove the redundant and confusing fill_copy().  Also add copy_msg()
check for error.  In this case exit from the function have to be done
instead of break, because further code interprets any error as EAGAIN.

Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
disabled.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:46 -08:00
Stanislav Kinsbursky
b30efe2775 ipc: convert prepare_copy() from macro to function
This code works if CONFIG_CHECKPOINT_RESTORE is disabled.

[akpm@linux-foundation.org: remove __maybe_unused]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:46 -08:00
Stanislav Kinsbursky
85398aa8de ipc: simplify free_copy() call
Passing and checking of msgflg to free_copy() is redundant.  This patch
sets copy to NULL on declaration instead and checks for non-NULL in
free_copy().

Note: in case of copy allocation failure, error is returned immediately.
So no need to check for IS_ERR() in free_copy().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:45 -08:00
Stanislav Kinsbursky
3a665531a3 selftests: IPC message queue copy feature test
This test can be used to check wheither kernel supports IPC message queue
copy and restore features (required by CRIU project).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:45 -08:00
Stanislav Kinsbursky
4a674f34ba ipc: introduce message queue copy feature
This patch is required for checkpoint/restore in userspace.

c/r requires some way to get all pending IPC messages without deleting
them from the queue (checkpoint can fail and in this case tasks will be
resumed, so queue have to be valid).

To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
was introduced.  If this flag was specified, then mtype is interpreted as
number of the message to copy.

If MSG_COPY is set, then kernel will allocate dummy message with passed
size, and then use new copy_msg() helper function to copy desired message
(instead of unlinking it from the queue).

Notes:

1) Return -ENOSYS if MSG_COPY is specified, but
   CONFIG_CHECKPOINT_RESTORE is not set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:45 -08:00
Stanislav Kinsbursky
f9dd87f473 ipc: message queue receive cleanup
Move all message related manipulation into one function msg_fill().
Actually, two functions because of the compat one.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:45 -08:00
Stanislav Kinsbursky
03f5956680 ipc: add sysctl to specify desired next object id
Add 3 new variables and sysctls to tune them (by one "next_id" variable
for messages, semaphores and shared memory respectively).  This variable
can be used to set desired id for next allocated IPC object.  By default
it's equal to -1 and old behaviour is preserved.  If this variable is
non-negative, then desired idr will be extracted from it and used as a
start value to search for free IDR slot.

Notes:

1) this patch doesn't guarantee that the new object will have desired
   id.  So it's up to user space how to handle new object with wrong id.

2) After a sucessful id allocation attempt, "next_id" will be set back
   to -1 (if it was non-negative).

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:45 -08:00
Stanislav Kinsbursky
9afdacda02 ipc: remove forced assignment of selected message
This is a cleanup patch. The assignment is redundant.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:45 -08:00
Linus Torvalds
a2faf2fc53 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull (again) user namespace infrastructure changes from Eric Biederman:
 "Those bugs, those darn embarrasing bugs just want don't want to get
  fixed.

  Linus I just updated my mirror of your kernel.org tree and it appears
  you successfully pulled everything except the last 4 commits that fix
  those embarrasing bugs.

  When you get a chance can you please repull my branch"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Fix typo in description of the limitation of userns_install
  userns: Add a more complete capability subset test to commit_creds
  userns: Require CAP_SYS_ADMIN for most uses of setns.
  Fix cap_capable to only allow owners in the parent user namespace to have caps.
2012-12-18 10:55:28 -08:00