linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Eric W. Biederman	8c6cf9cc82	mnt: Modify fs_fully_visible to deal with locked ro nodev and atime Ignore an existing mount if the locked readonly, nodev or atime attributes are less permissive than the desired attributes of the new mount. On success ensure the new mount locks all of the same readonly, nodev and atime attributes as the old mount. The nosuid and noexec attributes are not checked here as this change is destined for stable and enforcing those attributes causes a regression in lxc and libvirt-lxc where those applications will not start and there are no known executables on sysfs or proc and no known way to create exectuables without code modifications Cc: stable@vger.kernel.org Fixes: `e51db73532` ("userns: Better restrictions on when proc and sysfs can be mounted") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-06-04 10:29:25 -05:00
Eric W. Biederman	1b852bceb0	mnt: Refactor the logic for mounting sysfs and proc in a user namespace Fresh mounts of proc and sysfs are a very special case that works very much like a bind mount. Unfortunately the current structure can not preserve the MNT_LOCK... mount flags. Therefore refactor the logic into a form that can be modified to preserve those lock bits. Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount of the filesystem be fully visible in the current mount namespace, before the filesystem may be mounted. Move the logic for calling fs_fully_visible from proc and sysfs into fs/namespace.c where it has greater access to mount namespace state. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-05-13 21:44:11 -05:00
Al Viro	294d71ff2f	new helper: __legitimize_mnt() same as legitimize_mnt(), except that it does not drop and regain rcu_read_lock; return values are 0 => grabbed a reference, we are fine 1 => failed, just go away -1 => failed, go away and mntput(bastard) when outside of rcu_read_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:14 -04:00
Eric W. Biederman	7e96c1b0e0	mnt: Fix fs_fully_visible to verify the root directory is visible This fixes a dumb bug in fs_fully_visible that allows proc or sys to be mounted if there is a bind mount of part of /proc/ or /sys/ visible. Cc: stable@vger.kernel.org Reported-by: Eric Windisch <ewindisch@docker.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-05-09 11:55:50 -05:00
Eric W. Biederman	e0c9c0afd2	mnt: Update detach_mounts to leave mounts connected Now that it is possible to lazily unmount an entire mount tree and leave the individual mounts connected to each other add a new flag UMOUNT_CONNECTED to umount_tree to force this behavior and use this flag in detach_mounts. This closes a bug where the deletion of a file or directory could trigger an unmount and reveal data under a mount point. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-09 11:39:57 -05:00
Eric W. Biederman	f53e579751	mnt: Fix the error check in __detach_mounts lookup_mountpoint can return either NULL or an error value. Update the test in __detach_mounts to test for an error value to avoid pathological cases causing a NULL pointer dereferences. The callers of __detach_mounts should prevent it from ever being called on an unlinked dentry but don't take any chances. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-09 11:39:56 -05:00
Eric W. Biederman	ce07d891a0	mnt: Honor MNT_LOCKED when detaching mounts Modify umount(MNT_DETACH) to keep mounts in the hash table that are locked to their parent mounts, when the parent is lazily unmounted. In mntput_no_expire detach the children from the hash table, depending on mnt_pin_kill in cleanup_mnt to decrement the mnt_count of the children. In __detach_mounts if there are any mounts that have been unmounted but still are on the list of mounts of a mountpoint, remove their children from the mount hash table and those children to the unmounted list so they won't linger potentially indefinitely waiting for their final mntput, now that the mounts serve no purpose. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-09 11:39:55 -05:00
Eric W. Biederman	6a46c5735c	mnt: Factor umount_mnt from umount_tree For future use factor out a function umount_mnt from umount_tree. This function unhashes a mount and remembers where the mount was mounted so that eventually when the code makes it to a sleeping context the mountpoint can be dput. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-09 11:39:54 -05:00
Eric W. Biederman	7bdb11de8e	mnt: Factor out unhash_mnt from detach_mnt and umount_tree Create a function unhash_mnt that contains the common code between detach_mnt and umount_tree, and use unhash_mnt in place of the common code. This add a unncessary list_del_init(mnt->mnt_child) into umount_tree but given that mnt_child is already empty this extra line is a noop. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-09 11:39:54 -05:00
Eric W. Biederman	cd4a40174b	mnt: Fail collect_mounts when applied to unmounted mounts The only users of collect_mounts are in audit_tree.c In audit_trim_trees and audit_add_tree_rule the path passed into collect_mounts is generated from kern_path passed an audit_tree pathname which is guaranteed to be an absolute path. In those cases collect_mounts is obviously intended to work on mounted paths and if a race results in paths that are unmounted when collect_mounts it is reasonable to fail early. The paths passed into audit_tag_tree don't have the absolute path check. But are used to play with fsnotify and otherwise interact with the audit_trees, so again operating only on mounted paths appears reasonable. Avoid having to worry about what happens when we try and audit unmounted filesystems by restricting collect_mounts to mounts that appear in the mount tree. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-09 11:38:31 -05:00
Eric W. Biederman	5d88457eb5	mnt: On an unmount propagate clearing of MNT_LOCKED A prerequisite of calling umount_tree is that the point where the tree is mounted at is valid to unmount. If we are propagating the effect of the unmount clear MNT_LOCKED in every instance where the same filesystem is mounted on the same mountpoint in the mount tree, as we know (by virtue of the fact that umount_tree was called) that it is safe to reveal what is at that mountpoint. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-02 20:34:19 -05:00
Eric W. Biederman	411a938b5a	mnt: Delay removal from the mount hash. - Modify __lookup_mnt_hash_last to ignore mounts that have MNT_UMOUNTED set. - Don't remove mounts from the mount hash table in propogate_umount - Don't remove mounts from the mount hash table in umount_tree before the entire list of mounts to be umounted is selected. - Remove mounts from the mount hash table as the last thing that happens in the case where a mount has a parent in umount_tree. Mounts without parents are not hashed (by definition). This paves the way for delaying removal from the mount hash table even farther and fixing the MNT_LOCKED vs MNT_DETACH issue. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-02 20:34:19 -05:00
Eric W. Biederman	590ce4bcbf	mnt: Add MNT_UMOUNT flag In some instances it is necessary to know if the the unmounting process has begun on a mount. Add MNT_UMOUNT to make that reliably testable. This fix gets used in fixing locked mounts in MNT_DETACH Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-02 20:34:18 -05:00
Eric W. Biederman	c003b26ff9	mnt: In umount_tree reuse mnt_list instead of mnt_hash umount_tree builds a list of mounts that need to be unmounted. Utilize mnt_list for this purpose instead of mnt_hash. This begins to allow keeping a mount on the mnt_hash after it is unmounted, which is necessary for a properly functioning MNT_LOCKED implementation. The fact that mnt_list is an ordinary list makding available list_move is nice bonus. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-02 20:34:18 -05:00
Eric W. Biederman	8318e667f1	mnt: Don't propagate umounts in __detach_mounts Invoking mount propagation from __detach_mounts is inefficient and wrong. It is inefficient because __detach_mounts already walks the list of mounts that where something needs to be done, and mount propagation walks some subset of those mounts again. It is actively wrong because if the dentry that is passed to __detach_mounts is not part of the path to a mount that mount should not be affected. change_mnt_propagation(p,MS_PRIVATE) modifies the mount propagation tree of a master mount so it's slaves are connected to another master if possible. Which means even removing a mount from the middle of a mount tree with __detach_mounts will not deprive any mount propagated mount events. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-02 20:34:17 -05:00
Eric W. Biederman	e819f15210	mnt: Improve the umount_tree flags - Remove the unneeded declaration from pnode.h - Mark umount_tree static as it has no callers outside of namespace.c - Define an enumeration of umount_tree's flags. - Pass umount_tree's flags in by name This removes the magic numbers 0, 1 and 2 making the code a little clearer and makes it possible for there to be lazy unmounts that don't propagate. Which is what __detach_mounts actually wants for example. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-02 20:34:17 -05:00
Eric W. Biederman	a3b3c5627c	mnt: Use hlist_move_list in namespace_unlock Small cleanup to make the code more readable and maintainable. Signed-off-by: Eric Biederman <ebiederm@xmission.com>	2015-04-02 20:33:53 -05:00
David Howells	e36cb0b89c	VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_(dentry) Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' \|') \|\| die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") \|\| die $coccifile; print($fd "$_\n") \|\| die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 \|\| die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-02-22 11:38:41 -05:00
Linus Torvalds	50652963ea	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc VFS updates from Al Viro: "This cycle a lot of stuff sits on topical branches, so I'll be sending more or less one pull request per branch. This is the first pile; more to follow in a few. In this one are several misc commits from early in the cycle (before I went for separate branches), plus the rework of mntput/dput ordering on umount, switching to use of fs_pin instead of convoluted games in namespace_unlock()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch the IO-triggering parts of umount to fs_pin new fs_pin killing logics allow attaching fs_pin to a group not associated with some superblock get rid of the second argument of acct_kill() take count and rcu_head out of fs_pin dcache: let the dentry count go down to zero without taking d_lock pull bumping refcount into ->kill() kill pin_put() mode_t whack-a-mole: chelsio file->f_path.dentry is pinned down for as long as the file is open... get rid of lustre_dump_dentry() gut proc_register() a bit kill d_validate() ncpfs: get rid of d_validate() nonsense selinuxfs: don't open-code d_genocide()	2015-02-17 14:56:45 -08:00
Andrzej Hajda	fcc139ae22	fs/namespace: convert devname allocation to kstrdup_const VFS frequently performs duplication of strings located in read-only memory section. Replacing kstrdup by kstrdup_const allows to avoid such operations. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mike Turquette <mturquette@linaro.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:36 -08:00
Al Viro	87b95ce096	switch the IO-triggering parts of umount to fs_pin Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-01-25 23:17:29 -05:00
Eric W. Biederman	c297abfdf1	mnt: Fix a memory stomp in umount While reviewing the code of umount_tree I realized that when we append to a preexisting unmounted list we do not change pprev of the former first item in the list. Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on the former first item of the list will stomp unmounted.first leaving it set to some random mount point which we are likely to free soon. This isn't likely to hit, but if it does I don't know how anyone could track it down. [ This happened because we don't have all the same operations for hlist's as we do for normal doubly-linked lists. In particular, list_splice() is easy on our standard doubly-linked lists, while hlist_splice() doesn't exist and needs both start/end entries of the hlist. And commit `38129a13e6` incorrectly open-coded that missing hlist_splice(). We should think about making these kinds of "mindless" conversions easier to get right by adding the missing hlist helpers - Linus ] Fixes: `38129a13e6` switch mnt_hash to hlist Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-18 11:22:02 -08:00
Linus Torvalds	87c31b39ab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace related fixes from Eric Biederman: "As these are bug fixes almost all of thes changes are marked for backporting to stable. The first change (implicitly adding MNT_NODEV on remount) addresses a regression that was created when security issues with unprivileged remount were closed. I go on to update the remount test to make it easy to detect if this issue reoccurs. Then there are a handful of mount and umount related fixes. Then half of the changes deal with the a recently discovered design bug in the permission checks of gid_map. Unix since the beginning has allowed setting group permissions on files to less than the user and other permissions (aka ---rwx---rwx). As the unix permission checks stop as soon as a group matches, and setgroups allows setting groups that can not later be dropped, results in a situtation where it is possible to legitimately use a group to assign fewer privileges to a process. Which means dropping a group can increase a processes privileges. The fix I have adopted is that gid_map is now no longer writable without privilege unless the new file /proc/self/setgroups has been set to permanently disable setgroups. The bulk of user namespace using applications even the applications using applications using user namespaces without privilege remain unaffected by this change. Unfortunately this ix breaks a couple user space applications, that were relying on the problematic behavior (one of which was tools/selftests/mount/unprivileged-remount-test.c). To hopefully prevent needing a regression fix on top of my security fix I rounded folks who work with the container implementations mostly like to be affected and encouraged them to test the changes. > So far nothing broke on my libvirt-lxc test bed. :-) > Tested with openSUSE 13.2 and libvirt 1.2.9. > Tested-by: Richard Weinberger <richard@nod.at> > Tested on Fedora20 with libvirt 1.2.11, works fine. > Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> > Ok, thanks - yes, unprivileged lxc is working fine with your kernels. > Just to be sure I was testing the right thing I also tested using > my unprivileged nsexec testcases, and they failed on setgroup/setgid > as now expected, and succeeded there without your patches. > Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com> > I tested this with Sandstorm. It breaks as is and it works if I add > the setgroups thing. > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :(" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Unbreak the unprivileged remount tests userns; Correct the comment in map_write userns: Allow setting gid_maps without privilege when setgroups is disabled userns: Add a knob to disable setgroups on a per user namespace basis userns: Rename id_map_mutex to userns_state_mutex userns: Only allow the creator of the userns unprivileged mappings userns: Check euid no fsuid when establishing an unprivileged uid mapping userns: Don't allow unprivileged creation of gid mappings userns: Don't allow setgroups until a gid mapping has been setablished userns: Document what the invariant required for safe unprivileged mappings. groups: Consolidate the setgroups permission checks mnt: Clear mnt_expire during pivot_root mnt: Carefully set CL_UNPRIVILEGED in clone_mnt mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. umount: Do not allow unmounting rootfs. umount: Disallow unprivileged mount force mnt: Update unprivileged remount test mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount	2014-12-17 12:31:40 -08:00
Al Viro	e149ed2b80	take the targets of /proc//ns/ symlinks to separate fs New pseudo-filesystem: nsfs. Targets of /proc//ns/ live there now. It's not mountable (not even registered, so it's not in /proc/filesystems, etc.). Files on it are bindable - we explicitly permit that in do_loopback(). This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well. get_proc_ns() is a macro now (it's simply returning ->i_private; would have been an inline, if not for header ordering headache). proc_ns_inode() is an ex-parrot. The interface used in procfs is ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops). Dentries and inodes are never hashed; a non-counting reference to dentry is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path() if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details of that mechanism. As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt; it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets from ns_get_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-10 21:30:20 -05:00
Al Viro	f77c80142e	bury struct proc_ns in fs/proc a) make get_proc_ns() return a pointer to struct ns_common b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that is_mnt_ns_file() could get away with fewer dereferences. That way struct proc_ns becomes invisible outside of fs/proc/*.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-04 14:34:54 -05:00

1 2 3 4 5 ...

432 Commits