Add LSM hooks for use by the new mount API and filesystem context code.
This includes:
(1) Hooks to handle allocation, duplication and freeing of the security
record attached to a filesystem context.
(2) A hook to snoop source specifications. There may be multiple of these
if the filesystem supports it. They will to be local files/devices if
fs_context::source_is_dev is true and will be something else, possibly
remote server specifications, if false.
(3) A hook to snoop superblock configuration options in key[=val] form.
If the LSM decides it wants to handle it, it can suppress the option
being passed to the filesystem. Note that 'val' may include commas
and binary data with the fsopen patch.
(4) A hook to perform validation and allocation after the configuration
has been done but before the superblock is allocated and set up.
(5) A hook to transfer the security from the context to a newly created
superblock.
(6) A hook to rule on whether a path point can be used as a mountpoint.
These are intended to replace:
security_sb_copy_data
security_sb_kern_mount
security_sb_mount
security_sb_set_mnt_opts
security_sb_clone_mnt_opts
security_sb_parse_opts_str
[AV -- some of the methods being replaced are already gone, some of the
methods are not added for the lack of need]
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-security-module@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Because the new API passes in key,value parameters, match_token() cannot be
used with it. Instead, provide three new helpers to aid with parsing:
(1) fs_parse(). This takes a parameter and a simple static description of
all the parameters and maps the key name to an ID. It returns 1 on a
match, 0 on no match if unknowns should be ignored and some other
negative error code on a parse error.
The parameter description includes a list of key names to IDs, desired
parameter types and a list of enumeration name -> ID mappings.
[!] Note that for the moment I've required that the key->ID mapping
array is expected to be sorted and unterminated. The size of the
array is noted in the fsconfig_parser struct. This allows me to use
bsearch(), but I'm not sure any performance gain is worth the hassle
of requiring people to keep the array sorted.
The parameter type array is sized according to the number of parameter
IDs and is indexed directly. The optional enum mapping array is an
unterminated, unsorted list and the size goes into the fsconfig_parser
struct.
The function can do some additional things:
(a) If it's not ambiguous and no value is given, the prefix "no" on
a key name is permitted to indicate that the parameter should
be considered negatory.
(b) If the desired type is a single simple integer, it will perform
an appropriate conversion and store the result in a union in
the parse result.
(c) If the desired type is an enumeration, {key ID, name} will be
looked up in the enumeration list and the matching value will
be stored in the parse result union.
(d) Optionally generate an error if the key is unrecognised.
This is called something like:
enum rdt_param {
Opt_cdp,
Opt_cdpl2,
Opt_mba_mpbs,
nr__rdt_params
};
const struct fs_parameter_spec rdt_param_specs[nr__rdt_params] = {
[Opt_cdp] = { fs_param_is_bool },
[Opt_cdpl2] = { fs_param_is_bool },
[Opt_mba_mpbs] = { fs_param_is_bool },
};
const const char *const rdt_param_keys[nr__rdt_params] = {
[Opt_cdp] = "cdp",
[Opt_cdpl2] = "cdpl2",
[Opt_mba_mpbs] = "mba_mbps",
};
const struct fs_parameter_description rdt_parser = {
.name = "rdt",
.nr_params = nr__rdt_params,
.keys = rdt_param_keys,
.specs = rdt_param_specs,
.no_source = true,
};
int rdt_parse_param(struct fs_context *fc,
struct fs_parameter *param)
{
struct fs_parse_result parse;
struct rdt_fs_context *ctx = rdt_fc2context(fc);
int ret;
ret = fs_parse(fc, &rdt_parser, param, &parse);
if (ret < 0)
return ret;
switch (parse.key) {
case Opt_cdp:
ctx->enable_cdpl3 = true;
return 0;
case Opt_cdpl2:
ctx->enable_cdpl2 = true;
return 0;
case Opt_mba_mpbs:
ctx->enable_mba_mbps = true;
return 0;
}
return -EINVAL;
}
(2) fs_lookup_param(). This takes a { dirfd, path, LOOKUP_EMPTY? } or
string value and performs an appropriate path lookup to convert it
into a path object, which it will then return.
If the desired type was a blockdev, the type of the looked up inode
will be checked to make sure it is one.
This can be used like:
enum foo_param {
Opt_source,
nr__foo_params
};
const struct fs_parameter_spec foo_param_specs[nr__foo_params] = {
[Opt_source] = { fs_param_is_blockdev },
};
const char *char foo_param_keys[nr__foo_params] = {
[Opt_source] = "source",
};
const struct constant_table foo_param_alt_keys[] = {
{ "device", Opt_source },
};
const struct fs_parameter_description foo_parser = {
.name = "foo",
.nr_params = nr__foo_params,
.nr_alt_keys = ARRAY_SIZE(foo_param_alt_keys),
.keys = foo_param_keys,
.alt_keys = foo_param_alt_keys,
.specs = foo_param_specs,
};
int foo_parse_param(struct fs_context *fc,
struct fs_parameter *param)
{
struct fs_parse_result parse;
struct foo_fs_context *ctx = foo_fc2context(fc);
int ret;
ret = fs_parse(fc, &foo_parser, param, &parse);
if (ret < 0)
return ret;
switch (parse.key) {
case Opt_source:
return fs_lookup_param(fc, &foo_parser, param,
&parse, &ctx->source);
default:
return -EINVAL;
}
}
(3) lookup_constant(). This takes a table of named constants and looks up
the given name within it. The table is expected to be sorted such
that bsearch() be used upon it.
Possibly I should require the table be terminated and just use a
for-loop to scan it instead of using bsearch() to reduce hassle.
Tables look something like:
static const struct constant_table bool_names[] = {
{ "0", false },
{ "1", true },
{ "false", false },
{ "no", false },
{ "true", true },
{ "yes", true },
};
and a lookup is done with something like:
b = lookup_constant(bool_names, param->string, -1);
Additionally, optional validation routines for the parameter description
are provided that can be enabled at compile time. A later patch will
invoke these when a filesystem is registered.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Introduce a set of logging functions through which informational messages,
warnings and error messages incurred by the mount procedure can be logged
and, in a future patch, passed to userspace instead by way of the
filesystem configuration context file descriptor.
There are four functions:
(1) infof(const char *fmt, ...);
Logs an informational message.
(2) warnf(const char *fmt, ...);
Logs a warning message.
(3) errorf(const char *fmt, ...);
Logs an error message.
(4) invalf(const char *fmt, ...);
As errof(), but returns -EINVAL so can be used on a return statement.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is an eventual replacement for vfs_submount() uses. Unlike the
"mount" and "remount" cases, the users of that thing are not in VFS -
they are buried in various ->d_automount() instances and rather than
converting them all at once we introduce the (thankfully small and
simple) infrastructure here and deal with the prospective users in
afs, nfs, etc. parts of the series.
Here we just introduce a new constructor (fs_context_for_submount())
along with the corresponding enum constant to be put into fc->purpose
for those.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace do_remount_sb() with a function, reconfigure_super(), that's
fs_context aware. The fs_context is expected to be parameterised already
and have ->root pointing to the superblock to be reconfigured.
A legacy wrapper is provided that is intended to be called from the
fs_context ops when those appear, but for now is called directly from
reconfigure_super(). This wrapper invokes the ->remount_fs() superblock op
for the moment. It is intended that the remount_fs() op will be phased
out.
The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate
that the context is being used for reconfiguration.
do_umount_root() is provided to consolidate remount-to-R/O for umount and
emergency remount by creating a context and invoking reconfiguration.
do_remount(), do_umount() and do_emergency_remount_callback() are switched
to use the new process.
[AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the
umount / bug, gets rid of pointless complexity]
[AV -- set ->net_ns in all cases; nfs remount will need that]
[AV -- shift security_sb_remount() call into reconfigure_super(); the callers
that didn't do security_sb_remount() have NULL fc->security anyway, so it's
a no-op for them]
Signed-off-by: David Howells <dhowells@redhat.com>
Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Right now vfs_get_tree() calls security_sb_kern_mount() (i.e.
mount MAC) unless it gets MS_KERNMOUNT or MS_SUBMOUNT in flags.
Doing it that way is both clumsy and imprecise.
Consider the callers' tree of vfs_get_tree():
vfs_get_tree()
<- do_new_mount()
<- vfs_kern_mount()
<- simple_pin_fs()
<- vfs_submount()
<- kern_mount_data()
<- init_mount_tree()
<- btrfs_mount()
<- vfs_get_tree()
<- nfs_do_root_mount()
<- nfs4_try_mount()
<- nfs_fs_mount()
<- vfs_get_tree()
<- nfs4_referral_mount()
do_new_mount() always does need MAC (we are guaranteed that neither
MS_KERNMOUNT nor MS_SUBMOUNT will be passed there).
simple_pin_fs(), vfs_submount() and kern_mount_data() pass explicit
flags inhibiting that check. So does nfs4_referral_mount() (the
flags there are ulimately coming from vfs_submount()).
init_mount_tree() is called too early for anything LSM-related; it
doesn't matter whether we attempt those checks, they'll do nothing.
Finally, in case of btrfs_mount() and nfs_fs_mount(), doing MAC
is pointless - either the caller will do it, or the flags are
such that we wouldn't have done it either.
In other words, the one and only case when we want that check
done is when we are called from do_new_mount(), and there we
want it unconditionally.
So let's simply move it there. The superblock is still locked,
so nobody is going to get access to it (via ustat(2), etc.)
until we get a chance to apply the checks - we are free to
move them to any point up to where we drop ->s_umount (in
do_new_mount_fc()).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Create an fs_context-aware version of do_new_mount(). This takes an
fs_context with a superblock already attached to it.
Make do_new_mount() use do_new_mount_fc() rather than do_new_mount(); this
allows the consolidation of the mount creation, check and add steps.
To make this work, mount_too_revealing() is changed to take a superblock
rather than a mount (which the fs_context doesn't have available), allowing
this check to be done before the mount object is created.
Signed-off-by: David Howells <dhowells@redhat.com>
Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Roll the handling of subtypes into do_new_mount() and vfs_get_tree(). The
former determines any subtype string and hangs it off the fs_context; the
latter applies it.
Make do_new_mount() create, parameterise and commit an fs_context and
create a mount for itself rather than calling vfs_kern_mount().
[AV -- missing kstrdup()]
[AV -- ... and no kstrdup() if we get to setting ->s_submount - we
simply transfer it from fc, leaving NULL behind]
[AV -- constify ->s_submount, while we are at it]
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Create a new helper, vfs_create_mount(), that creates a detached vfsmount
object from an fs_context that has a superblock attached to it.
Almost all uses will be paired with immediately preceding vfs_get_tree();
add a helper for such combination.
Switch vfs_kern_mount() to use this.
NOTE: mild behaviour change; passing NULL as 'device name' to
something like procfs will change /proc/*/mountstats - "device none"
instead on "no device". That is consistent with /proc/mounts et.al.
[do'h - EXPORT_SYMBOL_GPL slipped in by mistake; removed]
[AV -- remove confused comment from vfs_create_mount()]
[AV -- removed the second argument]
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Introduce a filesystem context concept to be used during superblock
creation for mount and superblock reconfiguration for remount. This is
allocated at the beginning of the mount procedure and into it is placed:
(1) Filesystem type.
(2) Namespaces.
(3) Source/Device names (there may be multiple).
(4) Superblock flags (SB_*).
(5) Security details.
(6) Filesystem-specific data, as set by the mount options.
Accessor functions are then provided to set up a context, parameterise it
from monolithic mount data (the data page passed to mount(2)) and tear it
down again.
A legacy wrapper is provided that implements what will be the basic
operations, wrapping access to filesystems that aren't yet aware of the
fs_context.
Finally, vfs_kern_mount() is changed to make use of the fs_context and
mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount().
[AV -- add missing kstrdup()]
[AV -- put_cred() can be unconditional - fc->cred can't be NULL]
[AV -- take legacy_validate() contents into legacy_parse_monolithic()]
[AV -- merge KERNEL_MOUNT and USER_MOUNT]
[AV -- don't unlock superblock on success return from vfs_get_tree()]
[AV -- kill 'reference' argument of init_fs_context()]
Signed-off-by: David Howells <dhowells@redhat.com>
Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mount_subtree() creates (and soon destroys) a temporary namespace,
so that automounts could function normally. These beasts should
never become anyone's current namespaces; they don't, but it would
be better to make prevention of that more straightforward. And
since they don't become anyone's current namespace, we don't need
to bother with reserving procfs inums for those.
Teach alloc_mnt_ns() to skip inum allocation if told so, adjust
put_mnt_ns() accordingly, make mount_subtree() use temporary
(anon) namespace. is_anon_ns() checks if a namespace is such.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make the reference from superblock to cgroup_root counting -
do cgroup_put() in cgroup_kill_sb() whether we'd done
percpu_ref_kill() or not; matching grab is done when we allocate
a new root. That gives the same refcounting rules for all callers
of cgroup_do_mount() - a reference to cgroup_root has been grabbed
by caller and it either is transferred to new superblock or dropped.
* have cgroup_kill_sb() treat an already killed refcount as "just
don't bother killing it, then".
* after successful cgroup_do_mount() have cgroup1_mount() recheck
if we'd raced with mount/umount from somebody else and cgroup_root
got killed. In that case we drop the superblock and bugger off
with -ERESTARTSYS, same as if we'd found it in the list already
dying.
* don't bother with delayed initialization of refcount - it's
unreliable and not needed. No need to prevent attempts to bump
the refcount if we find cgroup_root of another mount in progress -
sget will reuse an existing superblock just fine and if the
other sb manages to die before we get there, we'll catch
that immediately after cgroup_do_mount().
* don't bother with kernfs_pin_sb() - no need for doing that
either.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
same story as with last May fixes in sysfs (7b745a4e40
"unfuck sysfs_mount()"); new_sb is left uninitialized
in case of early errors in kernfs_mount_ns() and papering
over it by treating any error from kernfs_mount_ns() as
equivalent to !new_ns ends up conflating the cases when
objects had never been transferred to a superblock with
ones when that has happened and resulting new superblock
had been dropped. Easily fixed (same way as in sysfs
case). Additionally, there's a superblock leak on
kernfs_node_dentry() failure *and* a dentry leak inside
kernfs_node_dentry() itself - the latter on probably
impossible errors, but the former not impossible to trigger
(as the matter of fact, injecting allocation failures
at that point *does* trigger it).
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
UNAME26 is a mechanism to report Linux's version as 2.6.x, for
compatibility with old/broken software. Due to the way it is
implemented, it would have to be updated after 5.0, to keep the
resulting versions unique. Linus Torvalds argued:
"Do we actually need this?
I'd rather let it bitrot, and just let it return random versions. It
will just start again at 2.4.60, won't it?
Anybody who uses UNAME26 for a 5.x kernel might as well think it's
still 4.x. The user space is so old that it can't possibly care about
differences between 4.x and 5.x, can it?
The only thing that matters is that it shows "2.4.<largeenough>",
which it will do regardless"
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ARM SoC fixes from Olof Johansson:
"A bigger batch than I anticipated this week, for two reasons:
- Some fallout on Davinci from board file -> DTB conversion, that
also includes a few longer-standing fixes (i.e. not recent
regressions).
- drivers/reset material that has been in linux-next for a while, but
didn't get sent to us until now for a variety of reasons
(maintainer out sick, holidays, etc). There's a functional
dependency in there such that one platform (Altera's SoCFPGA) won't
boot without one of the patches; instead of reverting the patch
that got merged, I looked at this set and decided it was small
enough that I'll pick it up anyway. If you disagree I can revisit
with a smaller set.
That being said, there's also a handful of the usual stuff:
- Fix for a crash on Armada 7K/8K when the kernel touches
PSCI-reserved memory
- Fix for PCIe reset on Macchiatobin (Armada 8K development board,
what this email is sent from in fact :)
- Enable a few new-merged modules for Amlogic in arm64 defconfig
- Error path fixes on Integrator
- Build fix for Renesas and Qualcomm
- Initialization fix for Renesas RZ/G2E
.. plus a few more fixlets"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
ARM: integrator: impd1: use struct_size() in devm_kzalloc()
qcom-scm: Include <linux/err.h> header
gpio: pl061: handle failed allocations
ARM: dts: kirkwood: Fix polarity of GPIO fan lines
arm64: dts: marvell: mcbin: fix PCIe reset signal
arm64: dts: marvell: armada-ap806: reserve PSCI area
ARM: dts: da850-lcdk: Correct the sound card name
ARM: dts: da850-lcdk: Correct the audio codec regulators
ARM: dts: da850-evm: Correct the sound card name
ARM: dts: da850-evm: Correct the audio codec regulators
ARM: davinci: omapl138-hawk: fix label names in GPIO lookup entries
ARM: davinci: dm644x-evm: fix label names in GPIO lookup entries
ARM: davinci: dm355-evm: fix label names in GPIO lookup entries
ARM: davinci: da850-evm: fix label names in GPIO lookup entries
ARM: davinci: da830-evm: fix label names in GPIO lookup entries
arm64: defconfig: enable modules for amlogic s400 sound card
reset: uniphier-glue: Add AHCI reset control support in glue layer
dt-bindings: reset: uniphier: Add AHCI core reset description
reset: uniphier-usb3: Rename to reset-uniphier-glue
dt-bindings: reset: uniphier: Replace the expression of USB3 with generic peripherals
...
Pull btrfs fixes from David Sterba:
- two regression fixes in clone/dedupe ioctls, the generic check
callback needs to lock extents properly and wait for io to avoid
problems with writeback and relocation
- fix deadlock when using free space tree due to block group creation
- a recently added check refuses a valid fileystem with seeding device,
make that work again with a quickfix, proper solution needs more
intrusive changes
* tag 'for-5.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Use real device structure to verify dev extent
Btrfs: fix deadlock when using free space tree due to block group creation
Btrfs: fix race between reflink/dedupe and relocation
Btrfs: fix race between cloning range ending at eof and writeback
Pull driver core fixes from Greg KH:
"Here is one small sysfs change, and a documentation update for 5.0-rc2
The sysfs change moves from using BUG_ON to WARN_ON, as discussed in
an email thread on lkml while trying to track down another driver bug.
sysfs should not be crashing and preventing people from seeing where
they went wrong. Now it properly recovers and warns the developer.
The documentation update removes the use of BUS_ATTR() as the kernel
is moving away from this to use the specific BUS_ATTR_RW() and friends
instead. There are pending patches in all of the different subsystems
to remove the last users of this macro, but for now, don't advertise
it should be used anymore to keep new ones from being introduced.
Both have been in linux-next with no reported issues"
* tag 'driver-core-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: driver core: remove use of BUS_ATTR
sysfs: convert BUG_ON to WARN_ON
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for some reported issues.
One reverts a patch that was made to the rtl8723bs driver that turned
out to not be needed at all as it was a bug in clang. The others fix
up some reported issues in the rtl8188eu driver and update the
MAINTAINERS file to point to Larry for this driver so he can get the
bug reports easier.
All have been in linux-next with no reported issues"
* tag 'staging-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Revert "staging: rtl8723bs: Mark ACPI table declaration as used"
staging: rtl8188eu: Fix module loading from tasklet for WEP encryption
staging: rtl8188eu: Fix module loading from tasklet for CCMP encryption
MAINTAINERS: Add entry for staging driver r8188eu
Pull tty/serial fixes from Greg KH:
"Here are 2 tty and serial fixes for 5.0-rc2 that resolve some reported
issues.
The first is a simple serial driver fix for a regression that showed
up in 5.0-rc1. The second one resolves a number of reported issues
with the recent tty locking fixes that went into 5.0-rc1. Lots of
people have tested the second one and say it resolves their issues.
Both have been in linux-next with no reported issues"
* tag 'tty-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Don't hold ldisc lock in tty_reopen() if ldisc present
serial: lantiq: Do not swap register read/writes
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes and quirk updates for 5.0-rc2.
The majority here are some quirks for some storage devices to get them
to work properly. There's also a fix here to resolve the reported
issues with some audio devices that say they are UAC3 compliant, but
really are not.
And a fix up for the MAINTAINERS file to remove a dead url.
All have been in linux-next with no reported issues"
* tag 'usb-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: storage: Remove outdated URL from MAINTAINERS
USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB
usbcore: Select only first configuration for non-UAC3 compliant devices
USB: storage: add quirk for SMI SM3350
USB: storage: don't insert sane sense for SPC3+ when bad sense specified
usb: cdc-acm: send ZLP for Telit 3G Intel based modems
Pull cifs fixes from Steve French:
"A set of cifs/smb3 fixes, 4 for stable, most from Pavel. His patches
fix an important set of crediting (flow control) problems, and also
two problems in cifs_writepages, ddressing some large i/o and also
compounding issues"
* tag '5.0-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
CIFS: Fix error paths in writeback code
CIFS: Move credit processing to mid callbacks for SMB3
CIFS: Fix credits calculation for cancelled requests
cifs: Fix potential OOB access of lock element array
cifs: Limit memory used by lock request calls to a page
cifs: move large array from stack to heap
CIFS: Do not hide EINTR after sending network packets
CIFS: Fix credit computation for compounded requests
CIFS: Do not set credits to 1 if the server didn't grant anything
CIFS: Fix adjustment of credits for MTU requests
cifs: Fix a tiny potential memory leak
cifs: Fix a debug message