Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
Pull selinux updates from Paul Moore:
- Add LSM hooks, and SELinux access control hooks, for dnotify,
fanotify, and inotify watches. This has been discussed with both the
LSM and fs/notify folks and everybody is good with these new hooks.
- The LSM stacking changes missed a few calls to current_security() in
the SELinux code; we fix those and remove current_security() for
good.
- Improve our network object labeling cache so that we always return
the object's label, even when under memory pressure. Previously we
would return an error if we couldn't allocate a new cache entry, now
we always return the label even if we can't create a new cache entry
for it.
- Convert the sidtab atomic_t counter to a normal u32 with
READ/WRITE_ONCE() and memory barrier protection.
- A few patches to policydb.c to clean things up (remove forward
declarations, long lines, bad variable names, etc)
* tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
lsm: remove current_security()
selinux: fix residual uses of current_security() for the SELinux blob
selinux: avoid atomic_t usage in sidtab
fanotify, inotify, dnotify, security: add security hook for fs notifications
selinux: always return a secid from the network caches if we find one
selinux: policydb - rename type_val_to_struct_array
selinux: policydb - fix some checkpatch.pl warnings
selinux: shuffle around policydb.c to get rid of forward declarations
Tracefs may release more information about the kernel than desirable, so
restrict it when the kernel is locked down in confidentiality mode by
preventing open().
(Fixed by Ben Hutchings to avoid a null dereference in
default_file_open())
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
Disallow opening of debugfs files that might be used to muck around when
the kernel is locked down as various drivers give raw access to hardware
through debugfs. Given the effort of auditing all 2000 or so files and
manually fixing each one as necessary, I've chosen to apply a heuristic
instead. The following changes are made:
(1) chmod and chown are disallowed on debugfs objects (though the root dir
can be modified by mount and remount, but I'm not worried about that).
(2) When the kernel is locked down, only files with the following criteria
are permitted to be opened:
- The file must have mode 00444
- The file must not have ioctl methods
- The file must not have mmap
(3) When the kernel is locked down, files may only be opened for reading.
Normal device interaction should be done through configfs, sysfs or a
miscdev, not debugfs.
Note that this makes it unnecessary to specifically lock down show_dsts(),
show_devs() and show_call() in the asus-wmi driver.
I would actually prefer to lock down all files by default and have the
the files unlocked by the creator. This is tricky to manage correctly,
though, as there are 19 creation functions and ~1600 call sites (some of
them in loops scanning tables).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Andy Shevchenko <andy.shevchenko@gmail.com>
cc: acpi4asus-user@lists.sourceforge.net
cc: platform-driver-x86@vger.kernel.org
cc: Matthew Garrett <mjg59@srcf.ucam.org>
cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
Signed-off-by: James Morris <jmorris@namei.org>
Disallow access to /proc/kcore when the kernel is locked down to prevent
access to cryptographic data. This is limited to lockdown
confidentiality mode and is still permitted in integrity mode.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <jmorris@namei.org>
The testmmiotrace module shouldn't be permitted when the kernel is locked
down as it can be used to arbitrarily read and write MMIO space. This is
a runtime check rather than buildtime in order to allow configurations
where the same kernel may be run in both locked down or permissive modes
depending on local policy.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: Thomas Gleixner <tglx@linutronix.de>
cc: Steven Rostedt <rostedt@goodmis.org>
cc: Ingo Molnar <mingo@kernel.org>
cc: "H. Peter Anvin" <hpa@zytor.com>
cc: x86@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Lock down TIOCSSERIAL as that can be used to change the ioport and irq
settings on a serial port. This only appears to be an issue for the serial
drivers that use the core serial code. All other drivers seem to either
ignore attempts to change port/irq or give an error.
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-serial@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
custom_method effectively allows arbitrary access to system memory, making
it possible for an attacker to circumvent restrictions on module loading.
Disable it if the kernel is locked down.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: linux-acpi@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Writing to MSRs should not be allowed if the kernel is locked down, since
it could lead to execution of arbitrary code in kernel mode. Based on a
patch by Kees Cook.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
cc: x86@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
IO port access would permit users to gain access to PCI configuration
registers, which in turn (on a lot of hardware) give access to MMIO
register space. This would potentially permit root to trigger arbitrary
DMA, so lock it down by default.
This also implicitly locks down the KDADDIO, KDDELIO, KDENABIO and
KDDISABIO console ioctls.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: x86@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Any hardware that can potentially generate DMA has to be locked down in
order to avoid it being possible for an attacker to modify kernel code,
allowing them to circumvent disabled module loading or module signing.
Default to paranoid - in future we can potentially relax this for
sufficiently IOMMU-isolated devices.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: linux-pci@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
The kexec_load() syscall permits the loading and execution of arbitrary
code in ring 0, which is something that lock-down is meant to prevent. It
makes sense to disable kexec_load() in this situation.
This does not affect kexec_file_load() syscall which can check for a
signature on the image to be booted.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: kexec@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>
Allowing users to read and write to core kernel memory makes it possible
for the kernel to be subverted, avoiding module loading restrictions, and
also to steal cryptographic information.
Disallow /dev/mem and /dev/kmem from being opened this when the kernel has
been locked down to prevent this.
Also disallow /dev/port from being opened to prevent raw ioport access and
thus DMA from being used to accomplish the same thing.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: x86@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
If the kernel is locked down, require that all modules have valid
signatures that we can verify.
I have adjusted the errors generated:
(1) If there's no signature (ENODATA) or we can't check it (ENOPKG,
ENOKEY), then:
(a) If signatures are enforced then EKEYREJECTED is returned.
(b) If there's no signature or we can't check it, but the kernel is
locked down then EPERM is returned (this is then consistent with
other lockdown cases).
(2) If the signature is unparseable (EBADMSG, EINVAL), the signature fails
the check (EKEYREJECTED) or a system error occurs (eg. ENOMEM), we
return the error we got.
Note that the X.509 code doesn't check for key expiry as the RTC might not
be valid or might not have been transferred to the kernel's clock yet.
[Modified by Matthew Garrett to remove the IMA integration. This will
be replaced with integration with the IMA architecture policy
patchset.]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
While existing LSMs can be extended to handle lockdown policy,
distributions generally want to be able to apply a straightforward
static policy. This patch adds a simple LSM that can be configured to
reject either integrity or all lockdown queries, and can be configured
at runtime (through securityfs), boot time (via a kernel parameter) or
build time (via a kconfig option). Based on initial code by David
Howells.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add a mechanism to allow LSMs to make a policy decision around whether
kernel functionality that would allow tampering with or examining the
runtime state of the kernel should be permitted.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
The lockdown module is intended to allow for kernels to be locked down
early in boot - sufficiently early that we don't have the ability to
kmalloc() yet. Add support for early initialisation of some LSMs, and
then add them to the list of names when we do full initialisation later.
Early LSMs are initialised in link order and cannot be overridden via
boot parameters, and cannot make use of kmalloc() (since the allocator
isn't initialised yet).
(Fixed by Stephen Rothwell to include a stub to fix builds when
!CONFIG_SECURITY)
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <jmorris@namei.org>
As of now, setting watches on filesystem objects has, at most, applied a
check for read access to the inode, and in the case of fanotify, requires
CAP_SYS_ADMIN. No specific security hook or permission check has been
provided to control the setting of watches. Using any of inotify, dnotify,
or fanotify, it is possible to observe, not only write-like operations, but
even read access to a file. Modeling the watch as being merely a read from
the file is insufficient for the needs of SELinux. This is due to the fact
that read access should not necessarily imply access to information about
when another process reads from a file. Furthermore, fanotify watches grant
more power to an application in the form of permission events. While
notification events are solely, unidirectional (i.e. they only pass
information to the receiving application), permission events are blocking.
Permission events make a request to the receiving application which will
then reply with a decision as to whether or not that action may be
completed. This causes the issue of the watching application having the
ability to exercise control over the triggering process. Without drawing a
distinction within the permission check, the ability to read would imply
the greater ability to control an application. Additionally, mount and
superblock watches apply to all files within the same mount or superblock.
Read access to one file should not necessarily imply the ability to watch
all files accessed within a given mount or superblock.
In order to solve these issues, a new LSM hook is implemented and has been
placed within the system calls for marking filesystem objects with inotify,
fanotify, and dnotify watches. These calls to the hook are placed at the
point at which the target path has been resolved and are provided with the
path struct, the mask of requested notification events, and the type of
object on which the mark is being set (inode, superblock, or mount). The
mask and obj_type have already been translated into common FS_* values
shared by the entirety of the fs notification infrastructure. The path
struct is passed rather than just the inode so that the mount is available,
particularly for mount watches. This also allows for use of the hook by
pathname-based security modules. However, since the hook is intended for
use even by inode based security modules, it is not placed under the
CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security
modules would need to enable all of the path hooks, even though they do not
use any of them.
This only provides a hook at the point of setting a watch, and presumes
that permission to set a particular watch implies the ability to receive
all notification about that object which match the mask. This is all that
is required for SELinux. If other security modules require additional hooks
or infrastructure to control delivery of notification, these can be added
by them. It does not make sense for us to propose hooks for which we have
no implementation. The understanding that all notifications received by the
requesting application are all strictly of a type for which the application
has been granted permission shows that this implementation is sufficient in
its coverage.
Security modules wishing to provide complete control over fanotify must
also implement a security_file_open hook that validates that the access
requested by the watching application is authorized. Fanotify has the issue
that it returns a file descriptor with the file mode specified during
fanotify_init() to the watching process on event. This is already covered
by the LSM security_file_open hook if the security module implements
checking of the requested file mode there. Otherwise, a watching process
can obtain escalated access to a file for which it has not been authorized.
The selinux_path_notify hook implementation works by adding five new file
permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm
(descriptions about which will follow), and one new filesystem permission:
watch (which is applied to superblock checks). The hook then decides which
subset of these permissions must be held by the requesting application
based on the contents of the provided mask and the obj_type. The
selinux_file_open hook already checks the requested file mode and therefore
ensures that a watching process cannot escalate its access through
fanotify.
The watch, watch_mount, and watch_sb permissions are the baseline
permissions for setting a watch on an object and each are a requirement for
any watch to be set on a file, mount, or superblock respectively. It should
be noted that having either of the other two permissions (watch_reads and
watch_with_perm) does not imply the watch, watch_mount, or watch_sb
permission. Superblock watches further require the filesystem watch
permission to the superblock. As there is no labeled object in view for
mounts, there is no specific check for mount watches beyond watch_mount to
the inode. Such a check could be added in the future, if a suitable labeled
object existed representing the mount.
The watch_reads permission is required to receive notifications from
read-exclusive events on filesystem objects. These events include accessing
a file for the purpose of reading and closing a file which has been opened
read-only. This distinction has been drawn in order to provide a direct
indication in the policy for this otherwise not obvious capability. Read
access to a file should not necessarily imply the ability to observe read
events on a file.
Finally, watch_with_perm only applies to fanotify masks since it is the
only way to set a mask which allows for the blocking, permission event.
This permission is needed for any watch which is of this type. Though
fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit
trust to root, which we do not do, and does not support least privilege.
Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Atomic policy updaters are not very useful as they cannot
usually perform the policy updates on their own. Since it
seems that there is no strict need for the atomicity,
switch to the blocking variant. While doing so, rename
the functions accordingly.
Signed-off-by: Janne Karhunen <janne.karhunen@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>