When I first wrote binderfs the new mount api had not yet landed. Now
that it has been around for a little while and a bunch of filesystems
have already been ported we should do so too. When Al sent his
mount-api-conversion pr he requested that binderfs (and a few others) be
ported separately. It's time we port binderfs. We can make use of the
new option parser, get nicer infrastructure and it will be easier if we
ever add any new mount options.
This survives testing with the binderfs selftests:
for i in `seq 1 1000`; do ./binderfs_test; done
including the new stress tests I sent out for review today:
TAP version 13
1..1
# selftests: filesystems/binderfs: binderfs_test
# [==========] Running 3 tests from 1 test cases.
# [ RUN ] global.binderfs_stress
# [ XFAIL! ] Tests are not run as root. Skipping privileged tests
# [==========] Running 3 tests from 1 test cases.
# [ RUN ] global.binderfs_stress
# [ OK ] global.binderfs_stress
# [ RUN ] global.binderfs_test_privileged
# [ OK ] global.binderfs_test_privileged
# [ RUN ] global.binderfs_test_unprivileged
# # Allocated new binder device with major 243, minor 4, and name my-binder
# # Detected binder version: 8
# [==========] Running 3 tests from 1 test cases.
# [ RUN ] global.binderfs_stress
# [ OK ] global.binderfs_stress
# [ RUN ] global.binderfs_test_privileged
# [ OK ] global.binderfs_test_privileged
# [ RUN ] global.binderfs_test_unprivileged
# [ OK ] global.binderfs_test_unprivileged
# [==========] 3 / 3 tests passed.
# [ PASSED ]
ok 1 selftests: filesystems/binderfs: binderfs_test
Cc: Todd Kjos <tkjos@google.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200313153427.141789-1-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Binderfs binder-control devices are cleaned up via binderfs_evict_inode
too() which will use refcount_dec_and_test(). However, we missed to set
the refcount for binderfs binder-control devices and so we underflowed
when the binderfs instance got unmounted. Pretty obvious oversight and
should have been part of the more general UAF fix. The good news is that
having test cases (suprisingly) helps.
Technically, we could detect that we're about to cleanup the
binder-control dentry in binderfs_evict_inode() and then simply clean it
up. But that makes the assumption that the binder driver itself will
never make use of a binderfs binder-control device after the binderfs
instance it belongs to has been unmounted and the superblock for it been
destroyed. While it is unlikely to ever come to this let's be on the
safe side. Performance-wise this also really doesn't matter since the
binder-control device is only every really when creating the binderfs
filesystem or creating additional binder devices. Both operations are
pretty rare.
Fixes: f0fe2c0f05 ("binder: prevent UAF for binderfs devices II")
Link: https://lore.kernel.org/r/CA+G9fYusdfg7PMfC9Xce-xLT7NiyKSbgojpK35GOm=Pf9jXXrA@mail.gmail.com
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20200311105309.1742827-1-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a necessary follow up to the first fix I proposed and we merged
in 2669b8b0c7 ("binder: prevent UAF for binderfs devices"). I have been
overly optimistic that the simple fix I proposed would work. But alas,
ihold() + iput() won't work since the inodes won't survive the
destruction of the superblock.
So all we get with my prior fix is a different race with a tinier
race-window but it doesn't solve the issue. Fwiw, the problem lies with
generic_shutdown_super(). It even has this cozy Al-style comment:
if (!list_empty(&sb->s_inodes)) {
printk("VFS: Busy inodes after unmount of %s. "
"Self-destruct in 5 seconds. Have a nice day...\n",
sb->s_id);
}
On binder_release(), binder_defer_work(proc, BINDER_DEFERRED_RELEASE) is
called which punts the actual cleanup operation to a workqueue. At some
point, binder_deferred_func() will be called which will end up calling
binder_deferred_release() which will retrieve and cleanup the
binder_context attach to this struct binder_proc.
If we trace back where this binder_context is attached to binder_proc we
see that it is set in binder_open() and is taken from the struct
binder_device it is associated with. This obviously assumes that the
struct binder_device that context is attached to is _never_ freed. While
that might be true for devtmpfs binder devices it is most certainly
wrong for binderfs binder devices.
So, assume binder_open() is called on a binderfs binder devices. We now
stash away the struct binder_context associated with that struct
binder_devices:
proc->context = &binder_dev->context;
/* binderfs stashes devices in i_private */
if (is_binderfs_device(nodp)) {
binder_dev = nodp->i_private;
info = nodp->i_sb->s_fs_info;
binder_binderfs_dir_entry_proc = info->proc_log_dir;
} else {
.
.
.
proc->context = &binder_dev->context;
Now let's assume that the binderfs instance for that binder devices is
shutdown via umount() and/or the mount namespace associated with it goes
away. As long as there is still an fd open for that binderfs binder
device things are fine. But let's assume we now close the last fd for
that binderfs binder device. Now binder_release() is called and punts to
the workqueue. Assume that the workqueue has quite a bit of stuff to do
and doesn't get to cleaning up the struct binder_proc and the associated
struct binder_context with it for that binderfs binder device right
away. In the meantime, the VFS is killing the super block and is
ultimately calling sb->evict_inode() which means it will call
binderfs_evict_inode() which does:
static void binderfs_evict_inode(struct inode *inode)
{
struct binder_device *device = inode->i_private;
struct binderfs_info *info = BINDERFS_I(inode);
clear_inode(inode);
if (!S_ISCHR(inode->i_mode) || !device)
return;
mutex_lock(&binderfs_minors_mutex);
--info->device_count;
ida_free(&binderfs_minors, device->miscdev.minor);
mutex_unlock(&binderfs_minors_mutex);
kfree(device->context.name);
kfree(device);
}
thereby freeing the struct binder_device including struct
binder_context.
Now the workqueue finally has time to get around to cleaning up struct
binder_proc and is now trying to access the associate struct
binder_context. Since it's already freed it will OOPs.
Fix this by introducing a refounct on binder devices.
This is an alternative fix to 51d8a7eca6 ("binder: prevent UAF read in
print_binder_transaction_log_entry()").
Fixes: 3ad20fe393 ("binder: implement binderfs")
Fixes: 2669b8b0c7 ("binder: prevent UAF for binderfs devices")
Fixes: 03e2e07e38 ("binder: Make transaction_log available in binderfs")
Related : 51d8a7eca6 ("binder: prevent UAF read in print_binder_transaction_log_entry()")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20200303164340.670054-1-christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently /sys/kernel/debug/binder/proc contains
the debug data for every binder_proc instance.
This patch makes this information also available
in a binderfs instance mounted with a mount option
"stats=global" in addition to debugfs. The patch does
not affect the presence of the file in debugfs.
If a binderfs instance is mounted at path /dev/binderfs,
this file would be present at /dev/binderfs/binder_logs/proc.
This change provides an alternate way to access this file when debugfs
is not mounted.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Link: https://lore.kernel.org/r/20190903161655.107408-5-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the binder transaction log files 'transaction_log'
and 'failed_transaction_log' live in debugfs at the following locations:
/sys/kernel/debug/binder/failed_transaction_log
/sys/kernel/debug/binder/transaction_log
This patch makes these files also available in a binderfs instance
mounted with the mount option "stats=global".
It does not affect the presence of these files in debugfs.
If a binderfs instance is mounted at path /dev/binderfs, the location of
these files will be as follows:
/dev/binderfs/binder_logs/failed_transaction_log
/dev/binderfs/binder_logs/transaction_log
This change provides an alternate option to access these files when
debugfs is not mounted.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Link: https://lore.kernel.org/r/20190903161655.107408-4-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following binder stat files currently live in debugfs.
/sys/kernel/debug/binder/state
/sys/kernel/debug/binder/stats
/sys/kernel/debug/binder/transactions
This patch makes these files available in a binderfs instance
mounted with the mount option 'stats=global'. For example, if a binderfs
instance is mounted at path /dev/binderfs, the above files will be
available at the following locations:
/dev/binderfs/binder_logs/state
/dev/binderfs/binder_logs/stats
/dev/binderfs/binder_logs/transactions
This provides a way to access them even when debugfs is not mounted.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20190903161655.107408-3-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, all binder state and statistics live in debugfs.
We need this information even when debugfs is not mounted.
This patch adds the mount option 'stats' to enable a binderfs
instance to have binder debug information present in the same.
'stats=global' will enable the global binder statistics. In
the future, 'stats=local' will enable binder statistics local
to the binderfs instance. The two modes 'global' and 'local'
will be mutually exclusive. 'stats=global' option is only available
for a binderfs instance mounted in the initial user namespace.
An attempt to use the option to mount a binderfs instance in
another user namespace will return an EPERM error.
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20190903161655.107408-2-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
binderfs should not have a separate device_initcall(). When a kernel is
compiled with CONFIG_ANDROID_BINDERFS register the filesystem alongside
CONFIG_ANDROID_IPC. This use-case is especially sensible when users specify
CONFIG_ANDROID_IPC=y, CONFIG_ANDROID_BINDERFS=y and
ANDROID_BINDER_DEVICES="".
When CONFIG_ANDROID_BINDERFS=n then this always succeeds so there's no
regression potential for legacy workloads.
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We currently adhere to the reserved devices limit when creating new
binderfs devices in binderfs instances not located in the inital ipc
namespace. But it is still possible to rob the host instances of their 4
reserved devices by creating the maximum allowed number of devices in a
single binderfs instance located in a non-initial ipc namespace and then
mounting 4 separate binderfs instances in non-initial ipc namespaces. That
happens because the limit is currently not respected for the creation of
the initial binder-control device node. Block this nonsense by performing
the same check in binderfs_binder_ctl_create() that we perform in
binderfs_binder_device_create().
Fixes: 36bdf3cae0 ("binderfs: reserve devices for initial mount")
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The binderfs_binder_ctl_create() call is a no-op on subsequent calls and
the first call is done before we unlock the suberblock. Hence, there is no
need to take inode_lock() in there. Let's remove it.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al pointed out that first calling kill_litter_super() before cleaning up
info is more correct since destroying info doesn't depend on the state of
the dentries and inodes. That the opposite remains true is not guaranteed.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- switch from d_alloc_name() + d_lookup() to lookup_one_len():
Instead of using d_alloc_name() and then doing a d_lookup() with the
allocated dentry to find whether a device with the name we're trying to
create already exists switch to using lookup_one_len(). The latter will
either return the existing dentry or a new one.
- switch from kmalloc() + strscpy() to kmemdup():
Use a more idiomatic way to copy the name for the new dentry that
userspace gave us.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al pointed out that on binderfs_fill_super() error
deactivate_locked_super() will call binderfs_kill_super() so all of the
freeing and putting we currently do in binderfs_fill_super() is unnecessary
and buggy. Let's simply return errors and let binderfs_fill_super() take
care of cleaning up on error.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- make binderfs control dentry immutable:
We don't allow to unlink it since it is crucial for binderfs to be
useable but if we allow to rename it we make the unlink trivial to
bypass. So prevent renaming too and simply treat the control dentry as
immutable.
- add is_binderfs_control_device() helper:
Take the opportunity and turn the check for the control dentry into a
separate helper is_binderfs_control_device() since it's now used in two
places.
- simplify binderfs_rename():
Instead of hand-rolling our custom version of simple_rename() just dumb
the whole function down to first check whether we're trying to rename the
control dentry. If we do EPERM the caller and if not call simple_rename().
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix to return a negative error code -ENOMEM from the new_inode() and
d_make_root() error handling cases instead of 0, as done elsewhere in
this function.
Fixes: 849d540ddf ("binderfs: implement "max" mount option")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>