Andy Lutomirski pointed out that the current behavior of allowing the
owner of a user namespace to have all caps when that owner is not in a
parent user namespace is wrong. Add a test to ensure the owner of a user
namespace is in the parent of the user namespace to fix this bug.
Thankfully this bug did not apply to the initial user namespace, keeping
the mischief that can be caused by this bug quite small.
This is bug was introduced in v3.5 by commit 783291e690
"Simplify the user_namespace by making userns->creator a kuid."
But did not matter until the permisions required to create
a user namespace were relaxed allowing a user namespace to be created
inside of a user namespace.
The bug made it possible for the owner of a user namespace to be
present in a child user namespace. Since the owner of a user nameapce
is granted all capabilities it became possible for users in a
grandchild user namespace to have all privilges over their parent user
namspace.
Reorder the checks in cap_capable. This should make the common case
faster and make it clear that nothing magic happens in the initial
user namespace. The reordering is safe because cred->user_ns
can only be in targ_ns or targ_ns->parent but not both.
Add a comment a the top of the loop to make the logic of
the code clear.
Add a distinct variable ns that changes as we walk up
the user namespace hierarchy to make it clear which variable
is changing.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
There are a number of "conventions" for where to put LSM filesystems.
Smack adheres to none of them. Create a mount point at /sys/fs/smackfs
for mounting smackfs so that Smack can be conventional.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The components NETLABEL and SECURITY_NETWORK are required by
Smack. Using "depends" in Kconfig hides the Smack option
if the user hasn't figured out that they need to be enabled
while using make menuconfig. Using select is a better choice.
Because select is not recursive depends on NET and SECURITY
are added. The reflects similar usage in TOMOYO and AppArmor.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
With the addition of the new kernel module syscall, which defines two
arguments - a file descriptor to the kernel module and a pointer to a NULL
terminated string of module arguments - it is now possible to measure and
appraise kernel modules like any other file on the file system.
This patch adds support to measure and appraise kernel modules in an
extensible and consistent manner.
To support filesystems without extended attribute support, additional
patches could pass the signature as the first parameter.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that kernel module origins can be reasoned about, provide a hook to
the LSMs to make policy decisions about the module file. This will let
Chrome OS enforce that loadable kernel modules can only come from its
read-only hash-verified root filesystem. Other LSMs can, for example,
read extended attributes for signatures, etc.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
Pull networking changes from David Miller:
1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.
2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.
3) Networking user namespace support from Eric W. Biederman.
4) tuntap/virtio-net multiqueue support by Jason Wang.
5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.
6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.
7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.
8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.
9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.
10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heurstics were choosen a decade ago.
From Eric Dumazet.
11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.
12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.
13) Add "oops_only" mode to netconsole, from Amerigo Wang.
14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.
15) Support PTP in the Tigon3 driver, from Matt Carlson.
16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.
17) Support per-association statistics in SCTP, from Michele
Baldessari.
And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...
Pull cgroup changes from Tejun Heo:
"A lot of activities on cgroup side. The big changes are focused on
making cgroup hierarchy handling saner.
- cgroup_rmdir() had peculiar semantics - it allowed cgroup
destruction to be vetoed by individual controllers and tried to
drain refcnt synchronously. The vetoing never worked properly and
caused good deal of contortions in cgroup. memcg was the last
reamining user. Michal Hocko removed the usage and cgroup_rmdir()
path has been simplified significantly. This was done in a
separate branch so that the memcg people can base further memcg
changes on top.
- The above allowed cleaning up cgroup lifecycle management and
implementation of generic cgroup iterators which are used to
improve hierarchy support.
- cgroup_freezer updated to allow migration in and out of a frozen
cgroup and handle hierarchy. If a cgroup is frozen, all descendant
cgroups are frozen.
- netcls_cgroup and netprio_cgroup updated to handle hierarchy
properly.
- Various fixes and cleanups.
- Two merge commits. One to pull in memcg and rmdir cleanups (needed
to build iterators). The other pulled in cgroup/for-3.7-fixes for
device_cgroup fixes so that further device_cgroup patches can be
stacked on top."
Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)
* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
cgroup: update Documentation/cgroups/00-INDEX
cgroup_rm_file: don't delete the uncreated files
cgroup: remove subsystem files when remounting cgroup
cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
cgroup: warn about broken hierarchies only after css_online
cgroup: list_del_init() on removed events
cgroup: fix lockdep warning for event_control
cgroup: move list add after list head initilization
netprio_cgroup: allow nesting and inherit config on cgroup creation
netprio_cgroup: implement netprio[_set]_prio() helpers
netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
netprio_cgroup: reimplement priomap expansion
netprio_cgroup: shorten variable names in extend_netdev_table()
netprio_cgroup: simplify write_priomap()
netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
cgroup: remove obsolete guarantee from cgroup_task_migrate.
cgroup: add cgroup->id
cgroup, cpuset: remove cgroup_subsys->post_clone()
cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
...
Rebased on the latest net-next tree.
RTM_NEWNETCONF and RTM_GETNETCONF are missing in this table.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
V5: fix two bugs pointed out by Thomas
remove seq check for now, mark it as TODO
V4: remove some useless #include
some coding style fix
V3: drop debugging printk's
update selinux perm table as well
V2: drop patch 1/2, export ifindex directly
Redesign netlink attributes
Improve netlink seq check
Handle IPv6 addr as well
This patch exports bridge multicast database via netlink
message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).
(Thanks to Thomas for patient reviews)
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
===============================
[ INFO: suspicious RCU usage. ]
3.5.0-rc1+ #63 Not tainted
-------------------------------
security/selinux/netnode.c:178 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by trinity-child1/8750:
#0: (sel_netnode_lock){+.....}, at: [<ffffffff812d8f8a>] sel_netnode_sid+0x16a/0x3e0
stack backtrace:
Pid: 8750, comm: trinity-child1 Not tainted 3.5.0-rc1+ #63
Call Trace:
[<ffffffff810cec2d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff812d91d1>] sel_netnode_sid+0x3b1/0x3e0
[<ffffffff812d8e20>] ? sel_netnode_find+0x1a0/0x1a0
[<ffffffff812d24a6>] selinux_socket_bind+0xf6/0x2c0
[<ffffffff810cd1dd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff810cdb55>] ? lock_release_holdtime.part.9+0x15/0x1a0
[<ffffffff81093841>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff812c9536>] security_socket_bind+0x16/0x20
[<ffffffff815550ca>] sys_bind+0x7a/0x100
[<ffffffff816c03d5>] ? sysret_check+0x22/0x5d
[<ffffffff810d392d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
[<ffffffff8133b09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff816c03a9>] system_call_fastpath+0x16/0x1b
This patch below does what Paul McKenney suggested in the previous thread.
Signed-off-by: Dave Jones <davej@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Instead of locking the list during a delete, mark entries as invalid
and trigger a workqueue to clean them up. This lets us easily handle
task_free from interrupt context.
Signed-off-by: Kees Cook <keescook@chromium.org>
The task_user_ns function hides the fact that it is getting the user
namespace from struct cred on the task. struct cred may go away as
soon as the rcu lock is released. This leads to a race where we
can dereference a stale user namespace pointer.
To make it obvious a struct cred is involved kill task_user_ns.
To kill the race modify the users of task_user_ns to only
reference the user namespace while the rcu lock is held.
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are. Also, update documentation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
device_cgroup uses RCU safe ->exceptions list which is write-protected
by devcgroup_mutex and has had some issues using locking correctly.
Add lockdep asserts to utility functions so that future errors can be
easily detected.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
dev_cgroup->exceptions is protected with devcgroup_mutex for writes
and RCU for reads; however, RCU usage isn't correct.
* dev_exception_clean() doesn't use RCU variant of list_del() and
kfree(). The function can race with may_access() and may_access()
may end up dereferencing already freed memory. Use list_del_rcu()
and kfree_rcu() instead.
* may_access() may be called only with RCU read locked but doesn't use
RCU safe traversal over ->exceptions. Use list_for_each_entry_rcu().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
In 4cef7299b4 ("device_cgroup: add proper checking when changing
default behavior") the cgroup parent usage is unchecked. root will not
have a parent and trying to use device.{allow,deny} will cause problems.
For some reason my stressing scripts didn't test the root directory so I
didn't catch it on my regular tests.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
BugLink: http://bugs.launchpad.net/bugs/1056078
Profile replacement can cause long chains of profiles to build up when
the profile being replaced is pinned. When the pinned profile is finally
freed, it puts the reference to its replacement, which may in turn nest
another call to free_profile on the stack. Because this may happen for
each profile in the replacedby chain this can result in a recusion that
causes the stack to overflow.
Break this nesting by directly walking the chain of replacedby profiles
(ie. use iteration instead of recursion to free the list). This results
in at most 2 levels of free_profile being called, while freeing a
replacedby chain.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>