While running stress tests on adding and deleting ftrace instances I hit
this bug:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_inode_permission+0x85/0x160
PGD 63681067 PUD 7ddbe067 PMD 0
Oops: 0000 [#1] PREEMPT
CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
Call Trace:
security_inode_permission+0x1c/0x30
__inode_permission+0x41/0xa0
inode_permission+0x18/0x50
link_path_walk+0x66/0x920
path_openat+0xa6/0x6c0
do_filp_open+0x43/0xa0
do_sys_open+0x146/0x240
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
RIP selinux_inode_permission+0x85/0x160
CR2: 0000000000000020
Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.
in selinux_inode_permission():
isec = inode->i_security;
rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
Note, the crash came from stressing the deletion and reading of debugfs
files. I was not able to recreate this via normal files. But I'm not
sure they are safe. It may just be that the race window is much harder
to hit.
What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().
The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct. Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here. (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).
Note, this is a hack, but it fixes the problem at hand. A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback. But that is a major job to do, and requires a
lot of work. For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.
Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().
And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.
Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull SELinux fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
selinux: look for IPsec labels on both inbound and outbound packets
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()
selinux: fix possible memory leak
Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously selinux_skb_peerlbl_sid() would only check for labeled
IPsec security labels on inbound packets, this patch enables it to
check both inbound and outbound traffic for labeled IPsec security
labels.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_postroute() we perform access checks based on the
packet's security label. For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket. Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.
See the inline comments for more explanation.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.
Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull misc keyrings fixes from David Howells:
"These break down into five sets:
- A patch to error handling in the big_key type for huge payloads.
If the payload is larger than the "low limit" and the backing store
allocation fails, then big_key_instantiate() doesn't clear the
payload pointers in the key, assuming them to have been previously
cleared - but only one of them is.
Unfortunately, the garbage collector still calls big_key_destroy()
when sees one of the pointers with a weird value in it (and not
NULL) which it then tries to clean up.
- Three patches to fix the keyring type:
* A patch to fix the hash function to correctly divide keyrings off
from keys in the topology of the tree inside the associative
array. This is only a problem if searching through nested
keyrings - and only if the hash function incorrectly puts the a
keyring outside of the 0 branch of the root node.
* A patch to fix keyrings' use of the associative array. The
__key_link_begin() function initially passes a NULL key pointer
to assoc_array_insert() on the basis that it's holding a place in
the tree whilst it does more allocation and stuff.
This is only a problem when a node contains 16 keys that match at
that level and we want to add an also matching 17th. This should
easily be manufactured with a keyring full of keyrings (without
chucking any other sort of key into the mix) - except for (a)
above which makes it on average adding the 65th keyring.
* A patch to fix searching down through nested keyrings, where any
keyring in the set has more than 16 keyrings and none of the
first keyrings we look through has a match (before the tree
iteration needs to step to a more distal node).
Test in keyutils test suite:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1
- A patch to fix the big_key type's use of a shmem file as its
backing store causing audit messages and LSM check failures. This
is done by setting S_PRIVATE on the file to avoid LSM checks on the
file (access to the shmem file goes through the keyctl() interface
and so is gated by the LSM that way).
This isn't normally a problem if a key is used by the context that
generated it - and it's currently only used by libkrb5.
Test in keyutils test suite:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e
- A patch to add a generated file to .gitignore.
- A patch to fix the alignment of the system certificate data such
that it it works on s390. As I understand it, on the S390 arch,
symbols must be 2-byte aligned because loading the address discards
the least-significant bit"
* tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
KEYS: correct alignment of system_certificate_list content in assembly file
Ignore generated file kernel/x509_certificate_list
security: shmem: implement kernel private shmem inodes
KEYS: Fix searching of nested keyrings
KEYS: Fix multiple key add into associative array
KEYS: Fix the keyring hash function
KEYS: Pre-clear struct key on allocation
The new templates management mechanism records information associated
to an event into an array of 'ima_field_data' structures and makes it
available through the 'template_data' field of the 'ima_template_entry'
structure (the element of the measurements list created by IMA).
Since 'ima_field_data' contains dynamically allocated data (which length
varies depending on the data associated to a selected template field),
it is not enough to just free the memory reserved for a
'ima_template_entry' structure if something goes wrong.
This patch creates the new function ima_free_template_entry() which
walks the array of 'ima_field_data' structures, frees the memory
referenced by the 'data' pointer and finally the space reserved for
the 'ima_template_entry' structure. Further, it replaces existing kfree()
that have a pointer to an 'ima_template_entry' structure as argument
with calls to the new function.
Fixes: a71dc65: ima: switch to new template management mechanism
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
7bc5f447ce (ima: define new function ima_alloc_init_template() to
API) moved the initialization of 'entry' in ima_add_boot_aggregate() a
bit more below, after the if (ima_used_chip).
So, 'entry' is not initialized while being inside this if-block. So, we
should not attempt to free it.
Found by Coverity (CID: 1131971)
Fixes: 7bc5f447ce (ima: define new function ima_alloc_init_template() to API)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
We have a problem where the big_key key storage implementation uses a
shmem backed inode to hold the key contents. Because of this detail of
implementation LSM checks are being done between processes trying to
read the keys and the tmpfs backed inode. The LSM checks are already
being handled on the key interface level and should not be enforced at
the inode level (since the inode is an implementation detail, not a
part of the security model)
This patch implements a new function shmem_kernel_file_setup() which
returns the equivalent to shmem_file_setup() only the underlying inode
has S_PRIVATE set. This means that all LSM checks for the inode in
question are skipped. It should only be used for kernel internal
operations where the inode is not exposed to userspace without proper
LSM checking. It is possible that some other users of
shmem_file_setup() should use the new interface, but this has not been
explored.
Reproducing this bug is a little bit difficult. The steps I used on
Fedora are:
(1) Turn off selinux enforcing:
setenforce 0
(2) Create a huge key
k=`dd if=/dev/zero bs=8192 count=1 | keyctl padd big_key test-key @s`
(3) Access the key in another context:
runcon system_u:system_r:httpd_t:s0-s0:c0.c1023 keyctl print $k >/dev/null
(4) Examine the audit logs:
ausearch -m AVC -i --subject httpd_t | audit2allow
If the last command's output includes a line that looks like:
allow httpd_t user_tmpfs_t:file { open read };
There was an inode check between httpd and the tmpfs filesystem. With
this patch no such denial will be seen. (NOTE! you should clear your
audit log if you have tested for this previously)
(Please return you box to enforcing)
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hugh Dickins <hughd@google.com>
cc: linux-mm@kvack.org
If a keyring contains more than 16 keyrings (the capacity of a single node in
the associative array) then those keyrings are split over multiple nodes
arranged as a tree.
If search_nested_keyrings() is called to search the keyring then it will
attempt to manually walk over just the 0 branch of the associative array tree
where all the keyring links are stored. This works provided the key is found
before the algorithm steps from one node containing keyrings to a child node
or if there are sufficiently few keyring links that the keyrings are all in
one node.
However, if the algorithm does need to step from a node to a child node, it
doesn't change the node pointer unless a shortcut also gets transited. This
means that the algorithm will keep scanning the same node over and over again
without terminating and without returning.
To fix this, move the internal-pointer-to-node translation from inside the
shortcut transit handler so that it applies it to node arrival as well.
This can be tested by:
r=`keyctl newring sandbox @s`
for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
for ((i=0; i<=16; i++)); do keyctl add user a$i a %:ring$i; done
for ((i=0; i<=16; i++)); do keyctl search $r user a$i; done
for ((i=17; i<=20; i++)); do keyctl search $r user a$i; done
The searches should all complete successfully (or with an error for 17-20),
but instead one or more of them will hang.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
If sufficient keys (or keyrings) are added into a keyring such that a node in
the associative array's tree overflows (each node has a capacity N, currently
16) and such that all N+1 keys have the same index key segment for that level
of the tree (the level'th nibble of the index key), then assoc_array_insert()
calls ops->diff_objects() to indicate at which bit position the two index keys
vary.
However, __key_link_begin() passes a NULL object to assoc_array_insert() with
the intention of supplying the correct pointer later before we commit the
change. This means that keyring_diff_objects() is given a NULL pointer as one
of its arguments which it does not expect. This results in an oops like the
attached.
With the previous patch to fix the keyring hash function, this can be forced
much more easily by creating a keyring and only adding keyrings to it. Add any
other sort of key and a different insertion path is taken - all 16+1 objects
must want to cluster in the same node slot.
This can be tested by:
r=`keyctl newring sandbox @s`
for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
This should work fine, but oopses when the 17th keyring is added.
Since ops->diff_objects() is always called with the first pointer pointing to
the object to be inserted (ie. the NULL pointer), we can fix the problem by
changing the to-be-inserted object pointer to point to the index key passed
into assoc_array_insert() instead.
Whilst we're at it, we also switch the arguments so that they are the same as
for ->compare_object().
BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
Call Trace:
[<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
[<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
[<ffffffff811929a7>] __key_link_begin+0x78/0xe5
[<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
[<ffffffff81192e0a>] SyS_add_key+0x123/0x183
[<ffffffff81400ddb>] tracesys+0xdd/0xe2
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
The keyring hash function (used by the associative array) is supposed to clear
the bottommost nibble of the index key (where the hash value resides) for
keyrings and make sure it is non-zero for non-keyrings. This is done to make
keyrings cluster together on one branch of the tree separately to other keys.
Unfortunately, the wrong mask is used, so only the bottom two bits are
examined and cleared and not the whole bottom nibble. This means that keys
and keyrings can still be successfully searched for under most circumstances
as the hash is consistent in its miscalculation, but if a keyring's
associative array bottom node gets filled up then approx 75% of the keyrings
will not be put into the 0 branch.
The consequence of this is that a key in a keyring linked to by another
keyring, ie.
keyring A -> keyring B -> key
may not be found if the search starts at keyring A and then descends into
keyring B because search_nested_keyrings() only searches up the 0 branch (as it
"knows" all keyrings must be there and not elsewhere in the tree).
The fix is to use the right mask.
This can be tested with:
r=`keyctl newring sandbox @s`
for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
for ((i=0; i<=16; i++)); do keyctl add user a$i a %:ring$i; done
for ((i=0; i<=16; i++)); do keyctl search $r user a$i; done
This creates a sandbox keyring, then creates 17 keyrings therein (labelled
ring0..ring16). This causes the root node of the sandbox's associative array
to overflow and for the tree to have extra nodes inserted.
Each keyring then is given a user key (labelled aN for ringN) for us to search
for.
We then search for the user keys we added, starting from the sandbox. If
working correctly, it should return the same ordered list of key IDs as
for...keyctl add... did. Without this patch, it reports ENOKEY "Required key
not available" for some of the keys. Just which keys get this depends as the
kernel pointer to the key type forms part of the hash function.
Reported-by: Nalin Dahyabhai <nalin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
The second word of key->payload does not get initialised in key_alloc(), but
the big_key type is relying on it having been cleared. The problem comes when
big_key fails to instantiate a large key and doesn't then set the payload. The
big_key_destroy() op is called from the garbage collector and this assumes that
the dentry pointer stored in the second word will be NULL if instantiation did
not complete.
Therefore just pre-clear the entire struct key on allocation rather than trying
to be clever and only initialising to 0 only those bits that aren't otherwise
initialised.
The lack of initialisation can lead to a bug report like the following if
big_key failed to initialise its file:
general protection fault: 0000 [#1] SMP
Modules linked in: ...
CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.10.0-53.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge 1955/0HC513, BIOS 1.4.4 12/09/2008
Workqueue: events key_garbage_collector
task: ffff8801294f5680 ti: ffff8801296e2000 task.ti: ffff8801296e2000
RIP: 0010:[<ffffffff811b4a51>] dput+0x21/0x2d0
...
Call Trace:
[<ffffffff811a7b06>] path_put+0x16/0x30
[<ffffffff81235604>] big_key_destroy+0x44/0x60
[<ffffffff8122dc4b>] key_gc_unused_keys.constprop.2+0x5b/0xe0
[<ffffffff8122df2f>] key_garbage_collector+0x1df/0x3c0
[<ffffffff8107759b>] process_one_work+0x17b/0x460
[<ffffffff8107834b>] worker_thread+0x11b/0x400
[<ffffffff81078230>] ? rescuer_thread+0x3e0/0x3e0
[<ffffffff8107eb00>] kthread+0xc0/0xd0
[<ffffffff8107ea40>] ? kthread_create_on_node+0x110/0x110
[<ffffffff815c4bec>] ret_from_fork+0x7c/0xb0
[<ffffffff8107ea40>] ? kthread_create_on_node+0x110/0x110
Reported-by: Patrik Kis <pkis@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Stephen Gallagher <sgallagh@redhat.com>
This patch stores the address of the 'template_fmt_copy' variable in a new
variable, called 'template_fmt_ptr', so that the latter is passed as an
argument of strsep() instead of the former. This modification is needed
in order to correctly free the memory area referenced by
'template_fmt_copy' (strsep() modifies the pointer of the passed string).
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This patch defines a new value for the 'ima_show_type' enumerator
(IMA_SHOW_BINARY_NO_FIELD_LEN) to prevent that the field length
is transmitted through the 'binary_runtime_measurements' interface
for the digest field of the 'ima' template.
Fixes commit: 3ce1217 ima: define template fields library and new helpers
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
To maintain compatibility with userspace tools, the field length must not
be included in the template digest calculation for the 'ima' template.
Fixes commit: a71dc65 ima: switch to new template management mechanism
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This reverts commit 217091dd7a, which
caused the following build error:
security/integrity/digsig.c:70:5: error: redefinition of ‘integrity_init_keyring’
security/integrity/integrity.h:149:12: note: previous definition of ‘integrity_init_keyring’ w
security/integrity/integrity.h:149:12: warning: ‘integrity_init_keyring’ defined but not used
reported by Krzysztof Kolasa. Mimi says:
"I made the classic mistake of requesting this patch to be upstreamed
at the last second, rather than waiting until the next open window.
At this point, the best course would probably be to revert the two
commits and fix them for the next open window"
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>