The conntrack code can export the internal secid to userspace. These are
dynamic, can change on lsm changes, and have no meaning in userspace. We
should instead be sending lsm contexts to userspace instead. This patch sends
the secctx (rather than secid) to userspace over the netlink socket. We use a
new field CTA_SECCTX and stop using the the old CTA_SECMARK field since it did
not send particularly useful information.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: James Morris <jmorris@namei.org>
With the (long ago) interface change to have the secid_to_secctx functions
do the string allocation instead of having the caller do the allocation we
lost the ability to query the security server for the length of the
upcoming string. The SECMARK code would like to allocate a netlink skb
with enough length to hold the string but it is just too unclean to do the
string allocation twice or to do the allocation the first time and hold
onto the string and slen. This patch adds the ability to call
security_secid_to_secctx() with a NULL data pointer and it will just set
the slen pointer.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Right now secmark has lots of direct selinux calls. Use all LSM calls and
remove all SELinux specific knowledge. The only SELinux specific knowledge
we leave is the mode. The only point is to make sure that other LSMs at
least test this generic code before they assume it works. (They may also
have to make changes if they do not represent labels as strings)
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: James Morris <jmorris@namei.org>
All security modules shouldn't change sched_param parameter of
security_task_setscheduler(). This is not only meaningless, but also
make a harmful result if caller pass a static variable.
This patch remove policy and sched_param parameter from
security_task_setscheduler() becuase none of security module is
using it.
Cc: James Morris <jmorris@namei.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: James Morris <jmorris@namei.org>
We currently have a kernel internal type called aligned_u64 which aligns
__u64's on 8 bytes boundaries even on systems which would normally align
them on 4 byte boundaries. This patch creates a new type __aligned_u64
which does the same thing but which is exposed to userspace rather than
being kernel internal.
[akpm: merge early as both the net and audit trees want this]
[akpm@linux-foundation.org: enhance the comment describing the reasons for using aligned_u64. Via Andreas and Andi.]
Based-on-patch-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: David Miller <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tony Luck reports that the addition of the access_ok() check in commit
0eead9ab41 ("Don't dump task struct in a.out core-dumps") broke the
ia64 compile due to missing the necessary header file includes.
Rather than add yet another include (<asm/unistd.h>) to make everything
happy, just uninline the silly core dump helper functions and move the
bodies to fs/exec.c where they make a lot more sense.
dump_seek() in particular was too big to be an inline function anyway,
and none of them are in any way performance-critical. And we really
don't need to mess up our include file headers more than they already
are.
Reported-and-tested-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code). Just remove it.
Also do the access_ok() check on dump_write(). It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...
[ I suspect that we should possibly do "vfs_write()" instead of
calling ->write directly. That also does the whole fsnotify and write
statistics thing, which may or may not be a good idea. ]
And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)
Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch disables the fanotify syscalls by just not building them and
letting the cond_syscall() statements in kernel/sys_ni.c redirect them
to sys_ni_syscall().
It was pointed out by Tvrtko Ursulin that the fanotify interface did not
include an explicit prioritization between groups. This is necessary
for fanotify to be usable for hierarchical storage management software,
as they must get first access to the file, before inotify-like notifiers
see the file.
This feature can be added in an ABI compatible way in the next release
(by using a number of bits in the flags field to carry the info) but it
was suggested by Alan that maybe we should just hold off and do it in
the next cycle, likely with an (new) explicit argument to the syscall.
I don't like this approach best as I know people are already starting to
use the current interface, but Alan is all wise and noone on list backed
me up with just using what we have. I feel this is needlessly ripping
the rug out from under people at the last minute, but if others think it
needs to be a new argument it might be the best way forward.
Three choices:
Go with what we got (and implement the new feature next cycle). Add a
new field right now (and implement the new feature next cycle). Wait
till next cycle to release the ABI (and implement the new feature next
cycle). This is number 3.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: don't drop handle reference on unload
drm/ttm: Fix two race conditions + fix busy codepaths
2.6.36 introduces an API for drivers to switch the IO scheduler
instead of manually calling the elevator exit and init functions.
This API was added since q->elevator must be cleared in between
those two calls. And since we already have this functionality
directly from use by the sysfs interface to switch schedulers
online, it was prudent to reuse it internally too.
But this API needs the queue to be in a fully initialized state
before it is called, or it will attempt to unregister elevator
kobjects before they have been added. This results in an oops
like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000051
IP: [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
PGD 47ddfc067 PUD 47c6a1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq
CPU 2
Modules linked in: t(+) loop hid_apple usbhid ahci ehci_hcd uhci_hcd libahci usbcore nls_base igb
Pid: 7319, comm: modprobe Not tainted 2.6.36-rc6+ #132 QSSC-S4R/QSSC-S4R
RIP: 0010:[<ffffffff8116f15e>] [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
RSP: 0018:ffff88027da25d08 EFLAGS: 00010246
RAX: ffff88047c68c528 RBX: 00000000fffffffe RCX: 0000000000000000
RDX: 000000000000002f RSI: 000000000000002f RDI: ffff88047e196c88
RBP: ffff88027da25d38 R08: 0000000000000000 R09: d84156c5635688c0
R10: d84156c5635688c0 R11: 0000000000000000 R12: ffff88047e196c88
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047c68c528
FS: 00007fcb0b26f6e0(0000) GS:ffff880287400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000051 CR3: 000000047e76e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 7319, threadinfo ffff88027da24000, task ffff88027d377090)
Stack:
ffff88027da25d58 ffff88047c68c528 00000000fffffffe ffff88047e196c88
<0> ffff88047c68c528 ffff88047e05bd90 ffff88027da25d78 ffffffff8123fb77
<0> ffff88047e05bd90 0000000000000000 ffff88047e196c88 ffff88047c68c528
Call Trace:
[<ffffffff8123fb77>] kobject_add_internal+0xe7/0x1f0
[<ffffffff8123fd98>] kobject_add_varg+0x38/0x60
[<ffffffff8123feb9>] kobject_add+0x69/0x90
[<ffffffff8116efe0>] ? sysfs_remove_dir+0x20/0xa0
[<ffffffff8103d48d>] ? sub_preempt_count+0x9d/0xe0
[<ffffffff8143de20>] ? _raw_spin_unlock+0x30/0x50
[<ffffffff8116efe0>] ? sysfs_remove_dir+0x20/0xa0
[<ffffffff8116eff4>] ? sysfs_remove_dir+0x34/0xa0
[<ffffffff81224204>] elv_register_queue+0x34/0xa0
[<ffffffff81224aad>] elevator_change+0xfd/0x250
[<ffffffffa007e000>] ? t_init+0x0/0x361 [t]
[<ffffffffa007e000>] ? t_init+0x0/0x361 [t]
[<ffffffffa007e0a8>] t_init+0xa8/0x361 [t]
[<ffffffff810001de>] do_one_initcall+0x3e/0x170
[<ffffffff8108c3fd>] sys_init_module+0xbd/0x220
[<ffffffff81002f2b>] system_call_fastpath+0x16/0x1b
Code: e5 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 85 ff 74 52 48 8b 47 18 49 c7 c5 00 46 61 81 48 85 c0 74 04 4c 8b 68 30 45 31 f6 <41> 80 7d 51 00 74 0e 49 8b 44 24 28 4c 89 e7 ff 50 20 49 89 c6
RIP [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
RSP <ffff88027da25d08>
CR2: 0000000000000051
---[ end trace a6541d3bf07945df ]---
Fix this by adding a registered bit to the elevator queue, which is
set when the sysfs kobjects have been registered.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This fixes a race pointed out by Dave Airlie where we don't take a buffer
object about to be destroyed off the LRU lists properly. It also fixes a rare
case where a buffer object could be destroyed in the middle of an
accelerated eviction.
The patch also adds a utility function that can be used to prematurely
release GPU memory space usage of an object waiting to be destroyed.
For example during eviction or swapout.
The above mentioned commit didn't queue the buffer on the delayed destroy
list under some rare circumstances. It also didn't completely honor the
remove_all parameter.
Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=615505http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591061
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The "flags" member of "struct wait_queue_t" is used in several places in
the kernel code without beeing initialized by init_wait(). "flags" is
used in bitwise operations.
If "flags" not initialized then unexpected behaviour may take place.
Incorrect flags might used later in code.
Added initialization of "wait_queue_t.flags" with zero value into
"init_wait".
Signed-off-by: Evgeny Kuznetsov <EXT-Eugeny.Kuznetsov@nokia.com>
[ The bit we care about does end up being initialized by both
prepare_to_wait() and add_to_wait_queue(), so this doesn't seem to
cause actual bugs, but is definitely the right thing to do -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.
However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling. That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.
Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.
So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.
Future fixups:
- move the module list handling code into kernel/module.c where it
belongs.
- get rid of 'module_bug_list' and just use the regular list of modules
(called 'modules' - imagine that) that we already create and maintain
for other reasons.
Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
vmwgfx: Fix fb VRAM pinning failure due to fragmentation
vmwgfx: Remove initialisation of dev::devname
vmwgfx: Enable use of the vblank system
vmwgfx: vt-switch (master drop) fixes
drm/vmwgfx: Fix breakage introduced by commit "drm: block userspace under allocating buffer and having drivers overwrite it (v2)"
drm: Hold the mutex when dropping the last GEM reference (v2)
drm/gem: handlecount isn't really a kref so don't make it one.
drm: i810/i830: fix locked ioctl variant
drm/radeon/kms: add quirk for MSI K9A2GM motherboard
drm/radeon/kms: fix potential segfault in r600_ioctl_wait_idle
drm: Prune GEM vma entries
drm/radeon/kms: fix up encoder info messages for DFP6
drm/radeon: fix PCI ID 5657 to be an RV410
In order to be fully threadsafe we need to check that the drm_gem_object
refcount is still 0 after acquiring the mutex in order to call the free
function. Otherwise, we may encounter scenarios like:
Thread A: Thread B:
drm_gem_close
unreference_unlocked
kref_put mutex_lock
... i915_gem_evict
... kref_get -> BUG
... i915_gem_unbind
... kref_put
... i915_gem_object_free
... mutex_unlock
mutex_lock
i915_gem_object_free -> BUG
i915_gem_object_unbind
kfree
mutex_unlock
Note that no driver is currently using the free_unlocked vfunc and it is
scheduled for removal, hasten that process.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30454
Reported-and-Tested-by: Magnus Kessler <Magnus.Kessler@gmx.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Avoid TLB flush IPIs for the cores in deeper c-states by voluntary leave_mm()
before entering into that state. CPUs tend to flush TLB in those c-states
anyways.
acpi_idle does this with C3-type states, but it was not caried over
when intel_idle was introduced. intel_idle can apply it
to C-states in addition to those that ACPI might export as C3...
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
There were lots of places being inconsistent since handle count
looked like a kref but it really wasn't.
Fix this my just making handle count an atomic on the object,
and have it increase the normal object kref.
Now i915/radeon/nouveau drivers can drop the normal reference on
userspace object creation, and have the handle hold it.
This patch fixes a memory leak or corruption on unload, because
the driver had no way of knowing if a handle had been actually
added for this object, and the fbcon object needed to know this
to clean itself up properly.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The Enhanced Retransmission Mode(ERTM) is a realiable mode of operation
of the Bluetooth L2CAP layer. Think on it like a simplified version of
TCP.
The problem we were facing here was a deadlock. ERTM uses a backlog
queue to queue incomimg packets while the user is helding the lock. At
some moment the sk_sndbuf can be exceeded and we can't alloc new skbs
then the code sleep with the lock to wait for memory, that stalls the
ERTM connection once we can't read the acknowledgements packets in the
backlog queue to free memory and make the allocation of outcoming skb
successful.
This patch actually affect all users of bt_skb_send_alloc(), i.e., all
L2CAP modes and SCO.
We are safe against socket states changes or channels deletion while the
we are sleeping wait memory. Checking for the sk->sk_err and
sk->sk_shutdown make the code safe, since any action that can leave the
socket or the channel in a not usable state set one of the struct
members at least. Then we can check both of them when getting the lock
again and return with the proper error if something unexpected happens.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Ulisses Furquim <ulisses@profusion.mobi>