linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
Linus Torvalds	140d0b2108	Do 'shm_init_ns()' in an early pure_initcall This isn't really critical any more, since other patches (commit `298507d4d2`: "shm: optimize exit_shm()") have caused us to not actually need to touch the rw_mutex unless there are actual shm segments associated with the namespace, but we really should do tne shm_init_ns() earlier than we do now. This, together with commit `288d5abec8` ("Boot up with usermodehelper disabled") will mean that we really do initialize the initial ipc namespace data structure before we run any tasks. Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-04 19:35:59 -10:00
Vasiliy Kulikov	298507d4d2	shm: optimize exit_shm() We may optimistically check .in_use == 0 without holding the rw_mutex: it's the common case, and if it's zero, there certainly won't be any segments associated with us. After taking the lock, the idr_for_each() will do the right thing, so we could now drop the re-check inside the lock without any real cost. But it won't hurt. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-03 14:45:55 -10:00
Vasiliy Kulikov	33a30ed4bd	shm: fix wrong tests Commit `4c677e2eef` ("shm: optimize locking and ipc_namespace getting") introduced a copy-paste bug. Due to the bug cycle optimizations were disabled. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-03 14:45:55 -10:00
Vasiliy Kulikov	4c677e2eef	shm: optimize locking and ipc_namespace getting shm_lock() does a lookup of shm segment in shm_ids(ns).ipcs_idr, which is redundant as we already know shmid_kernel address. An actual lock is also not required for reads until we really want to destroy the segment. exit_shm() and shm_destroy_orphaned() may avoid the loop by checking whether there is at least one segment in current ipc_namespace. The check of nsproxy and ipc_ns against NULL is redundant as exit_shm() is called from do_exit() before the call to exit_notify(), so the dereferencing current->nsproxy->ipc_ns is guaranteed to be safe. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-30 08:44:20 -10:00
Vasiliy Kulikov	5774ed014f	shm: handle separate PID namespaces case shm_try_destroy_orphaned() and shm_try_destroy_current() didn't handle the case of separate PID namespaces, but a single IPC namespace. If there are tasks with the same PID values using the same shmem object, the wrong destroy decision could be reached. On shm segment creation store the pointer to the creator task in shmid_kernel->shm_creator field and zero it on task exit. Then use the ->shm_creator insread of shm_cprid in both functions. As shmid_kernel object is already locked at this stage, no additional locking is needed. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-30 08:44:19 -10:00
Vasiliy Kulikov	b34a6b1da3	ipc: introduce shm_rmid_forced sysctl Add support for the shm_rmid_forced sysctl. If set to 1, all shared memory objects in current ipc namespace will be automatically forced to use IPC_RMID. The POSIX way of handling shmem allows one to create shm objects and call shmdt(), leaving shm object associated with no process, thus consuming memory not counted via rlimits. With shm_rmid_forced=1 the shared memory object is counted at least for one process, so OOM killer may effectively kill the fat process holding the shared memory. It obviously breaks POSIX - some programs relying on the feature would stop working. So set shm_rmid_forced=1 only if you're sure nobody uses "orphaned" memory. Use shm_rmid_forced=0 by default for compatability reasons. The feature was previously impemented in -ow as a configure option. [akpm@linux-foundation.org: fix documentation, per Randy] [akpm@linux-foundation.org: fix warning] [akpm@linux-foundation.org: readability/conventionality tweaks] [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout] Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Solar Designer <solar@openwall.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:44 -07:00
Jiri Slaby	d40dcdb017	ipc/mqueue.c: fix mq_open() return value We return ENOMEM from mqueue_get_inode even when we have enough memory. Namely in case the system rlimit of mqueue was reached. This error propagates to mq_queue and user sees the error unexpectedly. So fix this up to properly return EMFILE as described in the manpage: EMFILE The process already has the maximum number of files and message queues open. instead of: ENOMEM Insufficient memory. With the previous patch we just switch to ERR_PTR/PTR_ERR/IS_ERR error handling here. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:44 -07:00
Jiri Slaby	04715206c0	ipc/mqueue.c: refactor failure handling If new_inode fails to allocate an inode we need only to return with NULL. But now we test the opposite and have all the work in a nested block. So do the opposite to save one indentation level (and remove unnecessary line breaks). This is only a preparation/cleanup for the next patch where we fix up return values from mqueue_get_inode. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:44 -07:00
Manfred Spraul	d694ad62bf	ipc/sem.c: fix race with concurrent semtimedop() timeouts and IPC_RMID If a semaphore array is removed and in parallel a sleeping task is woken up (signal or timeout, does not matter), then the woken up task does not wait until wake_up_sem_queue_do() is completed. This will cause crashes, because wake_up_sem_queue_do() will read from a stale pointer. The fix is simple: Regardless of anything, always call get_queue_result(). This function waits until wake_up_sem_queue_do() has finished it's task. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142 Reported-by: Yuriy Yevtukhov <yuriy@ucoz.com> Reported-by: Harald Laabs <kernel@dasr.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: <stable@kernel.org> [2.6.35+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:07 -07:00
Linus Torvalds	bbd9d6f7fb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits) vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp isofs: Remove global fs lock jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. mm/truncate.c: fix build for CONFIG_BLOCK not enabled fs:update the NOTE of the file_operations structure Remove dead code in dget_parent() AFS: Fix silly characters in a comment switch d_add_ci() to d_splice_alias() in "found negative" case as well simplify gfs2_lookup() jfs_lookup(): don't bother with . or .. get rid of useless dget_parent() in btrfs rename() and link() get rid of useless dget_parent() in fs/btrfs/ioctl.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers drivers: fix up various ->llseek() implementations fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek Ext4: handle SEEK_HOLE/SEEK_DATA generically Btrfs: implement our own ->llseek fs: add SEEK_HOLE and SEEK_DATA flags reiserfs: make reiserfs default to barrier=flush ... Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new shrinker callout for the inode cache, that clashed with the xfs code to start the periodic workers later.	2011-07-22 19:02:39 -07:00
Josef Bacik	02c24a8218	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:59 -04:00
Lai Jiangshan	d4ee9aa33d	ipc,rcu: Convert call_rcu(ipc_immediate_free) to kfree_rcu() The rcu callback ipc_immediate_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(ipc_immediate_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-07-20 14:10:16 -07:00
Lai Jiangshan	693a8b6eec	ipc,rcu: Convert call_rcu(free_un) to kfree_rcu() The rcu callback free_un() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(free_un). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-07-20 14:10:16 -07:00
KOSAKI Motohiro	ca16d140af	mm: don't access vm_flags as 'int' The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor 'unsigned int'. This patch fixes such misuse. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [ Changed to use a typedef - we'll extend it to cover more cases later, since there has been discussion about making it a 64-bit type.. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-26 09:20:31 -07:00
Eric W. Biederman	a00eaf11a2	ns proc: Add support for the ipc namespace Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-05-10 14:35:47 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Randy Dunlap	6213cfe824	ipc: fix util.c kernel-doc warnings Fix ipc/util.c kernel-doc warnings: Warning(ipc/util.c:336): No description found for parameter 'ns' Warning(ipc/util.c:620): No description found for parameter 'ns' Warning(ipc/util.c:790): No description found for parameter 'ns' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-27 19:30:19 -07:00
Xiaotian Feng	be4d250ab4	ipcns: fix use after free in free_ipc_ns() commit `b515498` ("userns: add a user namespace owner of ipc ns") added a user namespace owner of ipc ns, but it also introduced a use after free in free_ipc_ns(). Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: "Serge E. Hallyn" <serge.hallyn@canonical.com> Acked-by: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-25 17:45:16 -07:00
Serge E. Hallyn	b0e77598f8	userns: user namespaces: convert several capable() calls CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(), because the resource comes from current's own ipc namespace. setuid/setgid are to uids in own namespace, so again checks can be against current_user_ns(). Changelog: Jan 11: Use task_ns_capable() in place of sched_capable(). Jan 11: Use nsown_capable() as suggested by Bastian Blank. Jan 11: Clarify (hopefully) some logic in futex and sched.c Feb 15: use ns_capable for ipc, not nsown_capable Feb 23: let copy_ipcs handle setting ipc_ns->user_ns Feb 23: pass ns down rather than taking it from current [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:47:08 -07:00
Serge E. Hallyn	b515498f5b	userns: add a user namespace owner of ipc ns Changelog: Feb 15: Don't set new ipc->user_ns if we didn't create a new ipc_ns. Feb 23: Move extern declaration to ipc_namespace.h, and group fwd declarations at top. Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:47:07 -07:00
Nick Piggin	fa0d7e3de6	fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Vasiliy Kulikov	3af54c9bd9	ipc: shm: fix information leak to userland The shmid_ds structure is copied to userland with shm_unused{,2,3} fields unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-30 08:25:51 -07:00
Al Viro	ceefda6931	switch get_sb_ns() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:17:03 -04:00
Dan Rosenberg	03145beb45	ipc: initialize structure memory to zero for compat functions This takes care of leaking uninitialized kernel stack memory to userspace from non-zeroed fields in structs in compat ipc functions. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:13 -07:00
Helge Deller	b795218075	ipc/shm.c: add RSS and swap size information to /proc/sysvipc/shm The kernel currently provides no functionality to analyze the RSS and swap space usage of each individual sysvipc shared memory segment. This patch adds this info for each existing shm segment by extending the output of /proc/sysvipc/shm by two columns for RSS and swap. Since shmctl(SHM_INFO) already provides a similiar calculation (it currently sums up all RSS/swap info for all segments), I did split out a static function which is now used by the /proc/sysvipc/shm output and shmctl(SHM_INFO). SAP products (esp. the SAP Netweaver ABAP Kernel) uses lots of big shared memory segments (we often have Linux systems with >= 16GB shm usage). Sometimes we get customer reports about "slow" system responses and while looking into their configurations we often find massive swapping activity on the system. With this patch it's now easy to see from the command line if and which shm segments gets swapped out (and how much) and can more easily give recommendations for system tuning. Without the patch it's currently not possible to do such shm analysis at all. Also... Add some spaces in front of the "size" field for 64bit kernels to get the columns correct if you cat the contents of the file. In sysvipc_shm_proc_show() the kernel prints the size value in "SPEC_SIZE" format, which is defined like this: #if BITS_PER_LONG <= 32 #define SIZE_SPEC "%10lu" #else #define SIZE_SPEC "%21lu" #endif So, if the header is not adjusted, the columns are not correctly aligned. I actually tested this on 32- and 64-bit and it seems correct now. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:13 -07:00

1 2 3 4 5 ...

313 Commits