linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
Tetsuo Handa	6fc48af82c	sysctl: check length at deprecated_sysctl_warning Original patch assumed args->nlen < CTL_MAXNAME, but it can be false. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:37 -08:00
Peter Zijlstra	b82d9fdd84	sched: avoid large irq-latencies in smp-balancing SMP balancing is done with IRQs disabled and can iterate the full rq. When rqs are large this can cause large irq-latencies. Limit the nr of iterations on each run. This fixes a scheduling latency regression reported by the -rt folks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-11-09 22:39:39 +01:00
Eric Dumazet	d6322faf29	sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC 1) hardcoded 1000000000 value is used five times in places where NSEC_PER_SEC might be more readable. 2) A conversion from nsec to msec uses the hardcoded 1000000 value, which is a candidate for NSEC_PER_MSEC. no code changed: text data bss dec hex filename 44359 3326 36 47721 ba69 sched.o.before 44359 3326 36 47721 ba69 sched.o.after Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-11-09 22:39:38 +01:00
Peter Zijlstra	b2be5e96dc	sched: reintroduce the sched_min_granularity tunable we lost the sched_min_granularity tunable to a clever optimization that uses the sched_latency/min_granularity ratio - but the ratio is quite unintuitive to users and can also crash the kernel if the ratio is set to 0. So reintroduce the min_granularity tunable, while keeping the ratio maintained internally. no functionality changed. [ mingo@elte.hu: some fixlets. ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-11-09 22:39:37 +01:00
Pavel Emelyanov	b488893a39	pid namespaces: changes to show virtual ids to user This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Serge E. Hallyn	b460cbc581	pid namespaces: define is_global_init() and is_container_init() is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Andrew Morgan	72c2d5823f	V3 file capabilities: alter behavior of cap_setpcap The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1, can change the capabilities of another process, p2. This is not the meaning that was intended for this capability at all, and this implementation came about purely because, without filesystem capabilities, there was no way to use capabilities without one process bestowing them on another. Since we now have a filesystem support for capabilities we can fix the implementation of CAP_SETPCAP. The most significant thing about this change is that, with it in effect, no process can set the capabilities of another process. The capabilities of a program are set via the capability convolution rules: pI(post-exec) = pI(pre-exec) pP(post-exec) = (X(aka cap_bset) & fP) \| (pI(post-exec) & fI) pE(post-exec) = fE ? pP(post-exec) : 0 at exec() time. As such, the only influence the pre-exec() program can have on the post-exec() program's capabilities are through the pI capability set. The correct implementation for CAP_SETPCAP (and that enabled by this patch) is that it can be used to add extra pI capabilities to the current process - to be picked up by subsequent exec()s when the above convolution rules are applied. Here is how it works: Let's say we have a process, p. It has capability sets, pE, pP and pI. Generally, p, can change the value of its own pI to pI' where (pI' & ~pI) & ~pP = 0. That is, the only new things in pI' that were not present in pI need to be present in pP. The role of CAP_SETPCAP is basically to permit changes to pI beyond the above: if (pE & CAP_SETPCAP) { pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */ } This capability is useful for things like login, which (say, via pam_cap) might want to raise certain inheritable capabilities for use by the children of the logged-in user's shell, but those capabilities are not useful to or needed by the login program itself. One such use might be to limit who can run ping. You set the capabilities of the 'ping' program to be "= cap_net_raw+i", and then only shells that have (pI & CAP_NET_RAW) will be able to run it. Without CAP_SETPCAP implemented as described above, login(pam_cap) would have to also have (pP & CAP_NET_RAW) in order to raise this capability and pass it on through the inheritable set. Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:24 -07:00
Eric W. Biederman	7058cb02dd	sysctl: deprecate sys_sysctl in a user space visible fashion. After adding checking to register_sysctl_table and finding a whole new set of bugs. Missed by countless code reviews and testers I have finally lost patience with the binary sysctl interface. The binary sysctl interface has been sort of deprecated for years and finding a user space program that uses the syscall is more difficult then finding a needle in a haystack. Problems continue to crop up, with the in kernel implementation. So since supporting something that no one uses is silly, deprecate sys_sysctl with a sufficient grace period and notice that the handful of user space applications that care can be fixed or replaced. The /proc/sys sysctl interface that people use will continue to be supported indefinitely. This patch moves the tested warning about sysctls from the path where sys_sysctl to a separate path called from both implementations of sys_sysctl, and it adds a proper entry into Documentation/feature-removal-schedule. Allowing us to revisit this in a couple years time and actually kill sys_sysctl. [lethal@linux-sh.org: sysctl: Fix syscall disabled build] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	fc6cd25b73	sysctl: Error on bad sysctl tables After going through the kernels sysctl tables several times it has become clear that code review and testing is just not effective in prevent problematic sysctl tables from being used in the stable kernel. I certainly can't seem to fix the problems as fast as they are introduced. Therefore this patch adds sysctl_check_table which is called when a sysctl table is registered and checks to see if we have a problematic sysctl table. The biggest part of the code is the table of valid binary sysctl entries, but since we have frozen our set of binary sysctls this table should not need to change, and it makes it much easier to detect when someone unintentionally adds a new binary sysctl value. As best as I can determine all of the several hundred errors spewed on boot up now are legitimate. [bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	c65f92398e	sysctl: remove the cad_pid binary sysctl path It looks like we inadvertently killed the cad_pid binary sysctl support when cap_pid was changed to be a struct pid. Since no one has complained just remove the binary path. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	35834ca1e4	sysctl: simplify the pty sysctl logic Instead of having a bunch of ifdefs in sysctl.c move all of the pty sysctl logic into drivers/char/pty.c As well as cleaning up the logic this prevents sysctl_check_table from complaining that the root table has a NULL data pointer on something with generic methods. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	0d135a4a8c	sysctl: remove the binary interface for aio-nr, aio-max-nr, acpi_video_flags aio-nr, aio-max-nr, acpi_video_flags are unsigned long values which sysctl does not handle properly with a 64bit kernel and a 32bit user space. Since no one is likely to be using the binary sysctl values and the ascii interface still works, this patch just removes support for the binary sysctl interface from the kernel. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	f5ead5cefc	sysctl: remove binary sysctl support where it clearly doesn't work These functions are all wrapper functions for the proc interface that are needed for them to work correctly. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Eric W. Biederman	49a0c45833	sysctl: Factor out sysctl_data. There as been no easy way to wrap the default sysctl strategy routine except for returning 0. Which is not always what we want. The few instances I have seen that want different behaviour have written their own version of sysctl_data. While not too hard it is unnecessary code and has the potential for extra bugs. So to make these situations easier and make that part of sysctl more symetric I have factord sysctl_data out of do_sysctl_strategy and exported as a function everyone can use. Further having sysctl_data be an explicit function makes checking for badly formed sysctl tables much easier. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Eric W. Biederman	d8217f076b	sysctl core: Stop using the unnecessary ctl_table typedef In sysctl.h the typedef struct ctl_table ctl_table violates coding style isn't needed and is a bit of a nuisance because it makes it harder to recognize ctl_table is a type name. So this patch removes it from the generic sysctl code. Hopefully I will have enough energy to send the rest of my patches will follow and to remove it from the rest of the kernel. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Ravikiran G Thirumalai	c4f3b63fe1	softlockup: add a /proc tuning parameter Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh to identify possible softlockups earlier. This patch: 1. Adds a sysctl softlockup_thresh with valid values of 1-60s (Higher value to disable false positives) 2. Changes the softlockup printk to print the cpu softlockup time [akpm@linux-foundation.org: Fix various warnings and add definition of "two"] Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:47 -07:00
David Rientjes	fe071d7e8a	oom: add oom_kill_allocating_task sysctl Adds a new sysctl, 'oom_kill_allocating_task', which will automatically kill the OOM-triggering task instead of scanning through the tasklist to find a memory-hogging target. This is helpful for systems with an insanely large number of tasks where scanning the tasklist significantly degrades performance. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:46 -07:00
Peter Zijlstra	04fbfdc14e	mm: per device dirty threshold Scale writeback cache per backing device, proportional to its writeout speed. By decoupling the BDI dirty thresholds a number of problems we currently have will go away, namely: - mutual interference starvation (for any number of BDIs); - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts). It might be that all dirty pages are for a single BDI while other BDIs are idling. By giving each BDI a 'fair' share of the dirty limit, each one can have dirty pages outstanding and make progress. A global threshold also creates a deadlock for stacked BDIs; when A writes to B, and A generates enough dirty pages to get throttled, B will never start writeback until the dirty pages go away. Again, by giving each BDI its own 'independent' dirty limit, this problem is avoided. So the problem is to determine how to distribute the total dirty limit across the BDIs fairly and efficiently. A DBI that has a large dirty limit but does not have any dirty pages outstanding is a waste. What is done is to keep a floating proportion between the DBIs based on writeback completions. This way faster/more active devices get a larger share than slower/idle devices. [akpm@linux-foundation.org: fix warnings] [hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:45 -07:00
Adam Litke	54f9f80d65	hugetlb: Add hugetlb_dynamic_pool sysctl The maximum size of the huge page pool can be controlled using the overall size of the hugetlb filesystem (via its 'size' mount option). However in the common case the this will not be set as the pool is traditionally fixed in size at boot time. In order to maintain the expected semantics, we need to prevent the pool expanding by default. This patch introduces a new sysctl controlling dynamic pool resizing. When this is enabled the pool will expand beyond its base size up to the size of the hugetlb filesystem. It is disabled by default. Signed-off-by: Adam Litke <agl@us.ibm.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Dave McCracken <dave.mccracken@oracle.com> Cc: William Irwin <bill.irwin@oracle.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Ken Chen <kenchen@google.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:02 -07:00
Ingo Molnar	da84d96176	sched: reintroduce cache-hot affinity reintroduce a simplified version of cache-hot/cold scheduling affinity. This improves performance with certain SMP workloads, such as sysbench. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:18 +02:00
Peter Zijlstra	5f6d858ecc	sched: speed up and simplify vslice calculations speed up and simplify vslice calculations. [ From: Mike Galbraith <efault@gmx.de>: build fix ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:12 +02:00
Ingo Molnar	e22f5bbf86	sched: remove wait_runtime limit remove the wait_runtime-limit fields and the code depending on it, now that the math has been changed over to rely on the vruntime metric. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:06 +02:00
Ingo Molnar	8ebc91d936	sched: remove stat_gran remove the stat_gran code - it was disabled by default and it causes unnecessary overhead. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Al Viro	2b8232ce51	minimal build fixes for uml (fallout from x86 merge) a) include/asm-um/arch can't just point to include/asm-$(SUBARCH) now b) arch/{i386,x86_64}/crypto are merged now c) subarch-obj needed changes d) cpufeature_64.h should pull "cpufeature_32.h", not <asm/cpufeature_32.h> since it can be included from asm-um/cpufeature.h e) in case of uml-i386 we need CONFIG_X86_32 for make and gcc, but not for Kconfig f) sysctl.c shouldn't do vdso_enabled for uml-i386 (actually, that one should be registered from corresponding arch//kernel/, with ifdef going away; that's a separate patch, though). With that and with Stephen's patch ("[PATCH net-2.6] uml: hard_header fix") we have uml allmodconfig building both on i386 and amd64. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-13 09:57:15 -07:00
Olof Johansson	d0c3d534a4	[POWERPC] Implement logging of unhandled signals Implement show_unhandled_signals sysctl + support to print when a process is killed due to unhandled signals just as i386 and x86_64 does. Default to having it off, unlike x86 that defaults on. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-12 14:05:18 +10:00

1 2 3 4 5 ...

145 Commits