Commit Graph

5243 Commits

Author SHA1 Message Date
Hugh Dickins
9c24624727 KSYM_SYMBOL_LEN fixes
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
to my 966c8c12dc sprint_symbol(): use
less stack exposing a bug in slub's list_locations() -
kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
beyond the end of page provided.

The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.

Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they
need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
them.

[akpm@linux-foundation.org: ftrace.h needs module.h]
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc Miles Lane <miles.lane@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:54 -08:00
Tom Zanussi
fbb5b7ae4b relayfs: fix infinite loop with splice()
Running kmemtraced, which uses splice() on relayfs, causes a hard lock on
x86-64 SMP.  As described by Tom Zanussi:

  It looks like you hit the same problem as described here:

  commit 8191ecd1d1

      splice: fix infinite loop in generic_file_splice_read()

  relay uses the same loop but it never got noticed or fixed.

Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:52 -08:00
Al Viro
48887e63d6 [PATCH] fix broken timestamps in AVC generated by kernel threads
Timestamp in audit_context is valid only if ->in_syscall is set.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09 02:27:41 -05:00
Randy Dunlap
7f0ed77d24 [patch 1/1] audit: remove excess kernel-doc
Delete excess kernel-doc notation in kernel/auditsc.c:

Warning(linux-2.6.27-git10//kernel/auditsc.c:1481): Excess function parameter or struct member 'tsk' description in 'audit_syscall_entry'
Warning(linux-2.6.27-git10//kernel/auditsc.c:1564): Excess function parameter or struct member 'tsk' description in 'audit_syscall_exit'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09 02:27:40 -05:00
Al Viro
a64e64944f [PATCH] return records for fork() both to child and parent
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09 02:27:38 -05:00
Eric Paris
a3f07114e3 [PATCH] Audit: make audit=0 actually turn off audit
Currently audit=0 on the kernel command line does absolutely nothing.
Audit always loads and always uses its resources such as creating the
kernel netlink socket.  This patch causes audit=0 to actually disable
audit.  Audit will use no resources and starting the userspace auditd
daemon will not cause the kernel audit system to activate.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09 02:27:37 -05:00
Linus Torvalds
bbeba4c35c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev:
  [PATCH] fix bogus argument of blkdev_put() in pktcdvd
  [PATCH 2/2] documnt FMODE_ constants
  [PATCH 1/2] kill FMODE_NDELAY_NOW
  [PATCH] clean up blkdev_get a little bit
  [PATCH] Fix block dev compat ioctl handling
  [PATCH] kill obsolete temporary comment in swsusp_close()
2008-12-04 21:45:44 -08:00
Linus Torvalds
4857339d7c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  time: catch xtime_nsec underflows and fix them
  posix-cpu-timers: fix clock_gettime with CLOCK_PROCESS_CPUTIME_ID
2008-12-04 21:40:29 -08:00
Linus Torvalds
3b666ce6a2 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  check_hung_task(): unsigned sysctl_hung_task_warnings cannot be less than 0
  documentation: local_ops fix on_each_cpu
2008-12-04 21:39:41 -08:00
Al Viro
50c396d38c [PATCH] kill obsolete temporary comment in swsusp_close()
it had been put there to mark the call of blkdev_put() that
needed proper argument propagated to it; later patch in the
same series had done just that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04 04:22:54 -05:00
john stultz
6c9bacb41c time: catch xtime_nsec underflows and fix them
Impact: fix time warp bug

Alex Shi, along with Yanmin Zhang have been noticing occasional time
inconsistencies recently. Through their great diagnosis, they found that
the xtime_nsec value used in update_wall_time was occasionally going
negative. After looking through the code for awhile, I realized we have
the possibility for an underflow when three conditions are met in
update_wall_time():

  1) We have accumulated a second's worth of nanoseconds, so we
     incremented xtime.tv_sec and appropriately decrement xtime_nsec.
     (This doesn't cause xtime_nsec to go negative, but it can cause it
      to be small).

  2) The remaining offset value is large, but just slightly less then
     cycle_interval.

  3) clocksource_adjust() is speeding up the clock, causing a
     corrective amount (compensating for the increase in the multiplier
     being multiplied against the unaccumulated offset value) to be
     subtracted from xtime_nsec.

This can cause xtime_nsec to underflow.

Unfortunately, since we notify the NTP subsystem via second_overflow()
whenever we accumulate a full second, and this effects the error
accumulation that has already occured, we cannot simply revert the
accumulated second from xtime nor move the second accumulation to after
the clocksource_adjust call without a change in behavior.

This leaves us with (at least) two options:

1) Simply return from clocksource_adjust() without making a change if we
   notice the adjustment would cause xtime_nsec to go negative.

This would work, but I'm concerned that if a large adjustment was needed
(due to the error being large), it may be possible to get stuck with an
ever increasing error that becomes too large to correct (since it may
always force xtime_nsec negative). This may just be paranoia on my part.

2) Catch xtime_nsec if it is negative, then add back the amount its
   negative to both xtime_nsec and the error.

This second method is consistent with how we've handled earlier rounding
issues, and also has the benefit that the error being added is always in
the oposite direction also always equal or smaller then the correction
being applied. So the risk of a corner case where things get out of
control is lessened.

This patch fixes bug 11970, as tested by Yanmin Zhang
http://bugzilla.kernel.org/show_bug.cgi?id=11970

Reported-by: alex.shi@intel.com
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 08:43:02 +01:00
Roel Kluin
201955463a check_hung_task(): unsigned sysctl_hung_task_warnings cannot be less than 0
Impact: fix warnings-limit cutoff check for debug feature

unsigned sysctl_hung_task_warnings cannot be less than 0

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 10:11:51 +01:00
Arjan van de Ven
a800599283 taint: add missing comment
The description for 'D' was missing in the comment...  (causing me a
minute of WTF followed by looking at more of the code)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01 19:55:24 -08:00
Davide Libenzi
7ef9964e6d epoll: introduce resource usage limits
It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface.  Vegard Nossum reported a very simple program (a modified
version attached) that can make a normal user to request a pretty large
amount of kernel memory, well within the its maximum number of fds.  To
solve such problem, default limits are now imposed, and /proc based
configuration has been introduced.  A new directory has been created,
named /proc/sys/fs/epoll/ and inside there, there are two configuration
points:

  max_user_instances = Maximum number of devices - per user

  max_user_watches   = Maximum number of "watched" fds - per user

The current default for "max_user_watches" limits the memory used by epoll
to store "watches", to 1/32 of the amount of the low RAM.  As example, a
256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users.  The
default value for "max_user_instances" is set to 128, that should be
enough too.

This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
listed, so that should be ok.

[akpm@linux-foundation.org: use get_current_user()]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01 19:55:24 -08:00
Linus Torvalds
9bd062d9ea Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: prevent divide by zero error in cpu_avg_load_per_task, update
  sched, cpusets: fix warning in kernel/cpuset.c
  sched: prevent divide by zero error in cpu_avg_load_per_task
2008-11-30 13:06:47 -08:00
Linus Torvalds
72244c0e68 Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq.h: fix missing/extra kernel-doc
  genirq: __irq_set_trigger: change pr_warning to pr_debug
  irq: fix typo
  x86: apic honour irq affinity which was set in early boot
  genirq: fix the affinity setting in setup_irq
  genirq: keep affinities set from userspace across free/request_irq()
2008-11-30 13:06:20 -08:00
Linus Torvalds
93b10052f9 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: consistent alignement for lockdep info
2008-11-30 13:05:46 -08:00
Linus Torvalds
7bbc67fbf6 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ftrace: prevent recursion
  tracing, doc: update mmiotrace documentation
  x86, mmiotrace: fix buffer overrun detection
  function tracing: fix wrong position computing of stack_trace
2008-11-30 13:05:31 -08:00
Christoph Hellwig
96b8936a9e remove __ARCH_WANT_COMPAT_SYS_PTRACE
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).

Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 11:00:15 -08:00
Al Viro
8419641450 cpuinit fixes in kernel/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Ingo Molnar
af6d596fd6 sched: prevent divide by zero error in cpu_avg_load_per_task, update
Regarding the bug addressed in:

  4cd4262: sched: prevent divide by zero error in cpu_avg_load_per_task

Linus points out that the fix is not complete:

> There's nothing that keeps gcc from deciding not to reload
> rq->nr_running.
>
> Of course, in _practice_, I don't think gcc ever will (if it decides
> that it will spill, gcc is likely going to decide that it will
> literally spill the local variable to the stack rather than decide to
> reload off the pointer), but it's a valid compiler optimization, and
> it even has a name (rematerialization).
>
> So I suspect that your patch does fix the bug, but it still leaves the
> fairly unlikely _potential_ for it to re-appear at some point.
>
> We have ACCESS_ONCE() as a macro to guarantee that the compiler
> doesn't rematerialize a pointer access. That also would clarify
> the fact that we access something unsafe outside a lock.

So make sure our nr_running value is immutable and cannot change
after we check it for nonzero.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29 20:45:15 +01:00
Ingo Molnar
1583715ddb sched, cpusets: fix warning in kernel/cpuset.c
this warning:

  kernel/cpuset.c: In function ‘generate_sched_domains’:
  kernel/cpuset.c:588: warning: ‘ndoms’ may be used uninitialized in this function

triggers because GCC does not recognize that ndoms stays uninitialized
only if doms is NULL - but that flow is covered at the end of
generate_sched_domains().

Help out GCC by initializing this variable to 0. (that's prudent anyway)

Also, this function needs a splitup and code flow simplification:
with 160 lines length it's clearly too long.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29 20:39:29 +01:00
Steven Rostedt
4cd4262034 sched: prevent divide by zero error in cpu_avg_load_per_task
Impact: fix divide by zero crash in scheduler rebalance irq

While testing the branch profiler, I hit this crash:

divide error: 0000 [#1] PREEMPT SMP
[...]
RIP: 0010:[<ffffffff8024a008>]  [<ffffffff8024a008>] cpu_avg_load_per_task+0x50/0x7f
[...]
Call Trace:
 <IRQ> <0> [<ffffffff8024fd43>] find_busiest_group+0x3e5/0xcaa
 [<ffffffff8025da75>] rebalance_domains+0x2da/0xa21
 [<ffffffff80478769>] ? find_next_bit+0x1b2/0x1e6
 [<ffffffff8025e2ce>] run_rebalance_domains+0x112/0x19f
 [<ffffffff8026d7c2>] __do_softirq+0xa8/0x232
 [<ffffffff8020ea7c>] call_softirq+0x1c/0x3e
 [<ffffffff8021047a>] do_softirq+0x94/0x1cd
 [<ffffffff8026d5eb>] irq_exit+0x6b/0x10e
 [<ffffffff8022e6ec>] smp_apic_timer_interrupt+0xd3/0xff
 [<ffffffff8020e4b3>] apic_timer_interrupt+0x13/0x20

The code for cpu_avg_load_per_task has:

	if (rq->nr_running)
		rq->avg_load_per_task = rq->load.weight / rq->nr_running;

The runqueue lock is not held here, and there is nothing that prevents
the rq->nr_running from going to zero after it passes the if condition.

The branch profiler simply made the race window bigger.

This patch saves off the rq->nr_running to a local variable and uses that
for both the condition and the division.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 10:29:52 +01:00
Lai Jiangshan
4f5a7f40dd ftrace: prevent recursion
Impact: prevent unnecessary stack recursion

if the resched flag was set before we entered, then don't reschedule.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 10:11:53 +01:00
Petr Tesarik
eccdaeafae posix-cpu-timers: fix clock_gettime with CLOCK_PROCESS_CPUTIME_ID
Since CLOCK_PROCESS_CPUTIME_ID is in fact translated to -6, the switch
statement in cpu_clock_sample_group() must first mask off the irrelevant
bits, similar to cpu_clock_sample().

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

--
 posix-cpu-timers.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
2008-11-24 16:41:40 +01:00