Commit Graph

10979 Commits

Author SHA1 Message Date
Masami Hiramatsu
1ff511e35e tracing/kprobes: Add bitfield type
Add bitfield type for tracing arguments on kprobe-tracer.  The syntax of
a bitfield type is:

 b<bit-size>@<bit-offset>/<container-size>

e.g.

Accessing 2 bits-width field with 4 bits-offset in 32 bits-width data at
4 bytes offseted from the address pointed by AX register:

 +4(%ax):b2@4/32

Since the width of container data depends on the arch, so I just added
the container-size at the end.

Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20110204125205.9507.11363.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-02-07 09:18:11 -02:00
Masami Hiramatsu
e374536998 tracing/kprobes: Support longer (>128 bytes) command
Expand command line buffer of kprobe-tracer to 4096 bytes.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20110204125159.9507.20895.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-02-07 09:15:59 -02:00
Masami Hiramatsu
76022db323 tracing/kprobes: Cleanup strict_strtol() using code
Since strict_strtol() accepts minus digits started with '-', it doesn't
need to invert after converting.

Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20110204125153.9507.49335.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-02-07 09:13:36 -02:00
Ingo Molnar
c7f9a6f377 Merge branch 'linus' into perf/core
Merge reason: Pick up perf fixes that are now upstream

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-07 08:44:26 +01:00
Linus Torvalds
f0adc82064 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep, timer: Fix del_timer_sync() annotation
  RTC: Prevents a division by zero in kernel code.
2011-02-06 12:05:15 -08:00
Ingo Molnar
862b6f62bf Merge branch 'tip/perf/urgent-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent 2011-02-04 19:02:53 +01:00
Peter Zijlstra
f266a5110d lockdep, timer: Fix del_timer_sync() annotation
Calling local_bh_enable() will want to actually start processing
softirqs, which isn't a good idea since this can get called with IRQs
disabled.

Cure this by using _local_bh_enable() which doesn't start processing
softirqs, and use raw_local_irq_save() to avoid any softirqs from
happening without letting lockdep think IRQs are in fact disabled.

Reported-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
LKML-Reference: <20110203141548.039540914@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-04 10:31:22 +01:00
Linus Torvalds
aba99437f5 Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Prevent irq storm on migration
2011-02-03 09:17:41 -08:00
Linus Torvalds
49abda9892 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix update_curr_rt()
  sched, docs: Update schedstats documentation to version 15
2011-02-03 08:55:07 -08:00
Linus Torvalds
eb487ab4d5 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix reading in perf_event_read()
  watchdog: Don't change watchdog state on read of sysctl
  watchdog: Fix sysctl consistency
  watchdog: Fix broken nowatchdog logic
  perf: Fix Pentium4 raw event validation
  perf: Fix alloc_callchain_buffers()
2011-02-03 08:52:05 -08:00
Steven Rostedt
3d56e331b6 tracing: Replace syscall_meta_data struct array with pointer array
Currently the syscall_meta structures for the syscall tracepoints are
placed in the __syscall_metadata section, and at link time, the linker
makes one large array of all these syscall metadata structures. On boot
up, this array is read (much like the initcall sections) and the syscall
data is processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are suppose to be in an array.

A hack was used previous to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the __syscall_metadata section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The __syscall_metadata section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03 09:29:06 -05:00
Mathieu Desnoyers
6549864629 tracepoints: Fix section alignment using pointer array
Make the tracepoints more robust, making them solid enough to handle compiler
changes by not relying on anything based on compiler-specific behavior with
respect to structure alignment. Implement an approach proposed by David Miller:
use an array of const pointers to refer to the individual structures, and export
this pointer array through the linker script rather than the structures per se.
It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
for the pointers), but are less likely to break due to compiler changes.

History:

commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()
added the aligned(32) type and variable attribute to the tracepoint structures
to deal with gcc happily aligning statically defined structures on 32-byte
multiples.

One attempt was to use a 8-byte alignment for tracepoint structures by applying
both the variable and type attribute to tracepoint structures definitions and
declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.

The reason is that the "aligned" attribute only specify the _minimum_ alignment
for a structure, leaving both the compiler and the linker free to align on
larger multiples. Because tracepoint.c expects the structures to be placed as an
array within each section, up-alignment cause NULL-pointer exceptions due to the
extra unexpected padding.

(this patch applies on top of -tip)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20110126222622.GA10794@Krystal>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03 09:28:46 -05:00
Peter Zijlstra
06c3bc6556 sched: Fix update_curr_rt()
cpu_stopper_thread()
  migration_cpu_stop()
    __migrate_task()
      deactivate_task()
        dequeue_task()
          dequeue_task_rq()
            update_curr_rt()

Will call update_curr_rt() on rq->curr, which at that time is
rq->stop. The problem is that rq->stop.prio matches an RT prio and
thus falsely assumes its a rt_sched_class task.

Reported-Debuged-Tested-Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Cc: stable@kernel.org # .37
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03 12:21:33 +01:00
Peter Zijlstra
542e72fc90 perf: Fix reading in perf_event_read()
It is quite possible for the event to have been disabled between
perf_event_read() sending the IPI and the CPU servicing the IPI and
calling __perf_event_read(), hence revalidate the state.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03 12:15:46 +01:00
Peter Zijlstra
fe4b04fa31 perf: Cure task_oncpu_function_call() races
Oleg reported that on architectures with
__ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from
task_oncpu_function_call() can land before perf_event_task_sched_in()
and cause interesting situations for eg. perf_install_in_context().

This patch reworks the task_oncpu_function_call() interface to give a
more usable primitive as well as rework all its users to hopefully be
more obvious as well as remove the races.

While looking at the code I also found a number of races against
perf_event_task_sched_out() which can flip contexts between tasks so
plug those too.

Reported-and-reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03 12:14:43 +01:00
Steven Rostedt
e4a9ea5ee7 tracing: Replace trace_event struct array with pointer array
Currently the trace_event structures are placed in the _ftrace_events
section, and at link time, the linker makes one large array of all
the trace_event structures. On boot up, this array is read (much like
the initcall sections) and the events are processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are suppose to be in an array.

A hack was used previous to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the _ftrace_event section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The _ftrace_event section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-02 21:37:13 -05:00
Thomas Gleixner
f1a06390d0 genirq: Prevent irq storm on migration
move_native_irq() masks and unmasks the interrupt line
unconditionally, but the interrupt line might be masked due to a
threaded oneshot handler in progress. Unmasking the line in that case
can lead to interrupt storms. Observed on PREEMPT_RT.

Originally-from: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2011-02-02 22:15:08 +01:00
Marcin Slusarz
9ffdc6c37d watchdog: Don't change watchdog state on read of sysctl
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
[ add {}'s to fix a warning ]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <1296230433-6261-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-31 13:22:43 +01:00
Marcin Slusarz
397357666d watchdog: Fix sysctl consistency
If it was not possible to enable watchdog for any cpu, switch
watchdog_enabled back to 0, because it's visible via
kernel.watchdog sysctl.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <1296230433-6261-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-31 13:22:43 +01:00
Marcin Slusarz
4135038a58 watchdog: Fix broken nowatchdog logic
Passing nowatchdog to kernel disables 2 things: creation of
watchdog threads AND initialization of percpu watchdog_hrtimer.
As hrtimers are initialized only at boot it's not possible to
enable watchdog later - for me all watchdog threads started to
eat 100% of CPU time, but they could just crash.

Additionally, even if these threads would start properly,
watchdog_disable_all_cpus was guarded by no_watchdog check, so
you couldn't disable watchdog.

To fix this, remove no_watchdog variable and use already
existing watchdog_enabled variable.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
[ removed another no_watchdog instance ]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <1296230433-6261-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-31 13:22:42 +01:00
Kacper Kornet
aa5bd67dcf Fix prlimit64 for suid/sgid processes
Since check_prlimit_permission always fails in the case of SUID/GUID
processes, such processes are not able to read or set their own limits.
This commit changes this by assuming that process can always read/change
its own limits.

Signed-off-by: Kacper Kornet <kornet@camk.edu.pl>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-31 13:01:27 +10:00
Linus Torvalds
bffb276fff Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Use rq->clock_task instead of rq->clock for correctly maintaining load averages
  sched: Fix/remove redundant cfs_rq checks
  sched: Fix sign under-flows in wake_affine
2011-01-28 06:45:04 +10:00
Eric Dumazet
88d4f0db7f perf: Fix alloc_callchain_buffers()
Commit 927c7a9e92 ("perf: Fix race in callchains") introduced
a mismatch in the sizing of struct callchain_cpus_entries.

nr_cpu_ids must be used instead of num_possible_cpus(), or we
might get out of bound memory accesses on some machines.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephane Eranian <eranian@google.com>
CC: stable@kernel.org
LKML-Reference: <1295980851.3588.351.camel@edumazet-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-27 19:21:50 +01:00
Paul Turner
05ca62c6ca sched: Use rq->clock_task instead of rq->clock for correctly maintaining load averages
The delta in clock_task is a more fair attribution of how much time a tg has
been contributing load to the current cpu.

While not really important it also means we're more in sync (by magnitude)
with respect to periodic updates (since __update_curr deltas are clock_task
based).

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044852.007092349@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 12:31:03 +01:00
Paul Turner
b815f1963e sched: Fix/remove redundant cfs_rq checks
Since updates are against an entity's queuing cfs_rq it's not possible to
enter update_cfs_{shares,load} with a NULL cfs_rq.  (Indeed, update_cfs_load
would crash prior to the check if we did anyway since we load is examined
during the initializers).

Also, in the update_cfs_load case there's no point
in maintaining averages for rq->cfs_rq since we don't perform shares
distribution at that level -- NULL check is replaced accordingly.

Thanks to Dan Carpenter for pointing out the deference before NULL check.

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044851.825284940@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 12:31:02 +01:00