The ring-buffer benchmark threads run at nice level 0 by default, using
up a lot of CPU time and slowing down the system:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1024 root      20   0     0    0    0 D 95.3  0.0   4:01.67 rb_producer
 1023 root      20   0     0    0    0 R 93.5  0.0   2:54.33 rb_consumer
21569 mingo     40   0 14852 1048  772 R  3.6  0.1   0:00.05 top
    1 root      40   0  4080  928  668 S  0.0  0.0   0:23.98 init
Renice them to +19 to make them less intrusive.
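For reference, a minimal sketch of the idea, assuming the benchmark
threads lower their own priority on startup (the call site shown here
is illustrative; set_user_nice() is the in-kernel API for this):

#include <linux/sched.h>

static int ring_buffer_producer_thread(void *arg)
{
	/* Run at the lowest priority so we do not starve real work. */
	set_user_nice(current, 19);

	/* ... benchmark write loop ... */
	return 0;
}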
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the user has an older version of objcopy that cannot handle
converting local symbols to global and vice versa, then some
functions will not be part of the dynamic function tracer. The current
code in recordmcount.pl prints a warning in this case. Unfortunately,
many files may have this issue with older versions of objcopy,
and this causes a warning for every file compiled with the
issue.
This patch silences this overwhelming output by creating a
.tmp_quiet_recordmcount file the first time the warning is
encountered. The warning is not printed if this file exists.
The temp file is deleted at the beginning of each compile to ensure that
the warning is issued once again on new compiles (because the issue
is still present).
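The pattern is a one-shot sentinel file; here is a hedged sketch of it
in C (the real logic lives in the Perl script, and warn_once() is a
hypothetical helper name):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define QUIET_FILE ".tmp_quiet_recordmcount"

/* Print msg only the first time per build; O_CREAT | O_EXCL makes the
 * create-if-absent check atomic even across parallel compiles. */
static void warn_once(const char *msg)
{
	int fd = open(QUIET_FILE, O_CREAT | O_EXCL | O_WRONLY, 0644);

	if (fd < 0)
		return;		/* sentinel exists: already warned */
	close(fd);
	fprintf(stderr, "%s\n", msg);
}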
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With the change in the way we process commits, where a commit only
happens at the outermost level and we no longer need to worry about
a commit ending after rb_start_commit() has been called, the code
that used to grab the commit page before the tail page to prevent a
possible race is unnecessary. That race no longer exists with the
rb_start_commit()/rb_end_commit() interface.
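For illustration, a minimal sketch of the outermost-commit bracketing
that makes the old commit-page grab unnecessary (fields and function
bodies simplified; this is not the kernel's actual implementation):

#include <stdatomic.h>

struct cpu_buffer {
	atomic_int committing;		/* writer nesting depth */
};

static void rb_start_commit(struct cpu_buffer *cb)
{
	atomic_fetch_add(&cb->committing, 1);
}

static void rb_end_commit(struct cpu_buffer *cb)
{
	/* Only the outermost writer finalizes the commit, so a nested
	 * commit can no longer complete behind our back. */
	if (atomic_fetch_sub(&cb->committing, 1) == 1) {
		/* advance the commit page here */
	}
}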
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Disabling interrupts in trace_clock_local imposes quite a performance
hit on the recording of traces. Using perf top we see:
------------------------------------------------------------------------------
PerfTop: 244 irqs/sec kernel:100.0% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________
2842.00 - 40.4% : trace_clock_local
1043.00 - 14.8% : rb_reserve_next_event
784.00 - 11.1% : ring_buffer_lock_reserve
600.00 - 8.5% : __rb_reserve_next
579.00 - 8.2% : rb_end_commit
440.00 - 6.3% : ring_buffer_unlock_commit
290.00 - 4.1% : ring_buffer_producer_thread [ring_buffer_benchmark]
155.00 - 2.2% : debug_smp_processor_id
117.00 - 1.7% : trace_recursive_unlock
103.00 - 1.5% : ring_buffer_event_data
28.00 - 0.4% : do_gettimeofday
22.00 - 0.3% : _spin_unlock_irq
14.00 - 0.2% : native_read_tsc
11.00 - 0.2% : getnstimeofday
Here trace_clock_local accounts for 40% of the tracing time, and
recording a trace takes 210ns according to ring_buffer_benchmark.
After converting the interrupt disabling to preemption disabling,
perf top shows:
------------------------------------------------------------------------------
PerfTop: 1084 irqs/sec kernel:99.9% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________
1277.00 - 16.8% : native_read_tsc
1148.00 - 15.1% : rb_reserve_next_event
896.00 - 11.8% : ring_buffer_lock_reserve
688.00 - 9.1% : __rb_reserve_next
664.00 - 8.8% : rb_end_commit
563.00 - 7.4% : ring_buffer_unlock_commit
508.00 - 6.7% : _spin_unlock_irq
365.00 - 4.8% : debug_smp_processor_id
321.00 - 4.2% : trace_clock_local
303.00 - 4.0% : ring_buffer_producer_thread [ring_buffer_benchmark]
273.00 - 3.6% : native_sched_clock
122.00 - 1.6% : trace_recursive_unlock
113.00 - 1.5% : sched_clock
101.00 - 1.3% : ring_buffer_event_data
53.00 - 0.7% : tick_nohz_stop_sched_tick
Here trace_clock_local drops from taking 40% to only 4% of the total
time, and the trace time goes from 210ns down to 179ns (a 31ns
improvement).
I talked with Peter Zijlstra about the impact that sched_clock may have
without having interrupts disabled, and he told me that if a timer interrupt
comes in, sched_clock may report a wrong time.
Weighing an occasionally incorrect timestamp against a 15% performance
boost, I'll take the performance boost.
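A sketch of the converted clock function, assuming a
preempt_disable_notrace()/preempt_enable_notrace() pair around
sched_clock() (simplified from the actual kernel code):

#include <linux/preempt.h>
#include <linux/sched.h>

u64 notrace trace_clock_local(void)
{
	u64 clock;

	/*
	 * Preemption, not interrupts, is disabled around sched_clock().
	 * A timer interrupt landing in this window may skew the
	 * reported time; that is the accepted trade-off.
	 */
	preempt_disable_notrace();
	clock = sched_clock();
	preempt_enable_notrace();

	return clock;
}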
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ring_buffer_benchmark calls gettimeofday after every write to the
ring buffer in its measurements. This adds the overhead of the
gettimeofday call to the measurements and does not give an accurate
picture of the time it takes to record a trace.
This was first noticed with perf top:
------------------------------------------------------------------------------
PerfTop: 679 irqs/sec kernel:99.9% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________
1673.00 - 27.8% : trace_clock_local
806.00 - 13.4% : do_gettimeofday
590.00 - 9.8% : rb_reserve_next_event
554.00 - 9.2% : native_read_tsc
431.00 - 7.2% : ring_buffer_lock_reserve
365.00 - 6.1% : __rb_reserve_next
355.00 - 5.9% : rb_end_commit
322.00 - 5.4% : getnstimeofday
268.00 - 4.5% : ring_buffer_unlock_commit
262.00 - 4.4% : ring_buffer_producer_thread [ring_buffer_benchmark]
113.00 - 1.9% : read_tsc
91.00 - 1.5% : debug_smp_processor_id
69.00 - 1.1% : trace_recursive_unlock
66.00 - 1.1% : ring_buffer_event_data
25.00 - 0.4% : _spin_unlock_irq
And the length of each write to the ring buffer was measured at 310ns.
This patch adds a new module parameter called "write_interval", which
defaults to 50. This is the number of writes performed between
timestamps (a sketch of the batching follows the results below).
After this patch, perf top shows:
------------------------------------------------------------------------------
PerfTop: 244 irqs/sec kernel:100.0% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________
2842.00 - 40.4% : trace_clock_local
1043.00 - 14.8% : rb_reserve_next_event
784.00 - 11.1% : ring_buffer_lock_reserve
600.00 - 8.5% : __rb_reserve_next
579.00 - 8.2% : rb_end_commit
440.00 - 6.3% : ring_buffer_unlock_commit
290.00 - 4.1% : ring_buffer_producer_thread [ring_buffer_benchmark]
155.00 - 2.2% : debug_smp_processor_id
117.00 - 1.7% : trace_recursive_unlock
103.00 - 1.5% : ring_buffer_event_data
28.00 - 0.4% : do_gettimeofday
22.00 - 0.3% : _spin_unlock_irq
14.00 - 0.2% : native_read_tsc
11.00 - 0.2% : getnstimeofday
do_gettimeofday dropped from 13% usage to a mere 0.4%! (using the
default interval of 50) The measurement for each timestamp went from
310ns to 210ns. That's 100ns (one third) of overhead that the
gettimeofday call was introducing.
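A sketch of the batching described above, assuming the parameter is
consumed in the producer loop roughly like this (the loop body is
illustrative, not the benchmark's actual code):

#include <linux/module.h>
#include <linux/time.h>

static unsigned int write_interval = 50;
module_param(write_interval, uint, 0644);
MODULE_PARM_DESC(write_interval, "Number of writes between timestamps");

static void rb_measure_writes(void)
{
	struct timeval start, end;
	unsigned int i;

	do_gettimeofday(&start);
	for (i = 0; i < write_interval; i++)
		; /* one ring buffer write per iteration */
	do_gettimeofday(&end);
	/* per-write cost = elapsed time / write_interval */
}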
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If a weak function is used as the relocation reference for mcount
callers and that function is overridden, ftrace will fail at run time.
The current code should prevent a weak function from being used, but if
one slips through anyway, exit with an error so the failure happens at
compile time instead.
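A hypothetical illustration of the failure mode (not code from the
patch): offsets recorded against a weak symbol become wrong once a
strong definition wins at link time.

/* file1.c: if "helper" were picked as the reference, mcount call
 * sites in this object would be recorded as "helper + offset". */
void __attribute__((weak)) helper(void)
{
	/* ... */
}

/* file2.c: this strong definition overrides the weak one at link
 * time, so every recorded "helper + offset" now points into the
 * wrong code. */
void helper(void)
{
	/* ... */
}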
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091028050743.GH30758@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move all the condition validations into the function update_funcs().
Also, update_funcs() should not die if $ref_func is undefined, since
there may be more than one valid section in an object file.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091028050703.GG30758@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a variable to contain the regex needed to find weak functions
in the 'nm' output. This will allow other archs to easily override it.
Also rename the regex variable $nm_regex to $local_regex to be more
descriptive.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091028050619.GF30758@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move the mcount section check to the beginning of the objdump read loop.
This makes the code easier to follow since the search for the mcount
section is performed first before the mcount callers are processed.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091028050523.GE30758@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current logic to check objcopy's version is incorrect. This patch
fixes the algorithm and disables the use of local functions as a reference
if the objcopy version does not support static-to-global conversions.
Also remove some unused variables.
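The subtlety is that version strings must be compared field by field,
numerically, not as plain strings (a lexical compare thinks "2.9" is
newer than "2.15"). A hedged sketch of such a comparison in C (the
real check is in the Perl script; vercmp() is a hypothetical helper):

#include <stdlib.h>

/* Compare dotted version strings numerically; returns <0, 0, or >0.
 * Extra trailing fields are treated as newer, which suffices here. */
static int vercmp(const char *a, const char *b)
{
	char *pa, *pb;

	while (*a && *b) {
		long x = strtol(a, &pa, 10);
		long y = strtol(b, &pb, 10);

		if (x != y)
			return x < y ? -1 : 1;
		a = (*pa == '.') ? pa + 1 : pa;
		b = (*pb == '.') ? pb + 1 : pb;
	}
	if (*a || *b)
		return *a ? 1 : -1;
	return 0;
}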
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091028050421.GD30758@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ftrace.c file may reference the mcount function, and this may
interfere with the recordmcount.pl processing. To avoid this, the code
does not process kernel/trace/ftrace.o. But currently the check only
matches a relative path. This patch modifies the check so that it also
succeeds when the path is absolute.
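The intent can be captured by a suffix match, sketched here in C for
illustration (the actual check is a regex in the Perl script):

#include <string.h>

/* Match whether the build passes a relative or an absolute path. */
static int is_ftrace_object(const char *path)
{
	const char *needle = "kernel/trace/ftrace.o";
	size_t plen = strlen(path), nlen = strlen(needle);

	if (plen < nlen)
		return 0;
	/* Also accepts e.g. /abs/build/dir/kernel/trace/ftrace.o */
	return strcmp(path + plen - nlen, needle) == 0;
}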
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091028050332.GC30758@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The number of arguments passed into recordmcount.pl is 10, but the code
only checks that 7 were passed in.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091027065733.GB22032@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The documentation currently says we will use the first function in a
section as a reference. The actual algorithm is: choose the first global
function encountered as a reference; if there is none, choose the first
local one.
Change the documentation to be consistent with the code.
Also add several other clarifications.
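For illustration, the documented selection reads roughly like this
(names and types are hypothetical, not from the script):

struct sym {
	const char *name;
	int is_global;
};

/* First global function wins; otherwise fall back to the first
 * local one. */
static const char *pick_ref(const struct sym *syms, int n)
{
	const char *first_local = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (syms[i].is_global)
			return syms[i].name;
		if (!first_local)
			first_local = syms[i].name;
	}
	return first_local;
}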
Signed-off-by: Li Hong <lihong.hi@gmail.com>
LKML-Reference: <20091028050138.GA30758@uhli>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The return value of trace_seq_printf() is a little ambiguous. It
currently returns the length of the space available in the
buffer. printf usually returns the amount written, but that is not
adequate here, because:
trace_seq_printf(s, "");
is perfectly legal, and returning 0 would then indicate that it
failed.
We can always see the amount written by looking at the before
and after values of s->len, so this is not quite the same use case as
printf; we only care whether the string was successfully written to
the buffer or not.
Make trace_seq_printf() return 0 if the output overruns the
buffer's free space, and 1 otherwise.
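A simplified sketch of the new contract (condensed from the kernel's
trace_seq code; the struct is shown only for self-containment):

#include <linux/kernel.h>

struct trace_seq {
	char buffer[PAGE_SIZE];
	unsigned int len;
};

/* Returns 1 if the formatted string fit in the buffer, 0 otherwise. */
int trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
{
	int len = (PAGE_SIZE - 1) - s->len;
	va_list ap;
	int ret;

	if (len <= 0)
		return 0;

	va_start(ap, fmt);
	ret = vsnprintf(s->buffer + s->len, len, fmt, ap);
	va_end(ap);

	/* vsnprintf reports the would-be length; >= len means truncated */
	if (ret >= len)
		return 0;

	s->len += ret;
	return 1;
}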
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.631787612@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
move virtrng_remove to .devexit.text
move virtballoon_remove to .devexit.text
virtio_blk: Revert serial number support
virtio: let header files include virtio_ids.h
virtio_blk: revert QUEUE_FLAG_VIRT addition
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
KS8851: Fix MAC address write order
KS8851: Add soft reset at probe time
net: fix section mismatch in fec.c
net: Fix struct inet_timewait_sock bitfield annotation
tcp: Try to catch MSG_PEEK bug
net: Fix IP_MULTICAST_IF
bluetooth: static lock key fix
bluetooth: scheduling while atomic bug fix
tcp: fix TCP_DEFER_ACCEPT retrans calculation
tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
tcp: accept socket after TCP_DEFER_ACCEPT period
Revert "tcp: fix tcp_defer_accept to consider the timeout"
AF_UNIX: Fix deadlock on connecting to shutdown socket
ethoc: clear only pending irqs
ethoc: inline regs access
vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
be2net: fix support for PCI hot plug
...