linux-packaging-mono/man/mprof-report.1

.TH mprof-report 1 ""
.SH The Mono log profiler
.PP
The Mono \f[I]log\f[] profiler can be used to collect a lot of
information about a program running in the Mono runtime.
This data can be used (both while the process is running and later)
to do analyses of the program behaviour, determine resource usage,
performance issues or even look for particular execution patterns.
.PP
This is accomplished by logging the events provided by the Mono
runtime through the profiling interface and periodically writing
them to a file which can be later inspected with the command line
\f[I]mprof-report\f[] program or with a GUI (not developed yet).
.PP
The events collected include (among others):
.IP \[bu] 2
method enter and leave
.IP \[bu] 2
object allocation
.IP \[bu] 2
garbage collection
.IP \[bu] 2
JIT compilation
.IP \[bu] 2
metadata loading
.IP \[bu] 2
lock contention
.IP \[bu] 2
exceptions
.PP
In addition, the profiler can periodically collect info about all
the objects present in the heap at the end of a garbage collection
(this is called heap shot and currently implemented only for the
sgen garbage collector).
Another available profiler mode is the \f[I]sampling\f[] or
\f[I]statistical\f[] mode: periodically the program is sampled and
the information about what the program was busy with is saved.
This allows to get information about the program behaviour without
degrading its performance too much (usually less than 10%).
.SS Basic profiler usage
.PP
The simpler way to use the profiler is the following:
.PP
\f[B]mono\ --profile=log\ program.exe\f[]
.PP
At the end of the execution the file \f[I]output.mlpd\f[] will be
found in the current directory.
A summary report of the data can be printed by running:
.PP
\f[B]mprof-report\ output.mlpd\f[]
.PP
With this invocation a huge amount of data is collected about the
program execution and collecting and saving this data can
significantly slow down program execution.
If saving the profiling data is not needed, a report can be
generated directly with:
.PP
\f[B]mono\ --profile=log:report\ program.exe\f[]
.PP
If the information about allocations is not of interest, it can be
excluded:
.PP
\f[B]mono\ --profile=log:noalloc\ program.exe\f[]
.PP
On the other hand, if method call timing is not important, while
allocations are, the needed info can be gathered with:
.PP
\f[B]mono\ --profile=log:nocalls\ program.exe\f[]
.PP
You will still be able to inspect information about the sequence of
calls that lead to each allocation because at each object
allocation a stack trace is collected if full enter/leave
information is not available.
.PP
To periodically collect heap shots (and exclude method and
allocation events) use the following options (making sure you run
with the sgen garbage collector):
.PP
\f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[]
.PP
To perform a sampling profiler run, use the \f[I]sample\f[] option:
.PP
\f[B]mono\ --profile=log:sample\ program.exe\f[]
.SS Profiler option documentation
.PP
By default the \f[I]log\f[] profiler will gather all the events
provided by the Mono runtime and write them to a file named
\f[I]output.mlpd\f[].
When no option is specified, it is equivalent to using:
.PP
\f[B]--profile=log:calls,alloc,output=output.mlpd,maxframes=8,calldepth=100\f[]
.PP
The following options can be used to modify this default behaviour.
Each option is separated from the next by a \f[B],\f[] character,
with no spaces and all the options are included after the
\f[I]log:\f[] profile module specifier.
.IP \[bu] 2
\f[I]help\f[]: display concise help info about each available
option
.IP \[bu] 2
\f[I][no]alloc\f[]: \f[I]noalloc\f[] disables collecting object
allocation info, \f[I]alloc\f[] enables it if it was disabled by
another option like \f[I]heapshot\f[].
.IP \[bu] 2
\f[I][no]calls\f[]: \f[I]nocalls\f[] disables collecting method
enter and leave events.
When this option is used at each object allocation and at some
other events (like lock contentions and exception throws) a stack
trace is collected by default.
See the \f[I]maxframes\f[] option to control this behaviour.
\f[I]calls\f[] enables method enter/leave events if they were
disabled by another option like \f[I]heapshot\f[].
.IP \[bu] 2
\f[I]heapshot[=MODE]\f[]: collect heap shot data at each major
collection.
The frequency of the heap shots can be changed with the
\f[I]MODE\f[] parameter.
When this option is used allocation events and method enter/leave
events are not recorded by default: if they are needed, they need
to be enabled explicitly.
The optional parameter \f[I]MODE\f[] can modify the default heap
shot frequency.
heapshot can be used multiple times with different modes: in that
case a heap shot is taken if either of the conditions are met.
MODE can be one of:
.RS 2
.IP \[bu] 2
\f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[]
milliseconds passed since the last one.
.IP \[bu] 2
\f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major
garbage collections
.IP \[bu] 2
\f[I]ondemand\f[]: perform a heap shot when such a command is sent
to the control port
.RE
.IP \[bu] 2
\f[I]sample[=TYPE[/FREQ]]\f[]: collect statistical samples of the
program behaviour.
The default is to collect a 1000 times per second the instruction
pointer.
This is equivalent to the value \[lq]cycles/1000\[rq] for
\f[I]TYPE\f[].
On some systems, like with recent Linux kernels, it is possible to
cause the sampling to happen for other events provided by the
performance counters of the cpu.
In this case, \f[I]TYPE\f[] can be one of:
.RS 2
.IP \[bu] 2
\f[I]cycles\f[]: processor cycles
.IP \[bu] 2
\f[I]instr\f[]: executed instructions
.IP \[bu] 2
\f[I]cacherefs\f[]: cache references
.IP \[bu] 2
\f[I]cachemiss\f[]: cache misses
.IP \[bu] 2
\f[I]branches\f[]: executed branches
.IP \[bu] 2
\f[I]branchmiss\f[]: mispredicted branches
.RE
.IP \[bu] 2
\f[I]time=TIMER\f[]: use the TIMER timestamp mode.
TIMER can have the following values:
.RS 2
.IP \[bu] 2
\f[I]fast\f[]: a usually faster but possibly more inaccurate timer
.RE
.IP \[bu] 2
\f[I]maxframes=NUM\f[]: when a stack trace needs to be performed,
collect \f[I]NUM\f[] frames at the most.
The default is 8.
.IP \[bu] 2
\f[I]calldepth=NUM\f[]: ignore method enter/leave events when the
call chain depth is bigger than NUM.
.IP \[bu] 2
\f[I]zip\f[]: automatically compress the output data in gzip
format.
.IP \[bu] 2
\f[I]output=OUTSPEC\f[]: instead of writing the profiling data to
the output.mlpd file, substitute \f[I]%p\f[] in \f[I]OUTSPEC\f[]
with the current process id and \f[I]%t\f[] with the current date
and time, then do according to \f[I]OUTSPEC\f[]:
.RS 2
.IP \[bu] 2
if \f[I]OUTSPEC\f[] begins with a \f[I]|\f[] character, execute the
rest as a program and feed the data to its standard input
.IP \[bu] 2
if \f[I]OUTSPEC\f[] begins with a \f[I]-\f[] character, use the
rest of OUTSPEC as the filename, but force overwrite any existing
file by that name
.IP \[bu] 2
otherwise write the data the the named file: note that is a file by
that name already exists, a warning is issued and profiling is
disabled.
.RE
.IP \[bu] 2
\f[I]report\f[]: the profiling data is sent to mprof-report, which
will print a summary report.
This is equivalent to the option: \f[B]output=mprof-report\ -\f[].
If the \f[I]output\f[] option is specified as well, the report will
be written to the output file instead of the console.
.IP \[bu] 2
\f[I]port=PORT\f[]: specify the tcp/ip port to use for the
listening command server.
Currently not available for windows.
This server is started for example when heapshot=ondemand is used:
it will read commands line by line.
The following commands are available:
.RS 2
.IP \[bu] 2
\f[I]heapshot\f[]: perform a heapshot as soon as possible
.RE
.IP \[bu] 2
\f[I]counters\f[]: sample counters values every 1 second. This allow
a really lightweight way to have insight in some of the runtime key
metrics. Counters displayed in non verbose mode are : Methods from AOT,
Methods JITted using mono JIT, Methods JITted using LLVM, Total time
spent JITting (sec), User Time, System Time, Total Time, Working Set,
Private Bytes, Virtual Bytes, Page Faults and CPU Load Average (1min,
5min and 15min).
.RE
.SS Analyzing the profile data
.PP
Currently there is a command line program (\f[I]mprof-report\f[])
to analyze the data produced by the profiler.
This is ran automatically when the \f[I]report\f[] profiler option
is used.
Simply run:
.PP
\f[B]mprof-report\ output.mlpd\f[]
.PP
to see a summary report of the data included in the file.
.SS Trace information for events
.PP
Often it is important for some events, like allocations, lock
contention and exception throws to know where they happened.
Or we may want to see what sequence of calls leads to a particular
method invocation.
To see this info invoke mprof-report as follows:
.PP
\f[B]mprof-report\ --traces\ output.mlpd\f[]
.PP
The maximum number of methods in each stack trace can be specified
with the \f[I]\[em]maxframes=NUM\f[] option:
.PP
\f[B]mprof-report\ --traces\ --maxframes=4\ output.mlpd\f[]
.PP
The stack trace info will be available if method enter/leave events
have been recorded or if stack trace collection wasn't explicitly
disabled with the \f[I]maxframes=0\f[] profiler option.
Note that the profiler will collect up to 8 frames by default at
specific events when the \f[I]nocalls\f[] option is used, so in
that case, if more stack frames are required in mprof-report, a
bigger value for maxframes when profiling must be used, too.
.PP
The \f[I]\[em]traces\f[] option also controls the reverse reference
feature in the heapshot report: for each class it reports how many
references to objects of that class come from other classes.
.SS Sort order for methods and allocations
.PP
When a list of methods is printed the default sort order is based
on the total time spent in the method.
This time is wall clock time (that is, it includes the time spent,
for example, in a sleep call, even if actual cpu time would be
basically 0).
Also, if the method has been ran on different threads, the time
will be a sum of the time used in each thread.
.PP
To change the sort order, use the option:
.PP
\f[B]--method-sort=MODE\f[]
.PP
where \f[I]MODE\f[] can be:
.IP \[bu] 2
\f[I]self\f[]: amount of time spent in the method itself and not in
its callees
.IP \[bu] 2
\f[I]calls\f[]: the number of method invocations
.IP \[bu] 2
\f[I]total\f[]: the total time spent in the method.
.PP
Object allocation lists are sorted by default depending on the
total amount of bytes used by each type.
.PP
To change the sort order of object allocations, use the option:
.PP
\f[B]--alloc-sort=MODE\f[]
.PP
where \f[I]MODE\f[] can be:
.IP \[bu] 2
\f[I]count\f[]: the number of allocated objects of the given type
.IP \[bu] 2
\f[I]bytes\f[]: the total number of bytes used by objects of the
given type
.PP
To change the sort order of counters, use the option:
.PP
\f[B]--counters-sort=MODE\f[]
.PP
where \f[I]MODE\f[] can be:
.IP \[bu] 2
\f[I]time\f[]: sort values by time then category
.IP \[bu] 2
\f[I]category\f[]: sort values by category then time
.SS Selecting what data to report
.PP
The profiler by default collects data about many runtime subsystems
and mprof-report prints a summary of all the subsystems that are
found in the data file.
It is possible to tell mprof-report to only show information about
some of them with the following option:
.PP
\f[B]--reports=R1[,R2...]\f[]
.PP
where the report names R1, R2 etc.
can be:
.IP \[bu] 2
\f[I]header\f[]: information about program startup and profiler
version
.IP \[bu] 2
\f[I]jit\f[]: JIT compiler information
.IP \[bu] 2
\f[I]sample\f[]: statistical sampling information
.IP \[bu] 2
\f[I]gc\f[]: garbage collection information
.IP \[bu] 2
\f[I]alloc\f[]: object allocation information
.IP \[bu] 2
\f[I]call\f[]: method profiling information
.IP \[bu] 2
\f[I]metadata\f[]: metadata events like image loads
.IP \[bu] 2
\f[I]exception\f[]: exception throw and handling information
.IP \[bu] 2
\f[I]monitor\f[]: lock contention information
.IP \[bu] 2
\f[I]thread\f[]: thread information
.IP \[bu] 2
\f[I]domain\f[]: app domain information
.IP \[bu] 2
\f[I]context\f[]: remoting context information
.IP \[bu] 2
\f[I]heapshot\f[]: live heap usage at heap shots
.IP \[bu] 2
\f[I]counters\f[]: counters samples
.PP
It is possible to limit some of the data displayed to a timeframe
of the program execution with the option:
.PP
\f[B]--time=FROM-TO\f[]
.PP
where \f[I]FROM\f[] and \f[I]TO\f[] are seconds since application
startup (they can be floating point numbers).
.PP
Another interesting option is to consider only events happening on
a particular thread with the following option:
.PP
\f[B]--thread=THREADID\f[]
.PP
where \f[I]THREADID\f[] is one of the numbers listed in the thread
summary report (or a thread name when present).
.PP
By default long lists of methods or other information like object
allocations are limited to the most important data.
To increase the amount of information printed you can use the
option:
.PP
\f[B]--verbose\f[]
.SS Track individual objects
.PP
Instead of printing the usual reports from the profiler data, it is
possible to track some interesting information about some specific
object addresses.
The objects are selected based on their address with the
\f[I]\[em]track\f[] option as follows:
.PP
\f[B]--track=0xaddr1[,0xaddr2,...]\f[]
.PP
The reported info (if available in the data file), will be class
name, size, creation time, stack trace of creation (with the
\f[I]\[em]traces\f[] option), etc.
If heapshot data is available it will be possible to also track
what other objects reference one of the listed addresses.
.PP
The object addresses can be gathered either from the profiler
report in some cases (like in the monitor lock report), from the
live application or they can be selected with the
\f[I]\[em]find=FINDSPEC\f[] option.
FINDSPEC can be one of the following:
.IP \[bu] 2
\f[I]S:SIZE\f[]: where the object is selected if it's size is at
least \f[I]SIZE\f[]
.IP \[bu] 2
\f[I]T:NAME\f[]: where the object is selected if \f[I]NAME\f[]
partially matches its class name
.PP
This option can be specified multiple times with one of the
different kinds of FINDSPEC.
For example, the following:
.PP
\f[B]--find=S:10000\ --find=T:Byte[]\f[]
.PP
will find all the byte arrays that are at least 10000 bytes in
size.
.PP
Note that with a moving garbage collector the object address can
change, so you may need to track the changed address manually.
It can also happen that multiple objects are allocated at the same
address, so the output from this option can become large.
.SS Saving a profiler report
.PP
By default mprof-report will print the summary data to the console.
To print it to a file, instead, use the option:
.PP
\f[B]--out=FILENAME\f[]
.SS Dealing with profiler slowness
.PP
If the profiler needs to collect lots of data, the execution of the
program will slow down significantly, usually 10 to 20 times
slower.
There are several ways to reduce the impact of the profiler on the
program execution.
.SS Use the statistical sampling mode
.PP
Statistical sampling allows executing a program under the profiler
with minimal performance overhead (usually less than 10%).
This mode allows checking where the program is spending most of
it's execution time without significantly perturbing its behaviour.
.SS Collect less data
.PP
Collecting method enter/leave events can be very expensive,
especially in programs that perform many millions of tiny calls.
The profiler option \f[I]nocalls\f[] can be used to avoid
collecting this data or it can be limited to only a few call levels
with the \f[I]calldepth\f[] option.
.PP
Object allocation information is expensive as well, though much
less than method enter/leave events.
If it's not needed, it can be skipped with the \f[I]noalloc\f[]
profiler option.
Note that when method enter/leave events are discarded, by default
stack traces are collected at each allocation and this can be
expensive as well.
The impact of stack trace information can be reduced by setting a
low value with the \f[I]maxframes\f[] option or by eliminating them
completely, by setting it to 0.
.PP
The other major source of data is the heapshot profiler option:
especially if the managed heap is big, since every object needs to
be inspected.
The \f[I]MODE\f[] parameter of the \f[I]heapshot\f[] option can be
used to reduce the frequency of the heap shots.
.SS Reduce the timestamp overhead
.PP
On many operating systems or architectures what actually slows down
profiling is the function provided by the system to get timestamp
information.
The \f[I]time=fast\f[] profiler option can be usually used to speed
up this operation, but, depending on the system, time accounting
may have some level of approximation (though statistically the data
should be still fairly valuable).
.SS Dealing with the size of the data files
.PP
When collecting a lot of information about a profiled program, huge
data files can be generated.
There are a few ways to minimize the amount of data, for example by
not collecting some of the more space-consuming information or by
compressing the information on the fly or by just generating a
summary report.
.SS Reducing the amount of data
.PP
Method enter/leave events can be excluded completely with the
\f[I]nocalls\f[] option or they can be limited to just a few levels
of calls with the \f[I]calldepth\f[] option.
For example, the option:
.PP
\f[B]calldepth=10\f[]
.PP
will ignore the method events when there are more than 10 managed
stack frames.
This is very useful for programs that have deep recursion or for
programs that perform many millions of tiny calls deep enough in
the call stack.
The optimal number for the calldepth option depends on the program
and it needs to be balanced between providing enough profiling
information and allowing fast execution speed.
.PP
Note that by default, if method events are not recorded at all, the
profiler will collect stack trace information at events like
allocations.
To avoid gathering this data, use the \f[I]maxframes=0\f[] profiler
option.
.PP
Allocation events can be eliminated with the \f[I]noalloc\f[]
option.
.PP
Heap shot data can also be huge: by default it is collected at each
major collection.
To reduce the frequency, you can specify a heapshot mode: for
example to collect every 5 collections (including major and minor):
.PP
\f[B]heapshot=5gc\f[]
.PP
or when at least 5 seconds passed since the last heap shot:
.PP
\f[B]heapshot=5000ms\f[]
.SS Compressing the data
.PP
To reduce the amout of disk space used by the data, the data can be
compressed either after it has been generated with the gzip
command:
.PP
\f[B]gzip\ -9\ output.mlpd\f[]
.PP
or it can be compressed automatically by using the \f[I]zip\f[]
profiler option.
Note that in this case there could be a significant slowdown of the
profiled program.
.PP
The mprof-report program will tranparently deal with either
compressed or uncompressed data files.
.SS Generating only a summary report
.PP
Often it's enough to look at the profiler summary report to
diagnose an issue and in this case it's possible to avoid saving
the profiler data file to disk.
This can be accomplished with the \f[I]report\f[] profiler option,
which will basically send the data to the mprof-report program for
display.
.PP
To have more control of what summary information is reported (or to
use a completely different program to decode the profiler data),
the \f[I]output\f[] profiler option can be used, with \f[B]|\f[] as
the first character: the rest of the output name will be executed
as a program with the data fed in on the standard input.
.PP
For example, to print only the Monitor summary with stack trace
information, you could use it like this:
.PP
\f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[]
.SH WEB SITE
http://www.mono-project.com/docs/debug+profile/profile/profiler/
.SH SEE ALSO
.PP
mono(1)
.SH AUTHORS
Paolo Molaro.