6992685b86
Former-commit-id: 0a113cb3a6feb7873f632839b1307cc6033cd595
550 lines
20 KiB
Groff
550 lines
20 KiB
Groff
.TH mprof-report 1 ""
|
|
.SH The Mono log profiler
|
|
.PP
|
|
The Mono \f[I]log\f[] profiler can be used to collect a lot of
|
|
information about a program running in the Mono runtime.
|
|
This data can be used (both while the process is running and later)
|
|
to do analyses of the program behaviour, determine resource usage,
|
|
performance issues or even look for particular execution patterns.
|
|
.PP
|
|
This is accomplished by logging the events provided by the Mono
|
|
runtime through the profiling interface and periodically writing
|
|
them to a file which can be later inspected with the command line
|
|
\f[I]mprof-report\f[] program or with a GUI (not developed yet).
|
|
.PP
|
|
The events collected include (among others):
|
|
.IP \[bu] 2
|
|
method enter and leave
|
|
.IP \[bu] 2
|
|
object allocation
|
|
.IP \[bu] 2
|
|
garbage collection
|
|
.IP \[bu] 2
|
|
JIT compilation
|
|
.IP \[bu] 2
|
|
metadata loading
|
|
.IP \[bu] 2
|
|
lock contention
|
|
.IP \[bu] 2
|
|
exceptions
|
|
.PP
|
|
In addition, the profiler can periodically collect info about all
|
|
the objects present in the heap at the end of a garbage collection
|
|
(this is called heap shot and currently implemented only for the
|
|
sgen garbage collector).
|
|
Another available profiler mode is the \f[I]sampling\f[] or
|
|
\f[I]statistical\f[] mode: periodically the program is sampled and
|
|
the information about what the program was busy with is saved.
|
|
This allows to get information about the program behaviour without
|
|
degrading its performance too much (usually less than 10%).
|
|
.SS Basic profiler usage
|
|
.PP
|
|
The simpler way to use the profiler is the following:
|
|
.PP
|
|
\f[B]mono\ --profile=log\ program.exe\f[]
|
|
.PP
|
|
At the end of the execution the file \f[I]output.mlpd\f[] will be
|
|
found in the current directory.
|
|
A summary report of the data can be printed by running:
|
|
.PP
|
|
\f[B]mprof-report\ output.mlpd\f[]
|
|
.PP
|
|
With this invocation a huge amount of data is collected about the
|
|
program execution and collecting and saving this data can
|
|
significantly slow down program execution.
|
|
If saving the profiling data is not needed, a report can be
|
|
generated directly with:
|
|
.PP
|
|
\f[B]mono\ --profile=log:report\ program.exe\f[]
|
|
.PP
|
|
If the information about allocations is not of interest, it can be
|
|
excluded:
|
|
.PP
|
|
\f[B]mono\ --profile=log:noalloc\ program.exe\f[]
|
|
.PP
|
|
On the other hand, if method call timing is not important, while
|
|
allocations are, the needed info can be gathered with:
|
|
.PP
|
|
\f[B]mono\ --profile=log:nocalls\ program.exe\f[]
|
|
.PP
|
|
You will still be able to inspect information about the sequence of
|
|
calls that lead to each allocation because at each object
|
|
allocation a stack trace is collected if full enter/leave
|
|
information is not available.
|
|
.PP
|
|
To periodically collect heap shots (and exclude method and
|
|
allocation events) use the following options (making sure you run
|
|
with the sgen garbage collector):
|
|
.PP
|
|
\f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[]
|
|
.PP
|
|
To perform a sampling profiler run, use the \f[I]sample\f[] option:
|
|
.PP
|
|
\f[B]mono\ --profile=log:sample\ program.exe\f[]
|
|
.SS Profiler option documentation
|
|
.PP
|
|
By default the \f[I]log\f[] profiler will gather all the events
|
|
provided by the Mono runtime and write them to a file named
|
|
\f[I]output.mlpd\f[].
|
|
When no option is specified, it is equivalent to using:
|
|
.PP
|
|
\f[B]--profile=log:calls,alloc,output=output.mlpd,maxframes=8,calldepth=100\f[]
|
|
.PP
|
|
The following options can be used to modify this default behaviour.
|
|
Each option is separated from the next by a \f[B],\f[] character,
|
|
with no spaces and all the options are included after the
|
|
\f[I]log:\f[] profile module specifier.
|
|
.IP \[bu] 2
|
|
\f[I]help\f[]: display concise help info about each available
|
|
option
|
|
.IP \[bu] 2
|
|
\f[I][no]alloc\f[]: \f[I]noalloc\f[] disables collecting object
|
|
allocation info, \f[I]alloc\f[] enables it if it was disabled by
|
|
another option like \f[I]heapshot\f[].
|
|
.IP \[bu] 2
|
|
\f[I][no]calls\f[]: \f[I]nocalls\f[] disables collecting method
|
|
enter and leave events.
|
|
When this option is used at each object allocation and at some
|
|
other events (like lock contentions and exception throws) a stack
|
|
trace is collected by default.
|
|
See the \f[I]maxframes\f[] option to control this behaviour.
|
|
\f[I]calls\f[] enables method enter/leave events if they were
|
|
disabled by another option like \f[I]heapshot\f[].
|
|
.IP \[bu] 2
|
|
\f[I]heapshot[=MODE]\f[]: collect heap shot data at each major
|
|
collection.
|
|
The frequency of the heap shots can be changed with the
|
|
\f[I]MODE\f[] parameter.
|
|
When this option is used allocation events and method enter/leave
|
|
events are not recorded by default: if they are needed, they need
|
|
to be enabled explicitly.
|
|
The optional parameter \f[I]MODE\f[] can modify the default heap
|
|
shot frequency.
|
|
heapshot can be used multiple times with different modes: in that
|
|
case a heap shot is taken if either of the conditions are met.
|
|
MODE can be one of:
|
|
.RS 2
|
|
.IP \[bu] 2
|
|
\f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[]
|
|
milliseconds passed since the last one.
|
|
.IP \[bu] 2
|
|
\f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major
|
|
garbage collections
|
|
.IP \[bu] 2
|
|
\f[I]ondemand\f[]: perform a heap shot when such a command is sent
|
|
to the control port
|
|
.RE
|
|
.IP \[bu] 2
|
|
\f[I]sample[=TYPE[/FREQ]]\f[]: collect statistical samples of the
|
|
program behaviour.
|
|
The default is to collect a 1000 times per second the instruction
|
|
pointer.
|
|
This is equivalent to the value \[lq]cycles/1000\[rq] for
|
|
\f[I]TYPE\f[].
|
|
On some systems, like with recent Linux kernels, it is possible to
|
|
cause the sampling to happen for other events provided by the
|
|
performance counters of the cpu.
|
|
In this case, \f[I]TYPE\f[] can be one of:
|
|
.RS 2
|
|
.IP \[bu] 2
|
|
\f[I]cycles\f[]: processor cycles
|
|
.IP \[bu] 2
|
|
\f[I]instr\f[]: executed instructions
|
|
.IP \[bu] 2
|
|
\f[I]cacherefs\f[]: cache references
|
|
.IP \[bu] 2
|
|
\f[I]cachemiss\f[]: cache misses
|
|
.IP \[bu] 2
|
|
\f[I]branches\f[]: executed branches
|
|
.IP \[bu] 2
|
|
\f[I]branchmiss\f[]: mispredicted branches
|
|
.RE
|
|
.IP \[bu] 2
|
|
\f[I]time=TIMER\f[]: use the TIMER timestamp mode.
|
|
TIMER can have the following values:
|
|
.RS 2
|
|
.IP \[bu] 2
|
|
\f[I]fast\f[]: a usually faster but possibly more inaccurate timer
|
|
.RE
|
|
.IP \[bu] 2
|
|
\f[I]maxframes=NUM\f[]: when a stack trace needs to be performed,
|
|
collect \f[I]NUM\f[] frames at the most.
|
|
The default is 8.
|
|
.IP \[bu] 2
|
|
\f[I]calldepth=NUM\f[]: ignore method enter/leave events when the
|
|
call chain depth is bigger than NUM.
|
|
.IP \[bu] 2
|
|
\f[I]zip\f[]: automatically compress the output data in gzip
|
|
format.
|
|
.IP \[bu] 2
|
|
\f[I]output=OUTSPEC\f[]: instead of writing the profiling data to
|
|
the output.mlpd file, substitute \f[I]%p\f[] in \f[I]OUTSPEC\f[]
|
|
with the current process id and \f[I]%t\f[] with the current date
|
|
and time, then do according to \f[I]OUTSPEC\f[]:
|
|
.RS 2
|
|
.IP \[bu] 2
|
|
if \f[I]OUTSPEC\f[] begins with a \f[I]|\f[] character, execute the
|
|
rest as a program and feed the data to its standard input
|
|
.IP \[bu] 2
|
|
if \f[I]OUTSPEC\f[] begins with a \f[I]-\f[] character, use the
|
|
rest of OUTSPEC as the filename, but force overwrite any existing
|
|
file by that name
|
|
.IP \[bu] 2
|
|
otherwise write the data the the named file: note that is a file by
|
|
that name already exists, a warning is issued and profiling is
|
|
disabled.
|
|
.RE
|
|
.IP \[bu] 2
|
|
\f[I]report\f[]: the profiling data is sent to mprof-report, which
|
|
will print a summary report.
|
|
This is equivalent to the option: \f[B]output=mprof-report\ -\f[].
|
|
If the \f[I]output\f[] option is specified as well, the report will
|
|
be written to the output file instead of the console.
|
|
.IP \[bu] 2
|
|
\f[I]port=PORT\f[]: specify the tcp/ip port to use for the
|
|
listening command server.
|
|
Currently not available for windows.
|
|
This server is started for example when heapshot=ondemand is used:
|
|
it will read commands line by line.
|
|
The following commands are available:
|
|
.RS 2
|
|
.IP \[bu] 2
|
|
\f[I]heapshot\f[]: perform a heapshot as soon as possible
|
|
.RE
|
|
.IP \[bu] 2
|
|
\f[I]counters\f[]: sample counters values every 1 second. This allow
|
|
a really lightweight way to have insight in some of the runtime key
|
|
metrics. Counters displayed in non verbose mode are : Methods from AOT,
|
|
Methods JITted using mono JIT, Methods JITted using LLVM, Total time
|
|
spent JITting (sec), User Time, System Time, Total Time, Working Set,
|
|
Private Bytes, Virtual Bytes, Page Faults and CPU Load Average (1min,
|
|
5min and 15min).
|
|
.RE
|
|
.SS Analyzing the profile data
|
|
.PP
|
|
Currently there is a command line program (\f[I]mprof-report\f[])
|
|
to analyze the data produced by the profiler.
|
|
This is ran automatically when the \f[I]report\f[] profiler option
|
|
is used.
|
|
Simply run:
|
|
.PP
|
|
\f[B]mprof-report\ output.mlpd\f[]
|
|
.PP
|
|
to see a summary report of the data included in the file.
|
|
.SS Trace information for events
|
|
.PP
|
|
Often it is important for some events, like allocations, lock
|
|
contention and exception throws to know where they happened.
|
|
Or we may want to see what sequence of calls leads to a particular
|
|
method invocation.
|
|
To see this info invoke mprof-report as follows:
|
|
.PP
|
|
\f[B]mprof-report\ --traces\ output.mlpd\f[]
|
|
.PP
|
|
The maximum number of methods in each stack trace can be specified
|
|
with the \f[I]\[em]maxframes=NUM\f[] option:
|
|
.PP
|
|
\f[B]mprof-report\ --traces\ --maxframes=4\ output.mlpd\f[]
|
|
.PP
|
|
The stack trace info will be available if method enter/leave events
|
|
have been recorded or if stack trace collection wasn't explicitly
|
|
disabled with the \f[I]maxframes=0\f[] profiler option.
|
|
Note that the profiler will collect up to 8 frames by default at
|
|
specific events when the \f[I]nocalls\f[] option is used, so in
|
|
that case, if more stack frames are required in mprof-report, a
|
|
bigger value for maxframes when profiling must be used, too.
|
|
.PP
|
|
The \f[I]\[em]traces\f[] option also controls the reverse reference
|
|
feature in the heapshot report: for each class it reports how many
|
|
references to objects of that class come from other classes.
|
|
.SS Sort order for methods and allocations
|
|
.PP
|
|
When a list of methods is printed the default sort order is based
|
|
on the total time spent in the method.
|
|
This time is wall clock time (that is, it includes the time spent,
|
|
for example, in a sleep call, even if actual cpu time would be
|
|
basically 0).
|
|
Also, if the method has been ran on different threads, the time
|
|
will be a sum of the time used in each thread.
|
|
.PP
|
|
To change the sort order, use the option:
|
|
.PP
|
|
\f[B]--method-sort=MODE\f[]
|
|
.PP
|
|
where \f[I]MODE\f[] can be:
|
|
.IP \[bu] 2
|
|
\f[I]self\f[]: amount of time spent in the method itself and not in
|
|
its callees
|
|
.IP \[bu] 2
|
|
\f[I]calls\f[]: the number of method invocations
|
|
.IP \[bu] 2
|
|
\f[I]total\f[]: the total time spent in the method.
|
|
.PP
|
|
Object allocation lists are sorted by default depending on the
|
|
total amount of bytes used by each type.
|
|
.PP
|
|
To change the sort order of object allocations, use the option:
|
|
.PP
|
|
\f[B]--alloc-sort=MODE\f[]
|
|
.PP
|
|
where \f[I]MODE\f[] can be:
|
|
.IP \[bu] 2
|
|
\f[I]count\f[]: the number of allocated objects of the given type
|
|
.IP \[bu] 2
|
|
\f[I]bytes\f[]: the total number of bytes used by objects of the
|
|
given type
|
|
.PP
|
|
To change the sort order of counters, use the option:
|
|
.PP
|
|
\f[B]--counters-sort=MODE\f[]
|
|
.PP
|
|
where \f[I]MODE\f[] can be:
|
|
.IP \[bu] 2
|
|
\f[I]time\f[]: sort values by time then category
|
|
.IP \[bu] 2
|
|
\f[I]category\f[]: sort values by category then time
|
|
.SS Selecting what data to report
|
|
.PP
|
|
The profiler by default collects data about many runtime subsystems
|
|
and mprof-report prints a summary of all the subsystems that are
|
|
found in the data file.
|
|
It is possible to tell mprof-report to only show information about
|
|
some of them with the following option:
|
|
.PP
|
|
\f[B]--reports=R1[,R2...]\f[]
|
|
.PP
|
|
where the report names R1, R2 etc.
|
|
can be:
|
|
.IP \[bu] 2
|
|
\f[I]header\f[]: information about program startup and profiler
|
|
version
|
|
.IP \[bu] 2
|
|
\f[I]jit\f[]: JIT compiler information
|
|
.IP \[bu] 2
|
|
\f[I]sample\f[]: statistical sampling information
|
|
.IP \[bu] 2
|
|
\f[I]gc\f[]: garbage collection information
|
|
.IP \[bu] 2
|
|
\f[I]alloc\f[]: object allocation information
|
|
.IP \[bu] 2
|
|
\f[I]call\f[]: method profiling information
|
|
.IP \[bu] 2
|
|
\f[I]metadata\f[]: metadata events like image loads
|
|
.IP \[bu] 2
|
|
\f[I]exception\f[]: exception throw and handling information
|
|
.IP \[bu] 2
|
|
\f[I]monitor\f[]: lock contention information
|
|
.IP \[bu] 2
|
|
\f[I]thread\f[]: thread information
|
|
.IP \[bu] 2
|
|
\f[I]domain\f[]: app domain information
|
|
.IP \[bu] 2
|
|
\f[I]context\f[]: remoting context information
|
|
.IP \[bu] 2
|
|
\f[I]heapshot\f[]: live heap usage at heap shots
|
|
.IP \[bu] 2
|
|
\f[I]counters\f[]: counters samples
|
|
.PP
|
|
It is possible to limit some of the data displayed to a timeframe
|
|
of the program execution with the option:
|
|
.PP
|
|
\f[B]--time=FROM-TO\f[]
|
|
.PP
|
|
where \f[I]FROM\f[] and \f[I]TO\f[] are seconds since application
|
|
startup (they can be floating point numbers).
|
|
.PP
|
|
Another interesting option is to consider only events happening on
|
|
a particular thread with the following option:
|
|
.PP
|
|
\f[B]--thread=THREADID\f[]
|
|
.PP
|
|
where \f[I]THREADID\f[] is one of the numbers listed in the thread
|
|
summary report (or a thread name when present).
|
|
.PP
|
|
By default long lists of methods or other information like object
|
|
allocations are limited to the most important data.
|
|
To increase the amount of information printed you can use the
|
|
option:
|
|
.PP
|
|
\f[B]--verbose\f[]
|
|
.SS Track individual objects
|
|
.PP
|
|
Instead of printing the usual reports from the profiler data, it is
|
|
possible to track some interesting information about some specific
|
|
object addresses.
|
|
The objects are selected based on their address with the
|
|
\f[I]\[em]track\f[] option as follows:
|
|
.PP
|
|
\f[B]--track=0xaddr1[,0xaddr2,...]\f[]
|
|
.PP
|
|
The reported info (if available in the data file), will be class
|
|
name, size, creation time, stack trace of creation (with the
|
|
\f[I]\[em]traces\f[] option), etc.
|
|
If heapshot data is available it will be possible to also track
|
|
what other objects reference one of the listed addresses.
|
|
.PP
|
|
The object addresses can be gathered either from the profiler
|
|
report in some cases (like in the monitor lock report), from the
|
|
live application or they can be selected with the
|
|
\f[I]\[em]find=FINDSPEC\f[] option.
|
|
FINDSPEC can be one of the following:
|
|
.IP \[bu] 2
|
|
\f[I]S:SIZE\f[]: where the object is selected if it's size is at
|
|
least \f[I]SIZE\f[]
|
|
.IP \[bu] 2
|
|
\f[I]T:NAME\f[]: where the object is selected if \f[I]NAME\f[]
|
|
partially matches its class name
|
|
.PP
|
|
This option can be specified multiple times with one of the
|
|
different kinds of FINDSPEC.
|
|
For example, the following:
|
|
.PP
|
|
\f[B]--find=S:10000\ --find=T:Byte[]\f[]
|
|
.PP
|
|
will find all the byte arrays that are at least 10000 bytes in
|
|
size.
|
|
.PP
|
|
Note that with a moving garbage collector the object address can
|
|
change, so you may need to track the changed address manually.
|
|
It can also happen that multiple objects are allocated at the same
|
|
address, so the output from this option can become large.
|
|
.SS Saving a profiler report
|
|
.PP
|
|
By default mprof-report will print the summary data to the console.
|
|
To print it to a file, instead, use the option:
|
|
.PP
|
|
\f[B]--out=FILENAME\f[]
|
|
.SS Dealing with profiler slowness
|
|
.PP
|
|
If the profiler needs to collect lots of data, the execution of the
|
|
program will slow down significantly, usually 10 to 20 times
|
|
slower.
|
|
There are several ways to reduce the impact of the profiler on the
|
|
program execution.
|
|
.SS Use the statistical sampling mode
|
|
.PP
|
|
Statistical sampling allows executing a program under the profiler
|
|
with minimal performance overhead (usually less than 10%).
|
|
This mode allows checking where the program is spending most of
|
|
it's execution time without significantly perturbing its behaviour.
|
|
.SS Collect less data
|
|
.PP
|
|
Collecting method enter/leave events can be very expensive,
|
|
especially in programs that perform many millions of tiny calls.
|
|
The profiler option \f[I]nocalls\f[] can be used to avoid
|
|
collecting this data or it can be limited to only a few call levels
|
|
with the \f[I]calldepth\f[] option.
|
|
.PP
|
|
Object allocation information is expensive as well, though much
|
|
less than method enter/leave events.
|
|
If it's not needed, it can be skipped with the \f[I]noalloc\f[]
|
|
profiler option.
|
|
Note that when method enter/leave events are discarded, by default
|
|
stack traces are collected at each allocation and this can be
|
|
expensive as well.
|
|
The impact of stack trace information can be reduced by setting a
|
|
low value with the \f[I]maxframes\f[] option or by eliminating them
|
|
completely, by setting it to 0.
|
|
.PP
|
|
The other major source of data is the heapshot profiler option:
|
|
especially if the managed heap is big, since every object needs to
|
|
be inspected.
|
|
The \f[I]MODE\f[] parameter of the \f[I]heapshot\f[] option can be
|
|
used to reduce the frequency of the heap shots.
|
|
.SS Reduce the timestamp overhead
|
|
.PP
|
|
On many operating systems or architectures what actually slows down
|
|
profiling is the function provided by the system to get timestamp
|
|
information.
|
|
The \f[I]time=fast\f[] profiler option can be usually used to speed
|
|
up this operation, but, depending on the system, time accounting
|
|
may have some level of approximation (though statistically the data
|
|
should be still fairly valuable).
|
|
.SS Dealing with the size of the data files
|
|
.PP
|
|
When collecting a lot of information about a profiled program, huge
|
|
data files can be generated.
|
|
There are a few ways to minimize the amount of data, for example by
|
|
not collecting some of the more space-consuming information or by
|
|
compressing the information on the fly or by just generating a
|
|
summary report.
|
|
.SS Reducing the amount of data
|
|
.PP
|
|
Method enter/leave events can be excluded completely with the
|
|
\f[I]nocalls\f[] option or they can be limited to just a few levels
|
|
of calls with the \f[I]calldepth\f[] option.
|
|
For example, the option:
|
|
.PP
|
|
\f[B]calldepth=10\f[]
|
|
.PP
|
|
will ignore the method events when there are more than 10 managed
|
|
stack frames.
|
|
This is very useful for programs that have deep recursion or for
|
|
programs that perform many millions of tiny calls deep enough in
|
|
the call stack.
|
|
The optimal number for the calldepth option depends on the program
|
|
and it needs to be balanced between providing enough profiling
|
|
information and allowing fast execution speed.
|
|
.PP
|
|
Note that by default, if method events are not recorded at all, the
|
|
profiler will collect stack trace information at events like
|
|
allocations.
|
|
To avoid gathering this data, use the \f[I]maxframes=0\f[] profiler
|
|
option.
|
|
.PP
|
|
Allocation events can be eliminated with the \f[I]noalloc\f[]
|
|
option.
|
|
.PP
|
|
Heap shot data can also be huge: by default it is collected at each
|
|
major collection.
|
|
To reduce the frequency, you can specify a heapshot mode: for
|
|
example to collect every 5 collections (including major and minor):
|
|
.PP
|
|
\f[B]heapshot=5gc\f[]
|
|
.PP
|
|
or when at least 5 seconds passed since the last heap shot:
|
|
.PP
|
|
\f[B]heapshot=5000ms\f[]
|
|
.SS Compressing the data
|
|
.PP
|
|
To reduce the amout of disk space used by the data, the data can be
|
|
compressed either after it has been generated with the gzip
|
|
command:
|
|
.PP
|
|
\f[B]gzip\ -9\ output.mlpd\f[]
|
|
.PP
|
|
or it can be compressed automatically by using the \f[I]zip\f[]
|
|
profiler option.
|
|
Note that in this case there could be a significant slowdown of the
|
|
profiled program.
|
|
.PP
|
|
The mprof-report program will tranparently deal with either
|
|
compressed or uncompressed data files.
|
|
.SS Generating only a summary report
|
|
.PP
|
|
Often it's enough to look at the profiler summary report to
|
|
diagnose an issue and in this case it's possible to avoid saving
|
|
the profiler data file to disk.
|
|
This can be accomplished with the \f[I]report\f[] profiler option,
|
|
which will basically send the data to the mprof-report program for
|
|
display.
|
|
.PP
|
|
To have more control of what summary information is reported (or to
|
|
use a completely different program to decode the profiler data),
|
|
the \f[I]output\f[] profiler option can be used, with \f[B]|\f[] as
|
|
the first character: the rest of the output name will be executed
|
|
as a program with the data fed in on the standard input.
|
|
.PP
|
|
For example, to print only the Monitor summary with stack trace
|
|
information, you could use it like this:
|
|
.PP
|
|
\f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[]
|
|
.SH WEB SITE
|
|
http://www.mono-project.com/docs/debug+profile/profile/profiler/
|
|
.SH SEE ALSO
|
|
.PP
|
|
mono(1)
|
|
.SH AUTHORS
|
|
Paolo Molaro.
|
|
|