Merge reason: This is an older commit under testing that was not pushed yet - merge it.
Also fix up the merge in command-list.txt.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
The ordered sample code allocates singular reference objects struct
sample_queue which have 48byte size on 64bit and 20 bytes on 32bit. That's
silly. Allocate ~64k sized chunks and hand them out.
Performance gain: ~ 15%
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.398713983@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the sample queue is flushed we free the sample reference objects. Though
we need to malloc new objects when we process further. Stop the malloc/free
orgy and cache the already allocated object for resuage. Only allocate when
the cache is empty.
Performance gain: ~ 10%
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.338488630@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Profiling perf with perf revealed that a large part of the processing time is
spent in malloc/memcpy/free in the sample ordering code. That code copies the
data from the mmap into malloc'ed memory. That's silly. We can keep the mmap
and just store the pointer in the queuing data structure. For 64 bit this is
not a problem as we map the whole file anyway. On 32bit we keep 8 maps around
and unmap the oldest before mmaping the next chunk of the file.
Performance gain: 2.95s -> 1.23s (Faktor 2.4)
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.278787719@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On 64bit we can map the whole file in one go, on 32bit we can at least map
32MB and not map/unmap tiny chunks of the file.
Base the progress bar on 1/16 of the data size.
Preparatory patch to get rid of the malloc/memcpy/free of trace data.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.213687773@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replace the pseudo C++ self argument with session and give the mmap related
variables a sensible name. shift is a complete misnomer - it took me several
rounds of cursing to figure out that it's not a shift value.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163820.029687218@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The homebrewn sort algorithm fails to sort in time order. One of the problem
spots is that it fails to deal with equal timestamps correctly.
My first gut reaction was to replace the fancy list with an rbtree, but the
performance is 3 times worse.
Rewrite it so it works.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101130163819.908482530@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This primarily fixes perf-report, which didn't report the correct type
of event if perf-record was called to record one event different from
'cycles':
$ perf record -e instructions true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB perf.data (~295 samples) ]
$ perf report | head -n1
# Events: 7 cycles
LPU-Reference: <m3mxor6nex.fsf@gmail.com>
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>