While most users of the hcall tracepoints will only want the opcode
and return code, some will want all the arguments. To avoid the
complexity of using varargs we pass a pointer to the register save
area, which contains all the arguments.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add hcall_entry and hcall_exit tracepoints. This replaces the inline
assembly HCALL_STATS code and converts it to use the new tracepoints.
To keep the disabled case as quick as possible, we embed a status word
in the TOC so we can get at it with a single load. By doing so we
keep the overhead at a minimum. Time taken for a null hcall:
No tracepoint code: 135.79 cycles
Disabled tracepoints: 137.95 cycles
For reference, before this patch enabling HCALL_STATS resulted in a null
hcall of 201.44 cycles!
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We can monitor the effectiveness of our power management of both the
kernel and hypervisor by probing the timer interrupt. For example, on
this box we see 10.37s timer interrupts on an idle core:
<idle>-0 [010] 3900.671297: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3900.671302: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3911.042963: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3911.042968: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3921.414630: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3921.414635: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
Since we have a 207MHz decrementer it will go negative and fire every 10.37s
even if Linux is completely idle.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds powerpc-specific tracepoints for interrupt entry and exit.
While we already have generic irq_handler_entry and irq_handler_exit
tracepoints there are cases on our virtualised powerpc machines where an
interrupt is presented to the OS, but subsequently handled by the hypervisor.
This means no OS interrupt handler is invoked.
Here is an example on a POWER6 machine with the patch below applied:
<idle>-0 [006] 3243.949840744: irq_entry: pt_regs=c0000000ce31fb10
<idle>-0 [006] 3243.949850520: irq_exit: pt_regs=c0000000ce31fb10
<idle>-0 [007] 3243.950218208: irq_entry: pt_regs=c0000000ce323b10
<idle>-0 [007] 3243.950224080: irq_exit: pt_regs=c0000000ce323b10
<idle>-0 [000] 3244.021879320: irq_entry: pt_regs=c000000000a63aa0
<idle>-0 [000] 3244.021883616: irq_handler_entry: irq=87 handler=eth0
<idle>-0 [000] 3244.021887328: irq_handler_exit: irq=87 return=handled
<idle>-0 [000] 3244.021897408: irq_exit: pt_regs=c000000000a63aa0
Here we see two phantom interrupts (no handler was invoked), followed
by a real interrupt for eth0. Without the tracepoints in this patch we
would have missed the phantom interrupts.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Hook up the alignment-faults and emulation-faults events for powerpc.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
perf_event wants a separate event for alignment and emulation faults,
so create another emulation event. This will make it easy to hook in
perf_event at one spot.
We pass in regs which will be required for these events.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In continuous sampling mode we want the SDAR to update. While we can
select between dcache misses and ERAT (L1-TLB) misses, a decent default
is to enable both.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Profiling of a page fault scalability microbenchmark shows flush_hash_range
is not calling the batch hpte invalidate hcall (H_BULK_REMOVE).
It turns out we have a duplicate firmware feature for hcall-bulk and the
current setup code stops after finding the first match. This meant we never
batch and always do individual invalidates.
The patch below removes the duplicate and shifts FW_FEATURE_CMO to close
the gap. With the patch applied the single threaded page fault rate improves
from 217169 to 238755 per second on a POWER5 test box, a 10% improvement.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
Fix build of cpm_uart due to core changes
powerpc/8xx: Fix regression introduced by cache coherency rewrite
powerpc/4xx: Fix erroneous xmon warning on PowerPC 4xx
powerpc/mm: Fix 40x and 8xx vs. _PAGE_SPECIAL
powerpc: Cleanup linker script using new linker script macros.
powerpc: Fix ibm,client-architecture-support printout
powerpc: Increase NODES_SHIFT on 64bit from 4 to 8
powerpc/perf_counter: Fix vdso detection
powerpc: Move 64bit heap above 1TB on machines with 1TB segments
powerpc: Change archdata dma_data to a union
powerpc: Rename get_dma_direct_offset get_dma_offset
powerpc/mm: Remove duplicated #include
powerpc/book3e-64: Remove duplicated #include
powerpc: Check for unsupported relocs when using CONFIG_RELOCATABLE
powerpc/pmc: Don't access lppaca on Book3E
powerpc: kmalloc failure ignored in vio_build_iommu_table()
hvc_console: Provide (un)locked version for hvc_resize()
* 'for-linus' of git://neil.brown.name/md: (97 commits)
md: raid-1/10: fix RW bits manipulation
md: remove unnecessary memset from multipath.
md: report device as congested when suspended
md: Improve name of threads created by md_register_thread
md: remove sparse warnings about lock context.
md: remove sparse waring "symbol xxx shadows an earlier one"
async_tx/raid6: add missing dma_unmap calls to the async fail case
ioat3: fix uninitialized var warnings
drivers/dma/ioat/dma_v2.c: fix warnings
raid6test: fix stack overflow
ioat2: clarify ring size limits
md/raid6: cleanup ops_run_compute6_2
md/raid6: eliminate BUG_ON with side effect
dca: module load should not be an error message
ioat: driver version 4.0
dca: registering requesters in multiple dca domains
async_tx: remove HIGHMEM64G restriction
dmaengine: sh: Add Support SuperH DMA Engine driver
dmaengine: Move all map_sg/unmap_sg for slave channel to its client
fsldma: Add DMA_SLAVE support
...
The test to check whether we have _PAGE_SPECIAL defined is broken,
since we always define it, just not always to a meaninful value :-)
That broke 8xx and 40x under some circumstances.
This fixes it by adding _PAGE_SPECIAL for both of these since they
had a free PTE bit, and removing the condition around advertising
it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Sometimes this is used to hold a simple offset, and sometimes
it is used to hold a pointer. This patch changes it to a union containing
void * and dma_addr_t. get/set accessors are also provided, because it was
getting a bit ugly to get to the actual data.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The former is no longer really accurate with the swiotlb case now
a possibility. I also move it into dma-mapping.h - it no longer
needs to be in dma.c, and there are about to be some more accessors
that should all end up in the same place. A comment is added to
indicate that this function is not used in configs where there is no
simple dma offset, such as the iommu case.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask(), and by defining
it, the old arch_send_call_function_ipi is defined by the core code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_event, powerpc: Fix compilation after big perf_counter rename
Add a flag for mmap that will be used to request a huge page region that
will look like anonymous memory to user space. This is accomplished by
using a file on the internal vfsmount. MAP_HUGETLB is a modifier of
MAP_ANONYMOUS and so must be specified with it. The region will behave
the same as a MAP_ANONYMOUS region using small pages.
The patch also adds the MAP_STACK flag, which was previously defined only
on some architectures but not on others. Since MAP_STACK is meant to be a
hint only, architectures can define it without assigning a specific
meaning to it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes two places in the powerpc perf_event (perf_counter) code
where 'list_entry' needs to be changed to 'group_entry', but were
missed in commit 65abc865 ("perf_counter: Rename list_entry ->
group_entry, counter_list -> group_list").
This also changes 'event' back to 'counter' in a couple of
contexts:
* Field and function names that deal with the limited-function
counters: it's really the hardware counters whose function is
limited, not the events that they count. Hence:
MAX_LIMITED_HWEVENTS -> MAX_LIMITED_HWCOUNTERS
limited_event -> limited_counter
freeze/thaw_limited_events -> freeze/thaw_limited_counters
* The machine-specific PMU description struct (struct power_pmu): this
renames 'n_event' back to 'n_counter' since it really describes how
many hardware counters the machine has. (Renaming this back avoids
a compile error in each of the machine-specific PMU back-ends where
they initialize their power_pmu struct.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19128.4280.813369.589704@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>