Add the PERF_CPUM_SF_FULL_BLOCKS flag to process only sample-data-blocks that
have the block-full-indicator bit set. Sample-data-blocks that are partially
filled are discarded. Use this flag if the sampling buffer is likely to be
shared among perf events that use different sampling modes. In such
environments, flushing sample-data-blocks that are not completely filled, might
cause invalid-data-formats.
Setting PERF_CPUM_SF_FULL_BLOCKS prevents potentially invalid sampling data to
be processed but, in contrast, also discards valid samples in partially filled
sample-data-blocks. Note that sample-data-blocks might not become full for
small sampling frequencies or for workload that is scheduled for tiny intervals.
To sample with the PERF_CPUM_SF_FULL_BLOCKS flag, set the perf->attr.config1
to 0x0004. For example:
perf record -e cpum_sf/config=0xB000,config1=0x0004/
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Also support the diagnostic-sampling function in addition to the basic-sampling
function. Diagnostic-sampling data entries contain hardware model specific
sampling data and additional programs are required to analyze the data.
To deliver diagnostic-sampling, as well, as basis-sampling data entries to user
space, introduce support for sampling "raw data". If this particular perf
sampling type (PERF_SAMPLE_RAW) is used, sampling data entries are copied
to user space. External programs can then analyze these data.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce the perf_exclude_event() function to filter perf samples
according to event->attr.exclude_* settings. During event initialization,
reset event exclude settings that are not supported.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The host-program-parameter (hpp) value of basic sample-data-entries designates
a SIE control block that is set by the LPP instruction in sie64a().
Non-zero values indicate guest samples, a value of zero indicates a host sample.
For perf samples, host and guest samples are distinguished using particular
PERF_MISC_* flags. The perf layer calls perf_misc_flags() to set the flags
based on the pt_regs content. For each sample-data-entry, the cpum_sf PMU
creates a pt_regs structure with the sample-data information. An additional
flag structure is added to easily distinguish between host and guest samples.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The trailer entry contains a timestamp of the time when the sample-data-block
became full. The timestamp specifies a TOD (time-of-day) value in either the
STCK or STCKE format.
Provide a helper function to return the TOD value depending on the setting of
time format indicator.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ensure to reset the sample-data-block full indicator and the overflow counter
at the same time. This must be done atomically because the sampling hardware
is still active while full sample-data-block is processed.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Improve the sampling buffer allocation and add a function to reallocate and
increase the sampling buffer structure. The number of allocated buffer elements
(sample-data-blocks) are accounted. You can control the minimum and maximum
number these sample-data-blocks through the cpum_sfb_size kernel parameter.
The number hardware sample overflows (if any) are also accounted and stored
per perf event. During the PMU disable/enable calls, the accumulated overflow
counter is analyzed and, if necessary, the sampling buffer is dynamically
increased.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Register a service level handler to report information about available
CPU-Measurement facilities.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce reserve/release functions to share the sampling facility
between perf and oprofile.
Also improve error handling for the sampling facility support in perf.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The cpum_cf (counter facility) PMU does not support sampling events.
With cpum_sf (sampling facility), a PMU for sampling CPU cycles is
available.
Make cpum_sf the "default" PMU for PERF_COUNT_HW_CPU_CYCLES sampling
events but use the more precise cpum_cf PMU for non-sampling events.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a perf PMU, "cpum_sf", to support the CPU-Measurement
Sampling Facility. You can control the sampling facility through
this perf PMU interfaces. Perf sampling events are created for
hardware samples.
For details about the CPU-Measurement Sampling Facility, see
"The Load-Program-Parameter and the CPU-Measurement Facilities" (SA23-2260).
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide PMU event attributes for supported counters and export their symbolic
names to the sysfs "events" directory.
See the /sys/devices/cpum_cf/events/ directory for a list of available counters.
Note that you might require counter set authorizations for the LPAR to use them.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The return code of the __put_user call to store the rt_sigreturn
system call to the user stack if not properly checked, the err
variable is only checked before to the __put_user. Use an if
statement instead.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the embedded struct cpu from struct pcpu and replace it with a
pointer instead. The struct cpu now gets allocated when a new cpu gets
detected.
The size of the pcpu_devices array (NR_CPUS * sizeof(struct pcpu)) gets
reduced by nearly 120KB.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It is less expensive to update control registers 0 and 2 with two
individual stctg/lctlg instructions as with a single one that spans
control register 0, 1 and 2.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The user_enable_single_step() and user_disable_sindle_step() functions
are always called on the inferior, never for the currently active
process. Remove the unnecessary check for the current process and
the update_cr_regs() call from the enable/disable functions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the per cpu ec_mask bit of the receiving cpu is already set there is
no need to send an ipi, since a different cpu has already sent an ipi
and the receiving cpu has not yet executed the external call ipi handler.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With git commit 79c74ecbeb
"s390/time,vdso: convert to the new update_vsyscall interface"
the new update_vsyscall function already does the sum of xtime
and wall_to_monotonic. The old update_vsyscall function only
copied the wall_to_monotonic offset. The vdso code needs to be
modified to take this into consideration.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The code to use the ECTG instruction to calculate the cputime for the
current thread is currently used only for the per-thread CPU-clock
with the clockid -2 (PID=0, VIRT=1). Use the same code for the clockid
CLOCK_THREAD_CPUTIME_ID to speed up the more common clockid as well.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The access-list entry is supposed to have the fetch-only bit set, however
a reserved bit got set instead.
Userspace isn't able to write to the page anyway since the accessed page
has the read-only bit set. So this saves us only for bad surprises in the
future if the reserved bit gets used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Git commit 9e34f2686bb088b211b6cac8772e1f644c6180f8
"s390/mm,tlb: tlb flush on page table upgrade fixup" removed the
exception handler for the asce-type exception. This is incorrect
as the user-copy with MVCOS can cause asce-type exceptions in
the kernel if a user pointer is too large. Those need to be
handled with do_no_context to branch to the fixup in the
user-copy code.
The simplest fix for this problem is to call do_dat_exception for
asce-type excpetions, as there is no vma for the address the code
will handle the exception correctly.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Git commit 4f37a68cda
"s390: Use direct ktime path for s390 clockevent device" makes use
of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta
calculation with ktime_get() in clockevents_program_event and the
get_tod_clock() in s390_next_event. This is based on the assumption
that the difference between the internal ktime and the hardware
clock is reflected in the wall_to_monotonic delta. But this is not
true, the ntp corrections are applied via changes to the tk->mult
multiplier and this is not reflected in wall_to_monotonic.
In theory this could be solved by using the raw monotonic clock
but it is simpler to switch back to the standard clock delta
calculation.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Switch to the improved update_vsyscall interface that provides
sub-nanosecond precision for gettimeofday and clock_gettime.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit "s390: fix handling of runtime instrumentation psw bit" (5ebf250dab)
changed the behavior of setting the runtime instrumentation psw bit. This
commit restores the original logic:
1. When returning from the signal handler, the runtime instrumentation psw bit
is restored to its saved state.
2. If the runtime instrumentation psw bit is enabled during the signal handler,
it is always turned off when leaving the signal handler. The saved state
is restored as described in 1. That also implies that turning on runtime
instrumentation in the signal handler is only effective while running in the
signal context.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Pull second set of s390 patches from Martin Schwidefsky:
"The handling of the PCI hotplug notifications has been improved, the
zfcp dumper can now detect the HSA size dynamically and the default
install kernel has been changed to the compressed bzImage. And two
bug-fixes for scm and 3720"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: implement hotplug notifications
s390/scm_block: do not hide eadm subchannel dependency
s390/sclp: Consolidate early sclp init calls to sclp_early_detect()
s390/sclp: Move early code from sclp_cmd.c to sclp_early.c
s390/sclp: Determine HSA size dynamically for zfcpdump
s390/sclp: Move declarations for sclp_sdias into separate header file
s390/pci: implement pcibios_remove_bus
s390/pci: improve handling of bus resources
s390/3270: fix missing device_destroy() call
s390/boot: Install bzImage as default kernel image