Merge tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
"New features:
perf ftrace:
- Add -n/--use-nsec option to the 'latency' subcommand.
Default: usecs:
$ sudo perf ftrace latency -T dput -a sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 2098375 | ############################# |
1 - 2 us | 61 | |
2 - 4 us | 33 | |
4 - 8 us | 13 | |
8 - 16 us | 124 | |
16 - 32 us | 123 | |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 1 | |
256 - 512 us | 0 | |
Better granularity with nsec:
$ sudo perf ftrace latency -T dput -a -n sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 ns | 0 | |
1 - 2 ns | 0 | |
2 - 4 ns | 0 | |
4 - 8 ns | 0 | |
8 - 16 ns | 0 | |
16 - 32 ns | 0 | |
32 - 64 ns | 0 | |
64 - 128 ns | 1163434 | ############## |
128 - 256 ns | 914102 | ############# |
256 - 512 ns | 884 | |
512 - 1024 ns | 613 | |
1 - 2 us | 31 | |
2 - 4 us | 17 | |
4 - 8 us | 7 | |
8 - 16 us | 123 | |
16 - 32 us | 83 | |
perf lock:
- Add -c/--combine-locks option to merge lock instances in the same
class into a single entry.
# perf lock report -c
Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)
rcu_read_lock 251225 0 0 0 0 0
hrtimer_bases.lock 39450 0 0 0 0 0
&sb->s_type->i_l... 10301 1 662 662 662 662
ptlock_ptr(page) 10173 2 701 1402 760 642
&(ei->i_block_re... 8732 0 0 0 0 0
&xa->xa_lock 8088 0 0 0 0 0
&base->lock 6705 0 0 0 0 0
&p->pi_lock 5549 0 0 0 0 0
&dentry->d_lockr... 5010 4 1274 5097 1844 789
&ep->lock 3958 0 0 0 0 0
- Add -F/--field option to customize the list of fields to output:
$ perf lock report -F contended,wait_max -k avg_wait
Name contended max wait(ns) avg wait(ns)
slock-AF_INET6 1 23543 23543
&lruvec->lru_lock 5 18317 11254
slock-AF_INET6 1 10379 10379
rcu_node_1 1 2104 2104
&dentry->d_lockr... 1 1844 1844
&dentry->d_lockr... 1 1672 1672
&newf->file_lock 15 2279 1025
&dentry->d_lockr... 1 792 792
- Add --synth=no option for record, as there is no need to symbolize:
lock names come from the tracepoints.
perf record:
- Threaded recording, opt-in, via the new --threads command line
option.
- Improve AMD IBS (Instruction-Based Sampling) error handling
messages.
perf script:
- Add 'brstackinsnlen' field (use it with -F) for branch stacks.
- Output branch sample type in 'perf script'.
perf report:
- Add "addr_from" and "addr_to" sort dimensions.
- Print branch stack entry type in 'perf report --dump-raw-trace'
- Fix symbolization for chrooted workloads.
Hardware tracing:
Intel PT:
- Add CFE (Control Flow Event) and EVD (Event Data) packets support.
- Add MODE.Exec IFLAG bit support.
Explanation about these features from the "Intel® 64 and IA-32
architectures software developer’s manual combined volumes: 1, 2A,
2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:
https://cdrdv2.intel.com/v1/dl/getContent/671200
At page 3951:
"32.2.4
Event Trace is a capability that exposes details about the
asynchronous events, when they are generated, and when their
corresponding software event handler completes execution. These
include:
o Interrupts, including NMI and SMI, including the interrupt
vector when defined.
o Faults, exceptions including the fault vector.
- Page faults additionally include the page fault address,
when in context.
o Event handler returns, including IRET and RSM.
o VM exits and VM entries.¹
- VM exits include the values written to the “exit reason”
and “exit qualification” VMCS fields.
o INIT and SIPI events.
o TSX aborts, including the abort status returned for the RTM
instructions.
o Shutdown.
Additionally, it provides indication of the status of the
Interrupt Flag (IF), to indicate when interrupts are masked"
ARM CoreSight:
- Use advertised caps/min_interval as default sample_period on ARM
SPE.
- Update deduction of TRCCONFIGR register for branch broadcast on
ARM's CoreSight ETM.
Vendor Events (JSON):
Intel:
- Update events and metrics for: Alderlake, Broadwell, Broadwell DE,
BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont,
GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX,
Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP,
Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
Tigerlake, TremontX, Westmere EP-SP, and Westmere EX.
ARM:
- Add support for HiSilicon CPA PMU aliasing.
perf stat:
- Fix forked applications enablement of counters.
- The 'slots' event should only be printed in a different order than
the one specified on the command line when 'topdown' events are
present; fix it.
Miscellaneous:
- Sync msr-index, cpufeatures header files with the kernel sources.
- Stop using some deprecated libbpf APIs in 'perf trace'.
- Fix some spelling mistakes.
- Refactor the maps pointers usage to pave the way for using refcount
debugging.
- Only offer the --tui option on perf top, report and annotate when
perf was built with libslang.
- Don't mention --to-ctf in 'perf data --help' when not linking with
the required library, libbabeltrace.
- Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by
array_size.cocci.
- Enhance the matching of sub-commands abbreviations:
'perf c2c rec' -> 'perf c2c record'
'perf c2c recport' -> error
- Set build-id using build-id header on new mmap records.
- Fix generation of 'perf --version' string.
perf test:
- Add test for the arm_spe event.
- Add test to check unwinding using frame-pointer (fp) mode on arm64.
- Make metric testing more robust in 'perf test'.
- Add error message for unsupported branch stack cases.
libperf:
- Add API for allocating new thread map array.
- Fix typo in perf_evlist__open() failure error messages in libperf
tests.
perf c2c:
- Replace bitmap_weight() with bitmap_empty() where appropriate"
* tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (143 commits)
perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
perf tools: Enhance the matching of sub-commands abbreviations
libperf tests: Fix typo in perf_evlist__open() failure error messages
tools arm64: Import cputype.h
perf lock: Add -F/--field option to control output
perf lock: Extend struct lock_key to have print function
perf lock: Add --synth=no option for record
tools headers cpufeatures: Sync with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
perf stat: Fix forked applications enablement of counters
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf evsel: Make evsel__env() always return a valid env
perf build-id: Fix spelling mistake "Cant" -> "Can't"
perf header: Fix spelling mistake "could't" -> "couldn't"
perf script: Add 'brstackinsnlen' for branch stacks
perf parse-events: Move slots only with topdown
perf ftrace latency: Update documentation
perf ftrace latency: Add -n/--use-nsec option
perf tools: Fix version kernel tag
...
tools/arch/arm64/include/asm/cputype.h (new file, 258 lines)
@@ -0,0 +1,258 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2012 ARM Ltd.
*/
#ifndef __ASM_CPUTYPE_H
#define __ASM_CPUTYPE_H

#define INVALID_HWID ULONG_MAX

#define MPIDR_UP_BITMASK (0x1 << 30)
#define MPIDR_MT_BITMASK (0x1 << 24)
#define MPIDR_HWID_BITMASK UL(0xff00ffffff)

#define MPIDR_LEVEL_BITS_SHIFT 3
#define MPIDR_LEVEL_BITS (1 << MPIDR_LEVEL_BITS_SHIFT)
#define MPIDR_LEVEL_MASK ((1 << MPIDR_LEVEL_BITS) - 1)

#define MPIDR_LEVEL_SHIFT(level) \
(((1 << level) >> 1) << MPIDR_LEVEL_BITS_SHIFT)

#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
((mpidr >> MPIDR_LEVEL_SHIFT(level)) & MPIDR_LEVEL_MASK)

#define MIDR_REVISION_MASK 0xf
#define MIDR_REVISION(midr) ((midr) & MIDR_REVISION_MASK)
#define MIDR_PARTNUM_SHIFT 4
#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
#define MIDR_PARTNUM(midr) \
(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
#define MIDR_ARCHITECTURE_SHIFT 16
#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
#define MIDR_ARCHITECTURE(midr) \
(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
#define MIDR_VARIANT_SHIFT 20
#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
#define MIDR_VARIANT(midr) \
(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT 24
#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
#define MIDR_IMPLEMENTOR(midr) \
(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)

#define MIDR_CPU_MODEL(imp, partnum) \
(((imp) << MIDR_IMPLEMENTOR_SHIFT) | \
(0xf << MIDR_ARCHITECTURE_SHIFT) | \
((partnum) << MIDR_PARTNUM_SHIFT))

#define MIDR_CPU_VAR_REV(var, rev) \
(((var) << MIDR_VARIANT_SHIFT) | (rev))

#define MIDR_CPU_MODEL_MASK (MIDR_IMPLEMENTOR_MASK | MIDR_PARTNUM_MASK | \
MIDR_ARCHITECTURE_MASK)

#define ARM_CPU_IMP_ARM 0x41
#define ARM_CPU_IMP_APM 0x50
#define ARM_CPU_IMP_CAVIUM 0x43
#define ARM_CPU_IMP_BRCM 0x42
#define ARM_CPU_IMP_QCOM 0x51
#define ARM_CPU_IMP_NVIDIA 0x4E
#define ARM_CPU_IMP_FUJITSU 0x46
#define ARM_CPU_IMP_HISI 0x48
#define ARM_CPU_IMP_APPLE 0x61

#define ARM_CPU_PART_AEM_V8 0xD0F
#define ARM_CPU_PART_FOUNDATION 0xD00
#define ARM_CPU_PART_CORTEX_A57 0xD07
#define ARM_CPU_PART_CORTEX_A72 0xD08
#define ARM_CPU_PART_CORTEX_A53 0xD03
#define ARM_CPU_PART_CORTEX_A73 0xD09
#define ARM_CPU_PART_CORTEX_A75 0xD0A
#define ARM_CPU_PART_CORTEX_A35 0xD04
#define ARM_CPU_PART_CORTEX_A55 0xD05
#define ARM_CPU_PART_CORTEX_A76 0xD0B
#define ARM_CPU_PART_NEOVERSE_N1 0xD0C
#define ARM_CPU_PART_CORTEX_A77 0xD0D
#define ARM_CPU_PART_NEOVERSE_V1 0xD40
#define ARM_CPU_PART_CORTEX_A78 0xD41
#define ARM_CPU_PART_CORTEX_X1 0xD44
#define ARM_CPU_PART_CORTEX_A510 0xD46
#define ARM_CPU_PART_CORTEX_A710 0xD47
#define ARM_CPU_PART_CORTEX_X2 0xD48
#define ARM_CPU_PART_NEOVERSE_N2 0xD49
#define ARM_CPU_PART_CORTEX_A78C 0xD4B

#define APM_CPU_PART_POTENZA 0x000

#define CAVIUM_CPU_PART_THUNDERX 0x0A1
#define CAVIUM_CPU_PART_THUNDERX_81XX 0x0A2
#define CAVIUM_CPU_PART_THUNDERX_83XX 0x0A3
#define CAVIUM_CPU_PART_THUNDERX2 0x0AF
/* OcteonTx2 series */
#define CAVIUM_CPU_PART_OCTX2_98XX 0x0B1
#define CAVIUM_CPU_PART_OCTX2_96XX 0x0B2
#define CAVIUM_CPU_PART_OCTX2_95XX 0x0B3
#define CAVIUM_CPU_PART_OCTX2_95XXN 0x0B4
#define CAVIUM_CPU_PART_OCTX2_95XXMM 0x0B5
#define CAVIUM_CPU_PART_OCTX2_95XXO 0x0B6

#define BRCM_CPU_PART_BRAHMA_B53 0x100
#define BRCM_CPU_PART_VULCAN 0x516

#define QCOM_CPU_PART_FALKOR_V1 0x800
#define QCOM_CPU_PART_FALKOR 0xC00
#define QCOM_CPU_PART_KRYO 0x200
#define QCOM_CPU_PART_KRYO_2XX_GOLD 0x800
#define QCOM_CPU_PART_KRYO_2XX_SILVER 0x801
#define QCOM_CPU_PART_KRYO_3XX_SILVER 0x803
#define QCOM_CPU_PART_KRYO_4XX_GOLD 0x804
#define QCOM_CPU_PART_KRYO_4XX_SILVER 0x805

#define NVIDIA_CPU_PART_DENVER 0x003
#define NVIDIA_CPU_PART_CARMEL 0x004

#define FUJITSU_CPU_PART_A64FX 0x001

#define HISI_CPU_PART_TSV110 0xD01

#define APPLE_CPU_PART_M1_ICESTORM 0x022
#define APPLE_CPU_PART_M1_FIRESTORM 0x023

#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
#define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73)
#define MIDR_CORTEX_A75 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A75)
#define MIDR_CORTEX_A35 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A35)
#define MIDR_CORTEX_A55 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55)
#define MIDR_CORTEX_A76 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A76)
#define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N1)
#define MIDR_CORTEX_A77 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77)
#define MIDR_NEOVERSE_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_V1)
#define MIDR_CORTEX_A78 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78)
#define MIDR_CORTEX_X1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X1)
#define MIDR_CORTEX_A510 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A510)
#define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710)
#define MIDR_CORTEX_X2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X2)
#define MIDR_NEOVERSE_N2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N2)
#define MIDR_CORTEX_A78C MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78C)
#define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
#define MIDR_OCTX2_98XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_98XX)
#define MIDR_OCTX2_96XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_96XX)
#define MIDR_OCTX2_95XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XX)
#define MIDR_OCTX2_95XXN MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXN)
#define MIDR_OCTX2_95XXMM MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXMM)
#define MIDR_OCTX2_95XXO MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXO)
#define MIDR_CAVIUM_THUNDERX2 MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX2)
#define MIDR_BRAHMA_B53 MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_BRAHMA_B53)
#define MIDR_BRCM_VULCAN MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_VULCAN)
#define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
#define MIDR_QCOM_KRYO MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO)
#define MIDR_QCOM_KRYO_2XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_GOLD)
#define MIDR_QCOM_KRYO_2XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_SILVER)
#define MIDR_QCOM_KRYO_3XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_SILVER)
#define MIDR_QCOM_KRYO_4XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_GOLD)
#define MIDR_QCOM_KRYO_4XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_SILVER)
#define MIDR_NVIDIA_DENVER MIDR_CPU_MODEL(ARM_CPU_IMP_NVIDIA, NVIDIA_CPU_PART_DENVER)
#define MIDR_NVIDIA_CARMEL MIDR_CPU_MODEL(ARM_CPU_IMP_NVIDIA, NVIDIA_CPU_PART_CARMEL)
#define MIDR_FUJITSU_A64FX MIDR_CPU_MODEL(ARM_CPU_IMP_FUJITSU, FUJITSU_CPU_PART_A64FX)
#define MIDR_HISI_TSV110 MIDR_CPU_MODEL(ARM_CPU_IMP_HISI, HISI_CPU_PART_TSV110)
#define MIDR_APPLE_M1_ICESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM)
#define MIDR_APPLE_M1_FIRESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM)

/* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
#define MIDR_FUJITSU_ERRATUM_010001 MIDR_FUJITSU_A64FX
#define MIDR_FUJITSU_ERRATUM_010001_MASK (~MIDR_CPU_VAR_REV(1, 0))
#define TCR_CLEAR_FUJITSU_ERRATUM_010001 (TCR_NFD1 | TCR_NFD0)

#ifndef __ASSEMBLY__

#include "sysreg.h"

#define read_cpuid(reg) read_sysreg_s(SYS_ ## reg)

/*
* Represent a range of MIDR values for a given CPU model and a
* range of variant/revision values.
*
* @model - CPU model as defined by MIDR_CPU_MODEL
* @rv_min - Minimum value for the revision/variant as defined by
* MIDR_CPU_VAR_REV
* @rv_max - Maximum value for the variant/revision for the range.
*/
struct midr_range {
	u32 model;
	u32 rv_min;
	u32 rv_max;
};

#define MIDR_RANGE(m, v_min, r_min, v_max, r_max) \
{ \
	.model = m, \
	.rv_min = MIDR_CPU_VAR_REV(v_min, r_min), \
	.rv_max = MIDR_CPU_VAR_REV(v_max, r_max), \
}

#define MIDR_REV_RANGE(m, v, r_min, r_max) MIDR_RANGE(m, v, r_min, v, r_max)
#define MIDR_REV(m, v, r) MIDR_RANGE(m, v, r, v, r)
#define MIDR_ALL_VERSIONS(m) MIDR_RANGE(m, 0, 0, 0xf, 0xf)

static inline bool midr_is_cpu_model_range(u32 midr, u32 model, u32 rv_min,
					   u32 rv_max)
{
	u32 _model = midr & MIDR_CPU_MODEL_MASK;
	u32 rv = midr & (MIDR_REVISION_MASK | MIDR_VARIANT_MASK);

	return _model == model && rv >= rv_min && rv <= rv_max;
}

static inline bool is_midr_in_range(u32 midr, struct midr_range const *range)
{
	return midr_is_cpu_model_range(midr, range->model,
				       range->rv_min, range->rv_max);
}

static inline bool
is_midr_in_range_list(u32 midr, struct midr_range const *ranges)
{
	while (ranges->model)
		if (is_midr_in_range(midr, ranges++))
			return true;
	return false;
}

/*
* The CPU ID never changes at run time, so we might as well tell the
* compiler that it's constant. Use this function to read the CPU ID
* rather than directly reading processor_id or read_cpuid() directly.
*/
static inline u32 __attribute_const__ read_cpuid_id(void)
{
	return read_cpuid(MIDR_EL1);
}

static inline u64 __attribute_const__ read_cpuid_mpidr(void)
{
	return read_cpuid(MPIDR_EL1);
}

static inline unsigned int __attribute_const__ read_cpuid_implementor(void)
{
	return MIDR_IMPLEMENTOR(read_cpuid_id());
}

static inline unsigned int __attribute_const__ read_cpuid_part_number(void)
{
	return MIDR_PARTNUM(read_cpuid_id());
}

static inline u32 __attribute_const__ read_cpuid_cachetype(void)
{
	return read_cpuid(CTR_EL0);
}
#endif /* __ASSEMBLY__ */

#endif
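As an aside (not part of the patch): the MIDR_* macros above pack implementer, variant, architecture, part number and revision into a single 32-bit MIDR value. A minimal standalone C sketch of decoding such a value, with the relevant masks copied from the header and an assumed example MIDR, could look like this:

/* Standalone illustration only; masks copied from cputype.h above. */
#include <stdio.h>
#include <stdint.h>

#define MIDR_REVISION_MASK	0xf
#define MIDR_PARTNUM_SHIFT	4
#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
#define MIDR_VARIANT_SHIFT	20
#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT	24
#define MIDR_IMPLEMENTOR_MASK	(0xffU << MIDR_IMPLEMENTOR_SHIFT)

int main(void)
{
	/* Assumed example value: implementer 0x41 (Arm), part 0xd0c (Neoverse N1). */
	uint32_t midr = 0x414fd0c1;

	printf("implementer 0x%x part 0x%x variant %u revision %u\n",
	       (midr & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT,
	       (midr & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT,
	       (midr & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT,
	       midr & MIDR_REVISION_MASK);
	return 0;
}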
@@ -299,9 +299,6 @@
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
#define X86_FEATURE_AMX_BF16 (18*32+22) /* AMX bf16 Support */
#define X86_FEATURE_AMX_TILE (18*32+24) /* AMX tile Support */
#define X86_FEATURE_AMX_INT8 (18*32+25) /* AMX int8 Support */

/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
@@ -330,6 +327,7 @@
#define X86_FEATURE_HWP_ACT_WINDOW (14*32+ 9) /* HWP Activity Window */
#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. Preference */
#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */
#define X86_FEATURE_HFI (14*32+19) /* Hardware Feedback Interface */

/* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */
#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */
@@ -390,7 +388,10 @@
#define X86_FEATURE_TSXLDTRK (18*32+16) /* TSX Suspend Load Address Tracking */
#define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
#define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */
#define X86_FEATURE_AMX_BF16 (18*32+22) /* AMX bf16 Support */
#define X86_FEATURE_AVX512_FP16 (18*32+23) /* AVX512 FP16 */
#define X86_FEATURE_AMX_TILE (18*32+24) /* AMX tile Support */
#define X86_FEATURE_AMX_INT8 (18*32+25) /* AMX int8 Support */
#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */

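For readers unfamiliar with this header's encoding (an aside, not part of the patch): each X86_FEATURE_* value is word*32 + bit, i.e. an index into an array of 32-bit capability words. A small, hedged C sketch of recovering the word and bit from such a value:

/* Illustration only; the (word*32 + bit) encoding matches the defines above. */
#include <stdio.h>

#define X86_FEATURE_AVX512_FP16 (18*32+23)	/* copied from the hunk above */

int main(void)
{
	int feature = X86_FEATURE_AVX512_FP16;
	int word = feature / 32;	/* which 32-bit capability word */
	int bit  = feature % 32;	/* which bit inside that word */

	printf("feature %d lives in word %d, bit %d (mask 0x%08x)\n",
	       feature, word, bit, 1u << bit);
	return 0;
}

The DISABLE_* masks in the next hunk follow the same idea: 1 << (X86_FEATURE_x & 31) is simply 1 << bit for the word that the disabled-features macro belongs to.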
@@ -56,8 +56,11 @@
# define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31))
#endif

/* Force disable because it's broken beyond repair */
#define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
#ifdef CONFIG_INTEL_IOMMU_SVM
# define DISABLE_ENQCMD 0
#else
# define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
#endif

#ifdef CONFIG_X86_SGX
# define DISABLE_SGX 0

@@ -705,12 +705,14 @@

#define PACKAGE_THERM_STATUS_PROCHOT (1 << 0)
#define PACKAGE_THERM_STATUS_POWER_LIMIT (1 << 10)
#define PACKAGE_THERM_STATUS_HFI_UPDATED (1 << 26)

#define MSR_IA32_PACKAGE_THERM_INTERRUPT 0x000001b2

#define PACKAGE_THERM_INT_HIGH_ENABLE (1 << 0)
#define PACKAGE_THERM_INT_LOW_ENABLE (1 << 1)
#define PACKAGE_THERM_INT_PLN_ENABLE (1 << 24)
#define PACKAGE_THERM_INT_HFI_ENABLE (1 << 25)

/* Thermal Thresholds Support */
#define THERM_INT_THRESHOLD0_ENABLE (1 << 15)
@@ -959,4 +961,8 @@
#define MSR_VM_IGNNE 0xc0010115
#define MSR_VM_HSAVE_PA 0xc0010117

/* Hardware Feedback Interface */
#define MSR_IA32_HW_FEEDBACK_PTR 0x17d0
#define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d1

#endif /* _ASM_X86_MSR_INDEX_H */

@@ -102,10 +102,6 @@
# define __init
#endif

#ifndef noinline
# define noinline
#endif

#include <linux/types.h>

/*

@@ -18,6 +18,7 @@
* ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
* directly use below macros as config bits.
*/
#define ETM_OPT_BRANCH_BROADCAST 8
#define ETM_OPT_CYCACC 12
#define ETM_OPT_CTXTID 14
#define ETM_OPT_CTXTID2 15
@@ -25,6 +26,7 @@
#define ETM_OPT_RETSTK 29

/* ETMv4 CONFIGR programming bits for the ETM OPTs */
#define ETM4_CFG_BIT_BB 3
#define ETM4_CFG_BIT_CYCACC 4
#define ETM4_CFG_BIT_CTXTID 6
#define ETM4_CFG_BIT_VMID 7

@@ -88,6 +88,23 @@ int fdarray__add(struct fdarray *fda, int fd, short revents, enum fdarray_flags
	return pos;
}

int fdarray__dup_entry_from(struct fdarray *fda, int pos, struct fdarray *from)
{
	struct pollfd *entry;
	int npos;

	if (pos >= from->nr)
		return -EINVAL;

	entry = &from->entries[pos];

	npos = fdarray__add(fda, entry->fd, entry->events, from->priv[pos].flags);
	if (npos >= 0)
		fda->priv[npos] = from->priv[pos];

	return npos;
}

int fdarray__filter(struct fdarray *fda, short revents,
		    void (*entry_destructor)(struct fdarray *fda, int fd, void *arg),
		    void *arg)

@@ -42,6 +42,7 @@ struct fdarray *fdarray__new(int nr_alloc, int nr_autogrow);
void fdarray__delete(struct fdarray *fda);

int fdarray__add(struct fdarray *fda, int fd, short revents, enum fdarray_flags flags);
int fdarray__dup_entry_from(struct fdarray *fda, int pos, struct fdarray *from);
int fdarray__poll(struct fdarray *fda, int timeout);
int fdarray__filter(struct fdarray *fda, short revents,
		    void (*entry_destructor)(struct fdarray *fda, int fd, void *arg),

@@ -62,11 +62,12 @@ SYNOPSIS
struct perf_thread_map;

struct perf_thread_map *perf_thread_map__new_dummy(void);
struct perf_thread_map *perf_thread_map__new_array(int nr_threads, pid_t *array);

void perf_thread_map__set_pid(struct perf_thread_map *map, int thread, pid_t pid);
char *perf_thread_map__comm(struct perf_thread_map *map, int thread);
void perf_thread_map__set_pid(struct perf_thread_map *map, int idx, pid_t pid);
char *perf_thread_map__comm(struct perf_thread_map *map, int idx);
int perf_thread_map__nr(struct perf_thread_map *threads);
pid_t perf_thread_map__pid(struct perf_thread_map *map, int thread);
pid_t perf_thread_map__pid(struct perf_thread_map *map, int idx);

struct perf_thread_map *perf_thread_map__get(struct perf_thread_map *map);
void perf_thread_map__put(struct perf_thread_map *map);

@@ -8,11 +8,12 @@
struct perf_thread_map;

LIBPERF_API struct perf_thread_map *perf_thread_map__new_dummy(void);
LIBPERF_API struct perf_thread_map *perf_thread_map__new_array(int nr_threads, pid_t *array);

LIBPERF_API void perf_thread_map__set_pid(struct perf_thread_map *map, int thread, pid_t pid);
LIBPERF_API char *perf_thread_map__comm(struct perf_thread_map *map, int thread);
LIBPERF_API void perf_thread_map__set_pid(struct perf_thread_map *map, int idx, pid_t pid);
LIBPERF_API char *perf_thread_map__comm(struct perf_thread_map *map, int idx);
LIBPERF_API int perf_thread_map__nr(struct perf_thread_map *threads);
LIBPERF_API pid_t perf_thread_map__pid(struct perf_thread_map *map, int thread);
LIBPERF_API pid_t perf_thread_map__pid(struct perf_thread_map *map, int idx);

LIBPERF_API struct perf_thread_map *perf_thread_map__get(struct perf_thread_map *map);
LIBPERF_API void perf_thread_map__put(struct perf_thread_map *map);

@@ -12,6 +12,7 @@ LIBPERF_0.0.1 {
perf_cpu_map__empty;
perf_cpu_map__max;
perf_cpu_map__has;
perf_thread_map__new_array;
perf_thread_map__new_dummy;
perf_thread_map__set_pid;
perf_thread_map__comm;

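As a usage illustration (not part of the patch), the new perf_thread_map__new_array() API documented above can be exercised from a small libperf client. This is only a hedged sketch: it assumes the libperf headers are installed and the program is linked with -lperf.

/* Hedged sketch of the new libperf thread map API; assumes -lperf is available. */
#include <stdio.h>
#include <sys/types.h>
#include <perf/threadmap.h>

int main(void)
{
	pid_t pids[] = { 1, 2, 3 };	/* assumed example PIDs */
	struct perf_thread_map *threads;
	int i;

	threads = perf_thread_map__new_array(3, pids);
	if (!threads)
		return 1;

	for (i = 0; i < perf_thread_map__nr(threads); i++)
		printf("idx %d -> pid %d\n", i, (int)perf_thread_map__pid(threads, i));

	perf_thread_map__put(threads);	/* drop the reference taken at creation */
	return 0;
}

Passing a NULL array instead of pids initializes every entry to -1, as the tests in the next hunks verify.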
@@ -69,7 +69,7 @@ static int test_stat_cpu(void)
	perf_evlist__set_maps(evlist, cpus, NULL);

	err = perf_evlist__open(evlist);
	__T("failed to open evsel", err == 0);
	__T("failed to open evlist", err == 0);

	perf_evlist__for_each_evsel(evlist, evsel) {
		cpus = perf_evsel__cpus(evsel);
@@ -130,7 +130,7 @@ static int test_stat_thread(void)
	perf_evlist__set_maps(evlist, NULL, threads);

	err = perf_evlist__open(evlist);
	__T("failed to open evsel", err == 0);
	__T("failed to open evlist", err == 0);

	perf_evlist__for_each_evsel(evlist, evsel) {
		perf_evsel__read(evsel, 0, 0, &counts);
@@ -187,7 +187,7 @@ static int test_stat_thread_enable(void)
	perf_evlist__set_maps(evlist, NULL, threads);

	err = perf_evlist__open(evlist);
	__T("failed to open evsel", err == 0);
	__T("failed to open evlist", err == 0);

	perf_evlist__for_each_evsel(evlist, evsel) {
		perf_evsel__read(evsel, 0, 0, &counts);
@@ -507,7 +507,7 @@ static int test_stat_multiplexing(void)
	perf_evlist__set_maps(evlist, NULL, threads);

	err = perf_evlist__open(evlist);
	__T("failed to open evsel", err == 0);
	__T("failed to open evlist", err == 0);

	perf_evlist__enable(evlist);

@@ -11,9 +11,43 @@ static int libperf_print(enum libperf_print_level level,
	return vfprintf(stderr, fmt, ap);
}

static int test_threadmap_array(int nr, pid_t *array)
{
	struct perf_thread_map *threads;
	int i;

	threads = perf_thread_map__new_array(nr, array);
	__T("Failed to allocate new thread map", threads);

	__T("Unexpected number of threads", perf_thread_map__nr(threads) == nr);

	for (i = 0; i < nr; i++) {
		__T("Unexpected initial value of thread",
		    perf_thread_map__pid(threads, i) == (array ? array[i] : -1));
	}

	for (i = 1; i < nr; i++)
		perf_thread_map__set_pid(threads, i, i * 100);

	__T("Unexpected value of thread 0",
	    perf_thread_map__pid(threads, 0) == (array ? array[0] : -1));

	for (i = 1; i < nr; i++) {
		__T("Unexpected thread value",
		    perf_thread_map__pid(threads, i) == i * 100);
	}

	perf_thread_map__put(threads);

	return 0;
}

#define THREADS_NR 10
int test_threadmap(int argc, char **argv)
{
	struct perf_thread_map *threads;
	pid_t thr_array[THREADS_NR];
	int i;

	__T_START;

@@ -27,6 +61,13 @@ int test_threadmap(int argc, char **argv)
	perf_thread_map__put(threads);
	perf_thread_map__put(threads);

	test_threadmap_array(THREADS_NR, NULL);

	for (i = 0; i < THREADS_NR; i++)
		thr_array[i] = i + 100;

	test_threadmap_array(THREADS_NR, thr_array);

	__T_END;
	return tests_failed == 0 ? 0 : -1;
}

@@ -32,26 +32,36 @@ struct perf_thread_map *perf_thread_map__realloc(struct perf_thread_map *map, in

#define thread_map__alloc(__nr) perf_thread_map__realloc(NULL, __nr)

void perf_thread_map__set_pid(struct perf_thread_map *map, int thread, pid_t pid)
void perf_thread_map__set_pid(struct perf_thread_map *map, int idx, pid_t pid)
{
	map->map[thread].pid = pid;
	map->map[idx].pid = pid;
}

char *perf_thread_map__comm(struct perf_thread_map *map, int thread)
char *perf_thread_map__comm(struct perf_thread_map *map, int idx)
{
	return map->map[thread].comm;
	return map->map[idx].comm;
}

struct perf_thread_map *perf_thread_map__new_array(int nr_threads, pid_t *array)
{
	struct perf_thread_map *threads = thread_map__alloc(nr_threads);
	int i;

	if (!threads)
		return NULL;

	for (i = 0; i < nr_threads; i++)
		perf_thread_map__set_pid(threads, i, array ? array[i] : -1);

	threads->nr = nr_threads;
	refcount_set(&threads->refcnt, 1);

	return threads;
}

struct perf_thread_map *perf_thread_map__new_dummy(void)
{
	struct perf_thread_map *threads = thread_map__alloc(1);

	if (threads != NULL) {
		perf_thread_map__set_pid(threads, 0, -1);
		threads->nr = 1;
		refcount_set(&threads->refcnt, 1);
	}
	return threads;
	return perf_thread_map__new_array(1, NULL);
}

static void perf_thread_map__delete(struct perf_thread_map *threads)
@@ -85,7 +95,7 @@ int perf_thread_map__nr(struct perf_thread_map *threads)
	return threads ? threads->nr : 1;
}

pid_t perf_thread_map__pid(struct perf_thread_map *map, int thread)
pid_t perf_thread_map__pid(struct perf_thread_map *map, int idx)
{
	return map->map[thread].pid;
	return map->map[idx].pid;
}

@@ -7,6 +7,8 @@
p synthesize power events (incl. PSB events for Intel PT)
o synthesize other events recorded due to the use
of aux-output (refer to perf record)
I synthesize interrupt or similar (asynchronous) events
(e.g. Intel PT Event Trace)
e synthesize error events
d create a debug log
f synthesize first level cache events

@@ -9,32 +9,24 @@ perf-ftrace - simple wrapper for kernel's ftrace functionality

SYNOPSIS
--------
[verse]
'perf ftrace' <command>
'perf ftrace' {trace|latency} <command>

DESCRIPTION
-----------
The 'perf ftrace' command is a simple wrapper of kernel's ftrace
functionality. It only supports single thread tracing currently and
just reads trace_pipe in text and then write it to stdout.
The 'perf ftrace' command provides a collection of subcommands which use
kernel's ftrace infrastructure.

'perf ftrace trace' is a simple wrapper of the ftrace. It only supports
single thread tracing currently and just reads trace_pipe in text and then
write it to stdout.

'perf ftrace latency' calculates execution latency of a given function
(optionally with BPF) and display it as a histogram.

The following options apply to perf ftrace.

OPTIONS
-------

-t::
--tracer=::
	Tracer to use when neither -G nor -F option is not
	specified: function_graph or function.

-v::
--verbose::
	Increase the verbosity level.

-F::
--funcs::
	List available functions to trace. It accepts a pattern to
	only list interested functions.
COMMON OPTIONS
--------------

-p::
--pid=::
@@ -43,10 +35,6 @@ OPTIONS
--tid=::
	Trace on existing thread id (comma separated list).

-D::
--delay::
	Time (ms) to wait before starting tracing after program start.

-a::
--all-cpus::
	Force system-wide collection. Scripts run without a <command>
@@ -61,6 +49,28 @@ OPTIONS
	Ranges of CPUs are specified with -: 0-2.
	Default is to trace on all online CPUs.

-v::
--verbose::
	Increase the verbosity level.


OPTIONS for 'perf ftrace trace'
-------------------------------

-t::
--tracer=::
	Tracer to use when neither -G nor -F option is not
	specified: function_graph or function.

-F::
--funcs::
	List available functions to trace. It accepts a pattern to
	only list interested functions.

-D::
--delay::
	Time (ms) to wait before starting tracing after program start.

-m::
--buffer-size::
	Set the size of per-cpu tracing buffer, <size> is expected to
@@ -114,6 +124,25 @@ OPTIONS
	thresh=<n> - Setup trace duration threshold in microseconds.
	depth=<n> - Set max depth for function graph tracer to follow.


OPTIONS for 'perf ftrace latency'
---------------------------------

-T::
--trace-funcs=::
	Set the function name to get the histogram. Unlike perf ftrace trace,
	it only allows single function to calculate the histogram.

-b::
--use-bpf::
	Use BPF to measure function latency instead of using the ftrace (it
	uses function_graph tracer internally).

-n::
--use-nsec::
	Use nano-second instead of micro-second as a base unit of the histogram.


SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-trace[1]

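The latency histograms printed by 'perf ftrace latency' (see the pull message above) use power-of-two buckets, in either microseconds or, with -n/--use-nsec, nanoseconds. As a rough, hedged sketch of that bucketing idea in C — the cutoffs and bucket count here are illustrative assumptions, not perf's exact implementation:

/* Illustrative log2 bucketing; not copied from perf's code. */
#include <stdio.h>

#define NUM_BUCKET 22	/* assumed bucket count for this sketch */

/* bucket 0 holds durations of 0 or 1 unit; bucket n (n >= 1) holds [2^n, 2^(n+1)). */
static int bucket_index(unsigned long long duration)
{
	int i = 0;

	while (duration > 1 && i < NUM_BUCKET - 1) {
		duration >>= 1;
		i++;
	}
	return i;
}

int main(void)
{
	unsigned long long samples[] = { 0, 1, 3, 150, 70000 };	/* made-up durations */
	unsigned int hist[NUM_BUCKET] = { 0 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		hist[bucket_index(samples[i])]++;

	for (i = 0; i < 8; i++)
		printf("bucket %u: %u\n", i, hist[i]);
	return 0;
}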
@@ -108,9 +108,10 @@ displayed as follows:

perf script --itrace=ibxwpe -F+flags

The flags are "bcrosyiABExgh" which stand for branch, call, return, conditional,
The flags are "bcrosyiABExghDt" which stand for branch, call, return, conditional,
system, asynchronous, interrupt, transaction abort, trace begin, trace end,
in transaction, VM-entry, and VM-exit respectively.
in transaction, VM-entry, VM-exit, interrupt disabled, and interrupt disable
toggle respectively.

perf script also supports higher level ways to dump instruction traces:

@@ -483,6 +484,30 @@ pwr_evt Enable power events. The power events provide information about
	which contains "1" if the feature is supported and
	"0" otherwise.

event Enable Event Trace. The events provide information about asynchronous
	events.

	Support for this feature is indicated by:

	/sys/bus/event_source/devices/intel_pt/caps/event_trace

	which contains "1" if the feature is supported and
	"0" otherwise.

notnt Disable TNT packets. Without TNT packets, it is not possible to walk
	executable code to reconstruct control flow, however FUP, TIP, TIP.PGE
	and TIP.PGD packets still indicate asynchronous control flow, and (if
	return compression is disabled - see noretcomp) return statements.
	The advantage of eliminating TNT packets is reducing the size of the
	trace and corresponding tracing overhead.

	Support for this feature is indicated by:

	/sys/bus/event_source/devices/intel_pt/caps/tnt_disable

	which contains "1" if the feature is supported and
	"0" otherwise.


AUX area sampling option
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -876,6 +901,8 @@ The letters are:
p synthesize "power" events (incl. PSB events)
c synthesize branches events (calls only)
r synthesize branches events (returns only)
o synthesize PEBS-via-PT events
I synthesize Event Trace events
e synthesize tracing error events
d create a debug log
g synthesize a call chain (use with i or x)
@@ -1371,6 +1398,79 @@ There were none.

:17006 17006 [001] 11500.262869216: ffffffff8220116e error_entry+0xe ([guest.kernel.kallsyms]) pushq %rax


Event Trace
-----------

Event Trace records information about asynchronous events, for example interrupts,
faults, VM exits and entries. The information is recorded in CFE and EVD packets,
and also the Interrupt Flag is recorded on the MODE.Exec packet. The CFE packet
contains a type field to identify one of the following:

1 INTR interrupt, fault, exception, NMI
2 IRET interrupt return
3 SMI system management interrupt
4 RSM resume from system management mode
5 SIPI startup interprocessor interrupt
6 INIT INIT signal
7 VMENTRY VM-Entry
8 VMEXIT VM-Exit
9 VMEXIT_INTR VM-Exit due to interrupt
10 SHUTDOWN Shutdown

For more details, refer to the Intel 64 and IA-32 Architectures Software
Developer Manuals (version 076 or later).

The capability to do Event Trace is indicated by the
/sys/bus/event_source/devices/intel_pt/caps/event_trace file.

Event trace is selected for recording using the "event" config term. e.g.

perf record -e intel_pt/event/u uname

Event trace events are output using the --itrace I option. e.g.

perf script --itrace=Ie

perf script displays events containing CFE type, vector and event data,
in the form:

evt: hw int (t) cfe: INTR IP: 1 vector: 3 PFA: 0x8877665544332211

The IP flag indicates if the event binds to an IP, which includes any case where
flow control packet generation is enabled, as well as when CFE packet IP bit is
set.

perf script displays events containing changes to the Interrupt Flag in the form:

iflag: t IFLAG: 1->0 via branch

where "via branch" indicates a branch (interrupt or return from interrupt) and
"non branch" indicates an instruction such as CLI, STI or POPF.

In addition, the current state of the interrupt flag is indicated by the presence
or absence of the "D" (interrupt disabled) perf script flag. If the interrupt
flag is changed, then the "t" flag is also included i.e.

no flag, interrupts enabled IF=1
t interrupts become disabled IF=1 -> IF=0
D interrupts are disabled IF=0
Dt interrupts become enabled IF=0 -> IF=1

The intel-pt-events.py script illustrates how to access Event Trace information
using a Python script.


TNT Disable
-----------

TNT packets are disabled using the "notnt" config term. e.g.

perf record -e intel_pt/notnt/u uname

In that case the --itrace q option is forced because walking executable code
to reconstruct the control flow is not possible.


SEE ALSO
--------

@@ -54,6 +54,16 @@ REPORT OPTIONS
	Sorting key. Possible values: acquired (default), contended,
	avg_wait, wait_total, wait_max, wait_min.

-F::
--field=<value>::
	Output fields. By default it shows all the fields but users can
	customize that using this. Possible values: acquired, contended,
	avg_wait, wait_total, wait_max, wait_min.

-c::
--combine-locks::
	Merge lock instances in the same class (based on name).

INFO OPTIONS
------------

@@ -713,6 +713,40 @@ measurements:
	wait -n ${perf_pid}
	exit $?

--threads=<spec>::
	Write collected trace data into several data files using parallel threads.
	<spec> value can be user defined list of masks. Masks separated by colon
	define CPUs to be monitored by a thread and affinity mask of that thread
	is separated by slash:

	<cpus mask 1>/<affinity mask 1>:<cpus mask 2>/<affinity mask 2>:...

	CPUs or affinity masks must not overlap with other corresponding masks.
	Invalid CPUs are ignored, but masks containing only invalid CPUs are not
	allowed.

	For example user specification like the following:

	0,2-4/2-4:1,5-7/5-7

	specifies parallel threads layout that consists of two threads,
	the first thread monitors CPUs 0 and 2-4 with the affinity mask 2-4,
	the second monitors CPUs 1 and 5-7 with the affinity mask 5-7.

	<spec> value can also be a string meaning predefined parallel threads
	layout:

	cpu - create new data streaming thread for every monitored cpu
	core - create new thread to monitor CPUs grouped by a core
	package - create new thread to monitor CPUs grouped by a package
	numa - create new thread to monitor CPUs grouped by a NUMA domain

	Predefined layouts can be used on systems with large number of CPUs in
	order not to spawn multiple per-cpu streaming threads but still avoid LOST
	events in data directory files. Option specified with no or empty value
	defaults to CPU layout. Masks defined or provided by the option value are
	filtered through the mask provided by -C option.

include::intel-hybrid.txt[]

--debuginfod[=URLs]::

@@ -129,8 +129,8 @@ OPTIONS
	Comma separated list of fields to print. Options are:
	comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
	srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
	brstackinsn, brstackoff, callindent, insn, insnlen, synth, phys_addr,
	metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat.
	brstackinsn, brstackinsnlen, brstackoff, callindent, insn, insnlen, synth,
	phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat.
	Field list can be prepended with the type, trace, sw or hw,
	to indicate to which event type the field list applies.
	e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace
@@ -195,16 +195,19 @@ OPTIONS
	At this point usage is displayed, and perf-script exits.

	The flags field is synthesized and may have a value when Instruction
	Trace decoding. The flags are "bcrosyiABExgh" which stand for branch,
	Trace decoding. The flags are "bcrosyiABExghDt" which stand for branch,
	call, return, conditional, system, asynchronous, interrupt,
	transaction abort, trace begin, trace end, in transaction, VM-Entry, and VM-Exit
	respectively. Known combinations of flags are printed more nicely e.g.
	transaction abort, trace begin, trace end, in transaction, VM-Entry,
	VM-Exit, interrupt disabled and interrupt disable toggle respectively.
	Known combinations of flags are printed more nicely e.g.
	"call" for "bc", "return" for "br", "jcc" for "bo", "jmp" for "b",
	"int" for "bci", "iret" for "bri", "syscall" for "bcs", "sysret" for "brs",
	"async" for "by", "hw int" for "bcyi", "tx abrt" for "bA", "tr strt" for "bB",
	"tr end" for "bE", "vmentry" for "bcg", "vmexit" for "bch".
	However the "x" flag will be displayed separately in those
	cases e.g. "jcc (x)" for a condition branch within a transaction.
	However the "x", "D" and "t" flags will be displayed separately in those
	cases e.g. "jcc (xD)" for a condition branch within a transaction
	with interrupts disabled. Note, interrupts becoming disabled is "t",
	whereas interrupts becoming enabled is "Dt".

	The callindent field is synthesized and may have a value when
	Instruction Trace decoding. For calls and returns, it will display the
@@ -238,6 +241,10 @@ OPTIONS
	is printed. This is the full execution path leading to the sample. This is only supported when the
	sample was recorded with perf record -b or -j any.

	Use brstackinsnlen to print the brstackinsn length. For example, you
	can't know the next sequential instruction after an unconditional branch unless
	you calculate that based on its length.

	The brstackoff field will print an offset into a specific dso/binary.

	With the metric option perf script can compute metrics for

Some files were not shown because too many files have changed in this diff.