mirror of
https://github.com/Dasharo/linux.git
synced 2026-03-06 15:25:10 -08:00
Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option with --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now supports per-process
profiling but no privilege filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
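
The annotated `type` field in the debug output above could be produced by a lookup along the following lines. This is only a minimal sketch: the numeric values are those of `enum perf_type_id` in the perf_event UAPI, but the table and the function name `format_type` are hypothetical and are not taken from perf's sources.

```c
#include <stdio.h>

/* Names indexed by the numeric value of enum perf_type_id (UAPI). */
static const char *const perf_type_names[] = {
	"PERF_TYPE_HARDWARE",   /* 0 */
	"PERF_TYPE_SOFTWARE",   /* 1 */
	"PERF_TYPE_TRACEPOINT", /* 2 */
	"PERF_TYPE_HW_CACHE",   /* 3 */
	"PERF_TYPE_RAW",        /* 4 */
	"PERF_TYPE_BREAKPOINT", /* 5 */
};

/*
 * Hypothetical helper: format "<value> (<symbol>)" for well-known types,
 * or just "<value>" for anything else (for example dynamic PMU types).
 */
static int format_type(char *buf, size_t size, unsigned int type)
{
	size_t n = sizeof(perf_type_names) / sizeof(perf_type_names[0]);

	if (type < n)
		return snprintf(buf, size, "%u (%s)", type, perf_type_names[type]);
	return snprintf(buf, size, "%u", type);
}
```

Falling back to the bare number keeps the output usable for types that have no fixed symbolic name.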
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file so that it is not
interleaved with other debug messages.
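
A consumer of the CSV output shown above might parse each row roughly as follows. This is a sketch under assumptions: it expects `,` as the separator (as in the `-x,` example), the six columns from the `# output:` header line, and the struct and function names are made up for illustration.

```c
#include <stdio.h>

/* One row of `perf lock contention -x,` output (hypothetical layout). */
struct contention_row {
	unsigned long contended;
	unsigned long long total_wait;
	unsigned long long max_wait;
	unsigned long long avg_wait;
	char type[32];    /* e.g. "spinlock", "rwsem:R" */
	char caller[128]; /* e.g. "process_one_work+0x1f0" */
};

/*
 * Parse one comma-separated row. Returns 0 on success and -1 when the
 * line does not contain all six fields (for example the "# output:"
 * header line, which does not start with a number).
 */
static int parse_row(const char *line, struct contention_row *row)
{
	int n = sscanf(line, " %lu , %llu , %llu , %llu , %31[^,] , %127[^\n]",
		       &row->contended, &row->total_wait, &row->max_wait,
		       &row->avg_wait, row->type, row->caller);

	return n == 6 ? 0 : -1;
}
```

Header and comment lines are rejected rather than skipped silently, so the caller can decide how to handle them.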
Test:
- Fix event parsing test on ARM, where there is no raw PMU and
PERF_PMU_CAP_EXTENDED_HW_TYPE is not supported.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on the system
at runtime. On Intel machines, some transaction events may not be
present when TSX extensions are disabled.
- Update Intel event metrics.
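
The effect of guarding a metric with has_event() can be pictured with a small sketch. It mirrors the shape of the updated `tsx_transactional_cycles` expression, `(cycles-t / cycles if has_event(cycles-t) else 0)`; the function name, the boolean parameter standing in for has_event(), and the extra zero-denominator guard are assumptions made for illustration, not perf's metric evaluator.

```c
#include <stdbool.h>

/*
 * Hypothetical evaluation of a conditional metric: when the event is
 * not available, the metric evaluates to 0 instead of failing.
 */
static double tsx_transactional_cycles(bool has_cycles_t,
				       double cycles_t, double cycles)
{
	/* The cycles == 0 check is an added safety guard, not part of the expression. */
	if (!has_cycles_t || cycles == 0.0)
		return 0.0;
	return cycles_t / cycles;
}
```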
Misc:
- Sort symbols by name using an external array of pointers instead of
an rbtree node in the symbol. This will save 16 bytes or 24 bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
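
The idea of sorting symbols through an external array of pointers, rather than embedding a tree node in every symbol, can be sketched as below. `struct sym` is a simplified stand-in and the function names are hypothetical; this is not perf's `struct symbol` or its actual lookup code.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical symbol: no per-symbol sorting node embedded. */
struct sym {
	unsigned long start;
	const char *name;
};

/* qsort() comparator over an array of struct sym pointers. */
static int cmp_sym_name(const void *a, const void *b)
{
	const struct sym *const *sa = a;
	const struct sym *const *sb = b;

	return strcmp((*sa)->name, (*sb)->name);
}

/*
 * Build a name-sorted index over syms[0..n). The index is allocated only
 * when by-name lookup is requested; the caller frees it.
 */
static struct sym **sort_by_name(struct sym *syms, size_t n)
{
	struct sym **idx = malloc(n * sizeof(*idx));
	size_t i;

	if (!idx)
		return NULL;
	for (i = 0; i < n; i++)
		idx[i] = &syms[i];
	qsort(idx, n, sizeof(*idx), cmp_sym_name);
	return idx;
}

/* bsearch() comparator: key is a name, element is a struct sym pointer. */
static int cmp_key_name(const void *key, const void *elem)
{
	const struct sym *const *s = elem;

	return strcmp(key, (*s)->name);
}

/* Binary search in the sorted index; returns NULL when the name is absent. */
static struct sym *find_by_name(struct sym **idx, size_t n, const char *name)
{
	struct sym **hit = bsearch(name, idx, n, sizeof(*idx), cmp_key_name);

	return hit ? *hit : NULL;
}
```

Because the index lives outside the symbols, code paths that never look up symbols by name pay no per-symbol memory cost for it.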
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
@@ -669,7 +669,7 @@ llvm.*::
 	"$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \
 	"-Wno-unused-value -Wno-pointer-sign " \
 	"-working-directory $WORKING_DIR " \
-	"-c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
+	"-c \"$CLANG_SOURCE\" --target=bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
 
 llvm.clang-opt::
 	Options passed to clang.
@@ -36,6 +36,9 @@ COMMON OPTIONS
 --input=<file>::
 	Input file name. (default: perf.data unless stdin is a fifo)
 
+--output=<file>::
+	Output file name for perf lock contention and report.
+
 -v::
 --verbose::
 	Be more verbose (show symbol address, etc).
@@ -200,6 +203,11 @@ CONTENTION OPTIONS
 	Note that it matches the substring so 'rq' would match both 'raw_spin_rq_lock'
 	and 'irq_enter_rcu'.
 
+-x::
+--field-separator=<SEP>::
+	Show results using a CSV-style output to make it easy to import directly
+	into spreadsheets. Columns are separated by the string specified in SEP.
+
 
 SEE ALSO
 --------
@@ -315,6 +315,9 @@ FEATURE_CHECK_LDFLAGS-libpython := $(PYTHON_EMBED_LDOPTS)
 
 FEATURE_CHECK_LDFLAGS-libaio = -lrt
 
+FEATURE_CHECK_LDFLAGS-disassembler-four-args = -lbfd -lopcodes -ldl
+FEATURE_CHECK_LDFLAGS-disassembler-init-styled = -lbfd -lopcodes -ldl
+
 CORE_CFLAGS += -fno-omit-frame-pointer
 CORE_CFLAGS += -ggdb3
 CORE_CFLAGS += -funwind-tables
@@ -344,8 +347,8 @@ ifneq ($(TCMALLOC),)
 endif
 
 ifeq ($(FEATURES_DUMP),)
-# We will display at the end of this Makefile.config, using $(call feature_display_entries),
-# as we may retry some feature detection here.
+# We will display at the end of this Makefile.config, using $(call feature_display_entries)
+# As we may retry some feature detection here, see the disassembler-four-args case, for instance
   FEATURE_DISPLAY_DEFERRED := 1
 include $(srctree)/tools/build/Makefile.feature
 else
@@ -680,6 +683,10 @@ ifdef BUILD_BPF_SKEL
   CFLAGS += -DHAVE_BPF_SKEL
 endif
 
+ifndef GEN_VMLINUX_H
+  VMLINUX_H=$(src-perf)/util/bpf_skel/vmlinux/vmlinux.h
+endif
+
 dwarf-post-unwind := 1
 dwarf-post-unwind-text := BUG
 
@@ -903,9 +910,13 @@ ifdef BUILD_NONDISTRO
 
   ifeq ($(feature-libbfd-liberty), 1)
     EXTLIBS += -lbfd -lopcodes -liberty
+    FEATURE_CHECK_LDFLAGS-disassembler-four-args += -liberty -ldl
+    FEATURE_CHECK_LDFLAGS-disassembler-init-styled += -liberty -ldl
   else
     ifeq ($(feature-libbfd-liberty-z), 1)
       EXTLIBS += -lbfd -lopcodes -liberty -lz
+      FEATURE_CHECK_LDFLAGS-disassembler-four-args += -liberty -lz -ldl
+      FEATURE_CHECK_LDFLAGS-disassembler-init-styled += -liberty -lz -ldl
     endif
   endif
   $(call feature_check,disassembler-four-args)
@@ -1329,6 +1340,6 @@ endif
 
 # re-generate FEATURE-DUMP as we may have called feature_check, found out
 # extra libraries to add to LDFLAGS of some other test and then redo those
-# tests.
+# tests, see the block about libbfd, disassembler-four-args, for instance.
 $(shell rm -f $(FEATURE_DUMP_FILENAME))
 $(foreach feat,$(FEATURE_TESTS),$(shell echo "$(call feature_assign,$(feat))" >> $(FEATURE_DUMP_FILENAME)))
@@ -132,6 +132,8 @@ include ../scripts/utilities.mak
 # Define EXTRA_TESTS to enable building extra tests useful mainly to perf
 # developers, such as:
 # x86 instruction decoder - new instructions test
+#
+# Define GEN_VMLINUX_H to generate vmlinux.h from the BTF.
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL
@@ -197,6 +199,7 @@ FLEX ?= flex
 BISON ?= bison
 STRIP = strip
 AWK = awk
+READELF ?= readelf
 
 # include Makefile.config by default and rule out
 # non-config cases
@@ -1061,7 +1064,7 @@ $(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_
 ifdef BUILD_BPF_SKEL
 BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
 # Get Clang's default includes on this system, as opposed to those seen by
-# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# '--target=bpf'. This fixes "missing" files on some architectures/distros,
 # such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
 #
 # Use '-idirafter': Don't interfere with include mechanics except where the
@@ -1084,8 +1087,44 @@ $(BPFTOOL): | $(SKEL_TMP_OUT)
 	$(Q)CFLAGS= $(MAKE) -C ../bpf/bpftool \
 		OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
 
-$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
-	$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) $(TOOLS_UAPI_INCLUDE) \
+# Paths to search for a kernel to generate vmlinux.h from.
+VMLINUX_BTF_ELF_PATHS ?= $(if $(O),$(O)/vmlinux) \
+	$(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
+	../../vmlinux \
+	/boot/vmlinux-$(shell uname -r)
+
+# Paths to BTF information.
+VMLINUX_BTF_BTF_PATHS ?= /sys/kernel/btf/vmlinux
+
+# Filter out kernels that don't exist or without a BTF section.
+VMLINUX_BTF_ELF_ABSPATHS ?= $(abspath $(wildcard $(VMLINUX_BTF_ELF_PATHS)))
+VMLINUX_BTF_PATHS ?= $(shell for file in $(VMLINUX_BTF_ELF_ABSPATHS); \
+	do \
+		if [ -f $$file ] && ($(READELF) -S "$$file" | grep -q .BTF); \
+		then \
+			echo "$$file"; \
+		fi; \
+	done) \
+	$(wildcard $(VMLINUX_BTF_BTF_PATHS))
+
+# Select the first as the source of vmlinux.h.
+VMLINUX_BTF ?= $(firstword $(VMLINUX_BTF_PATHS))
+
+ifeq ($(VMLINUX_H),)
+  ifeq ($(VMLINUX_BTF),)
+    $(error Missing bpftool input for generating vmlinux.h)
+  endif
+endif
+
+$(SKEL_OUT)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL)
+ifeq ($(VMLINUX_H),)
+	$(QUIET_GEN)$(BPFTOOL) btf dump file $< format c > $@
+else
+	$(Q)cp "$(VMLINUX_H)" $@
+endif
+
+$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) $(SKEL_OUT)/vmlinux.h | $(SKEL_TMP_OUT)
+	$(QUIET_CLANG)$(CLANG) -g -O2 --target=bpf -Wall -Werror $(BPF_INCLUDE) $(TOOLS_UAPI_INCLUDE) \
 	  -c $(filter util/bpf_skel/%.bpf.c,$^) -o $@
 
 $(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
@@ -102,3 +102,23 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
 		}
 	}
 }
+
+int arch_evsel__open_strerror(struct evsel *evsel, char *msg, size_t size)
+{
+	if (!x86__is_amd_cpu())
+		return 0;
+
+	if (!evsel->core.attr.precise_ip &&
+	    !(evsel->pmu_name && !strncmp(evsel->pmu_name, "ibs", 3)))
+		return 0;
+
+	/* More verbose IBS errors. */
+	if (evsel->core.attr.exclude_kernel || evsel->core.attr.exclude_user ||
+	    evsel->core.attr.exclude_hv || evsel->core.attr.exclude_idle ||
+	    evsel->core.attr.exclude_host || evsel->core.attr.exclude_guest) {
+		return scnprintf(msg, size, "AMD IBS doesn't support privilege filtering. Try "
+				 "again without the privilege modifiers (like 'k') at the end.");
+	}
+
+	return 0;
+}
@@ -1524,7 +1524,7 @@ int cmd_daemon(int argc, const char **argv)
 	if (argc) {
 		if (!strcmp(argv[0], "start"))
 			ret = __cmd_start(&__daemon, daemon_options, argc, argv);
-		if (!strcmp(argv[0], "signal"))
+		else if (!strcmp(argv[0], "signal"))
 			ret = __cmd_signal(&__daemon, daemon_options, argc, argv);
 		else if (!strcmp(argv[0], "stop"))
 			ret = __cmd_stop(&__daemon, daemon_options, argc, argv);
@@ -62,7 +62,6 @@ int cmd_kallsyms(int argc, const char **argv)
 	if (argc < 1)
 		usage_with_options(kallsyms_usage, options);
 
-	symbol_conf.sort_by_name = true;
 	symbol_conf.try_vmlinux_path = (symbol_conf.vmlinux_name == NULL);
 	if (symbol__init(NULL) < 0)
 		return -1;
File diff suppressed because it is too large
@@ -1676,7 +1676,6 @@ repeat:
 		 * See symbol__browser_index.
 		 */
 		symbol_conf.priv_size += sizeof(u32);
-		symbol_conf.sort_by_name = true;
 	}
 	annotation_config__init(&report.annotation_opts);
 }
@@ -92,28 +92,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"
@@ -1830,28 +1830,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"
@@ -7,6 +7,14 @@
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
+    {
+        "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
+        "EventCode": "0x87",
+        "EventName": "DECODE.LCP",
+        "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1"
+    },
     {
         "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE switches",
         "EventCode": "0xAB",
@@ -245,27 +253,34 @@
         "UMask": "0x2"
     },
     {
-        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
         "EventCode": "0x83",
         "EventName": "ICACHE_64B.IFTAG_STALL",
         "SampleAfterValue": "200003",
         "UMask": "0x4"
     },
     {
-        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops",
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS",
+        "SampleAfterValue": "200003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.DSB_CYCLES_OK]",
         "CounterMask": "4",
         "EventCode": "0x79",
         "EventName": "IDQ.ALL_DSB_CYCLES_4_UOPS",
-        "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
+        "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_OK]",
         "SampleAfterValue": "2000003",
         "UMask": "0x18"
     },
     {
-        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
+        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.DSB_CYCLES_ANY]",
         "CounterMask": "1",
         "EventCode": "0x79",
         "EventName": "IDQ.ALL_DSB_CYCLES_ANY_UOPS",
-        "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
+        "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_ANY]",
         "SampleAfterValue": "2000003",
         "UMask": "0x18"
     },
@@ -296,6 +311,24 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x8"
     },
+    {
+        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
+        "CounterMask": "1",
+        "EventCode": "0x79",
+        "EventName": "IDQ.DSB_CYCLES_ANY",
+        "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x18"
+    },
+    {
+        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
+        "CounterMask": "4",
+        "EventCode": "0x79",
+        "EventName": "IDQ.DSB_CYCLES_OK",
+        "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x18"
+    },
     {
         "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path",
         "EventCode": "0x79",
@@ -361,10 +361,10 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Stalls caused by changing prefix length of the instruction.",
+        "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]",
         "EventCode": "0x87",
         "EventName": "ILD_STALL.LCP",
-        "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.",
+        "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]",
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     },
@@ -488,11 +488,11 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder.",
+        "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_OK]",
         "CounterMask": "4",
         "EventCode": "0xA8",
         "EventName": "LSD.CYCLES_4_UOPS",
-        "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector).",
+        "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_OK]",
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     },
@@ -505,6 +505,15 @@
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     },
+    {
+        "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_4_UOPS]",
+        "CounterMask": "4",
+        "EventCode": "0xA8",
+        "EventName": "LSD.CYCLES_OK",
+        "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_4_UOPS]",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1"
+    },
     {
         "BriefDescription": "Number of Uops delivered by the LSD.",
         "EventCode": "0xA8",
@@ -6606,7 +6606,7 @@
         "EventCode": "0x52",
         "EventName": "UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS",
         "PerPkg": "1",
-        "PublicDescription": "ad and bl messages were actually slotted into the same flit in paralle",
+        "PublicDescription": "ad and bl messages were actually slotted into the same flit in parallel",
         "UMask": "0x8",
         "Unit": "M3UPI"
     },
@@ -2735,7 +2735,7 @@
         "EventCode": "0x81",
         "EventName": "UNC_M_WPQ_OCCUPANCY",
         "PerPkg": "1",
-        "PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts. Is there a filter of sorts?",
+        "PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts.",
         "Unit": "iMC"
     },
     {
@@ -155,18 +155,18 @@
         "UMask": "0x21"
     },
     {
-        "BriefDescription": "All requests that miss L2 cache. This event is not supported on ICL and ICX products, only supported on RKL products.",
+        "BriefDescription": "This event is deprecated.",
+        "Deprecated": "1",
         "EventCode": "0x24",
         "EventName": "L2_RQSTS.MISS",
-        "PublicDescription": "Counts all requests that miss L2 cache. This event is not supported on ICL and ICX products, only supported on RKL products.",
         "SampleAfterValue": "200003",
         "UMask": "0x3f"
     },
     {
-        "BriefDescription": "All L2 requests. This event is not supported on ICL and ICX products, only supported on RKL products.",
+        "BriefDescription": "This event is deprecated.",
+        "Deprecated": "1",
         "EventCode": "0x24",
         "EventName": "L2_RQSTS.REFERENCES",
-        "PublicDescription": "Counts all L2 requests. This event is not supported on ICL and ICX products, only supported on RKL products.",
         "SampleAfterValue": "200003",
         "UMask": "0xff"
     },
@@ -7,6 +7,14 @@
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
+    {
+        "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
+        "EventCode": "0x87",
+        "EventName": "DECODE.LCP",
+        "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
+        "SampleAfterValue": "500009",
+        "UMask": "0x1"
+    },
     {
         "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.",
         "CounterMask": "1",
@@ -213,10 +221,10 @@
         "UMask": "0x1"
     },
     {
-        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss.",
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]",
         "EventCode": "0x80",
         "EventName": "ICACHE_16B.IFDATA_STALL",
-        "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity.",
+        "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]",
         "SampleAfterValue": "500009",
         "UMask": "0x4"
     },
@@ -237,10 +245,26 @@
         "UMask": "0x2"
     },
     {
-        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
         "EventCode": "0x83",
         "EventName": "ICACHE_64B.IFTAG_STALL",
-        "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
+        "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
         "SampleAfterValue": "200003",
         "UMask": "0x4"
     },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]",
+        "EventCode": "0x80",
+        "EventName": "ICACHE_DATA.STALLS",
+        "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]",
+        "SampleAfterValue": "500009",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS",
+        "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
+        "SampleAfterValue": "200003",
+        "UMask": "0x4"
+    },
@@ -1516,28 +1516,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"
@@ -318,10 +318,10 @@
"UMask": "0x40"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction.",
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]",
"EventCode": "0x87",
"EventName": "ILD_STALL.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]",
"SampleAfterValue": "500009",
"UMask": "0x1"
},
@@ -556,7 +556,7 @@
"BriefDescription": "TMA slots wasted due to incorrect speculation by branch mispredictions",
"EventCode": "0xa4",
"EventName": "TOPDOWN.BR_MISPREDICT_SLOTS",
"PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by branch mispredictions. This event estimates number of operations that were issued but not retired from the specualtive path as well as the out-of-order engine recovery past a branch misprediction.",
"PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by branch mispredictions. This event estimates number of operations that were issued but not retired from the speculative path as well as the out-of-order engine recovery past a branch misprediction.",
"SampleAfterValue": "10000003",
"UMask": "0x8"
},
@@ -7,6 +7,14 @@
"SampleAfterValue": "100003",
"UMask": "0x1"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
"EventCode": "0x87",
"EventName": "DECODE.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
"SampleAfterValue": "500009",
"UMask": "0x1"
},
{
"BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.",
"CounterMask": "1",
@@ -213,10 +221,10 @@
"UMask": "0x1"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]",
"EventCode": "0x80",
"EventName": "ICACHE_16B.IFDATA_STALL",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity.",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]",
"SampleAfterValue": "500009",
"UMask": "0x4"
},
@@ -237,10 +245,26 @@
"UMask": "0x2"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"EventCode": "0x83",
"EventName": "ICACHE_64B.IFTAG_STALL",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]",
"EventCode": "0x80",
"EventName": "ICACHE_DATA.STALLS",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]",
"SampleAfterValue": "500009",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"EventCode": "0x83",
"EventName": "ICACHE_TAG.STALLS",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
Some files were not shown because too many files have changed in this diff.