Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next

Pull more perf tools updates from Namhyung Kim:
 "These are remaining changes and fixes for this cycle.

  Build:

   - Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
     and skip if the vmlinux has no BTF.

   - Replace deprecated clang -target xxx option by --target=xxx.

  perf record:

   - Print event attributes with well known type and config symbols in
     the debug output like below:

       # perf record -e cycles,cpu-clock -C0 -vv true
       <SNIP>
       ------------------------------------------------------------
       perf_event_attr:
         type                             0 (PERF_TYPE_HARDWARE)
         size                             136
         config                           0 (PERF_COUNT_HW_CPU_CYCLES)
         { sample_period, sample_freq }   4000
         sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
         read_format                      ID
         disabled                         1
         inherit                          1
         freq                             1
         sample_id_all                    1
         exclude_guest                    1
       ------------------------------------------------------------
       sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
       ------------------------------------------------------------
       perf_event_attr:
         type                             1 (PERF_TYPE_SOFTWARE)
         size                             136
         config                           0 (PERF_COUNT_SW_CPU_CLOCK)
         { sample_period, sample_freq }   4000
         sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
         read_format                      ID
         disabled                         1
         inherit                          1
         freq                             1
         sample_id_all                    1
         exclude_guest                    1

   - Update AMD IBS event error message since it now support per-process
     profiling but no priviledge filters.

       $ sudo perf record -e ibs_op//k -C 0
       Error:
       AMD IBS doesn't support privilege filtering. Try again without
       the privilege modifiers (like 'k') at the end.

  perf lock contention:

   - Support CSV style output using -x option

       $ sudo perf lock con -ab -x, sleep 1
       # output: contended, total wait, max wait, avg wait, type, caller
       19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
       15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
       4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
       1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
       8, 67608, 27404, 8451, spinlock, __queue_work+0x174
       3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
       3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
       2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
       1, 24619, 24619, 24619, spinlock, rcu_core+0xd4

   - Add --output option to save the data to a file not to be interfered
     by other debug messages.

  Test:

   - Fix event parsing test on ARM where there's no raw PMU nor supports
     PERF_PMU_CAP_EXTENDED_HW_TYPE.

   - Update the lock contention test case for CSV output.

   - Fix a segfault in the daemon command test.

  Vendor events (JSON):

   - Add has_event() to check if the given event is available on system
     at runtime. On Intel machines, some transaction events may not be
     present when TSC extensions are disabled.

   - Update Intel event metrics.

  Misc:

   - Sort symbols by name using an external array of pointers instead of
     a rbtree node in the symbol. This will save 16-bytes or 24-bytes
     per symbol whether the sorting is actually requested or not.

   - Fix unwinding DWARF callstacks using libdw when --symfs option is
     used"

* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
  perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
  perf test: Fix event parsing test on Arm
  perf evsel amd: Fix IBS error message
  perf: unwind: Fix symfs with libdw
  perf symbol: Fix uninitialized return value in symbols__find_by_name()
  perf test: Test perf lock contention CSV output
  perf lock contention: Add --output option
  perf lock contention: Add -x option for CSV style output
  perf lock: Remove stale comments
  perf vendor events intel: Update tigerlake to 1.13
  perf vendor events intel: Update skylakex to 1.31
  perf vendor events intel: Update skylake to 57
  perf vendor events intel: Update sapphirerapids to 1.14
  perf vendor events intel: Update icelakex to 1.21
  perf vendor events intel: Update icelake to 1.19
  perf vendor events intel: Update cascadelakex to 1.19
  perf vendor events intel: Update meteorlake to 1.03
  perf vendor events intel: Add rocketlake events/metrics
  perf vendor metrics intel: Make transaction metrics conditional
  perf jevents: Support for has_event function
  ...
This commit is contained in:
Linus Torvalds
2023-07-08 10:21:51 -07:00
95 changed files with 9319 additions and 451 deletions

View File

@@ -669,7 +669,7 @@ llvm.*::
"$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \
"-Wno-unused-value -Wno-pointer-sign " \
"-working-directory $WORKING_DIR " \
"-c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
"-c \"$CLANG_SOURCE\" --target=bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
llvm.clang-opt::
Options passed to clang.

View File

@@ -36,6 +36,9 @@ COMMON OPTIONS
--input=<file>::
Input file name. (default: perf.data unless stdin is a fifo)
--output=<file>::
Output file name for perf lock contention and report.
-v::
--verbose::
Be more verbose (show symbol address, etc).
@@ -200,6 +203,11 @@ CONTENTION OPTIONS
Note that it matches the substring so 'rq' would match both 'raw_spin_rq_lock'
and 'irq_enter_rcu'.
-x::
--field-separator=<SEP>::
Show results using a CSV-style output to make it easy to import directly
into spreadsheets. Columns are separated by the string specified in SEP.
SEE ALSO
--------

View File

@@ -315,6 +315,9 @@ FEATURE_CHECK_LDFLAGS-libpython := $(PYTHON_EMBED_LDOPTS)
FEATURE_CHECK_LDFLAGS-libaio = -lrt
FEATURE_CHECK_LDFLAGS-disassembler-four-args = -lbfd -lopcodes -ldl
FEATURE_CHECK_LDFLAGS-disassembler-init-styled = -lbfd -lopcodes -ldl
CORE_CFLAGS += -fno-omit-frame-pointer
CORE_CFLAGS += -ggdb3
CORE_CFLAGS += -funwind-tables
@@ -344,8 +347,8 @@ ifneq ($(TCMALLOC),)
endif
ifeq ($(FEATURES_DUMP),)
# We will display at the end of this Makefile.config, using $(call feature_display_entries),
# as we may retry some feature detection here.
# We will display at the end of this Makefile.config, using $(call feature_display_entries)
# As we may retry some feature detection here, see the disassembler-four-args case, for instance
FEATURE_DISPLAY_DEFERRED := 1
include $(srctree)/tools/build/Makefile.feature
else
@@ -680,6 +683,10 @@ ifdef BUILD_BPF_SKEL
CFLAGS += -DHAVE_BPF_SKEL
endif
ifndef GEN_VMLINUX_H
VMLINUX_H=$(src-perf)/util/bpf_skel/vmlinux/vmlinux.h
endif
dwarf-post-unwind := 1
dwarf-post-unwind-text := BUG
@@ -903,9 +910,13 @@ ifdef BUILD_NONDISTRO
ifeq ($(feature-libbfd-liberty), 1)
EXTLIBS += -lbfd -lopcodes -liberty
FEATURE_CHECK_LDFLAGS-disassembler-four-args += -liberty -ldl
FEATURE_CHECK_LDFLAGS-disassembler-init-styled += -liberty -ldl
else
ifeq ($(feature-libbfd-liberty-z), 1)
EXTLIBS += -lbfd -lopcodes -liberty -lz
FEATURE_CHECK_LDFLAGS-disassembler-four-args += -liberty -lz -ldl
FEATURE_CHECK_LDFLAGS-disassembler-init-styled += -liberty -lz -ldl
endif
endif
$(call feature_check,disassembler-four-args)
@@ -1329,6 +1340,6 @@ endif
# re-generate FEATURE-DUMP as we may have called feature_check, found out
# extra libraries to add to LDFLAGS of some other test and then redo those
# tests.
# tests, see the block about libbfd, disassembler-four-args, for instance.
$(shell rm -f $(FEATURE_DUMP_FILENAME))
$(foreach feat,$(FEATURE_TESTS),$(shell echo "$(call feature_assign,$(feat))" >> $(FEATURE_DUMP_FILENAME)))

View File

@@ -132,6 +132,8 @@ include ../scripts/utilities.mak
# Define EXTRA_TESTS to enable building extra tests useful mainly to perf
# developers, such as:
# x86 instruction decoder - new instructions test
#
# Define GEN_VMLINUX_H to generate vmlinux.h from the BTF.
# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
@@ -197,6 +199,7 @@ FLEX ?= flex
BISON ?= bison
STRIP = strip
AWK = awk
READELF ?= readelf
# include Makefile.config by default and rule out
# non-config cases
@@ -1061,7 +1064,7 @@ $(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_
ifdef BUILD_BPF_SKEL
BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
# Get Clang's default includes on this system, as opposed to those seen by
# '-target bpf'. This fixes "missing" files on some architectures/distros,
# '--target=bpf'. This fixes "missing" files on some architectures/distros,
# such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
#
# Use '-idirafter': Don't interfere with include mechanics except where the
@@ -1084,8 +1087,44 @@ $(BPFTOOL): | $(SKEL_TMP_OUT)
$(Q)CFLAGS= $(MAKE) -C ../bpf/bpftool \
OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) $(TOOLS_UAPI_INCLUDE) \
# Paths to search for a kernel to generate vmlinux.h from.
VMLINUX_BTF_ELF_PATHS ?= $(if $(O),$(O)/vmlinux) \
$(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
../../vmlinux \
/boot/vmlinux-$(shell uname -r)
# Paths to BTF information.
VMLINUX_BTF_BTF_PATHS ?= /sys/kernel/btf/vmlinux
# Filter out kernels that don't exist or without a BTF section.
VMLINUX_BTF_ELF_ABSPATHS ?= $(abspath $(wildcard $(VMLINUX_BTF_ELF_PATHS)))
VMLINUX_BTF_PATHS ?= $(shell for file in $(VMLINUX_BTF_ELF_ABSPATHS); \
do \
if [ -f $$file ] && ($(READELF) -S "$$file" | grep -q .BTF); \
then \
echo "$$file"; \
fi; \
done) \
$(wildcard $(VMLINUX_BTF_BTF_PATHS))
# Select the first as the source of vmlinux.h.
VMLINUX_BTF ?= $(firstword $(VMLINUX_BTF_PATHS))
ifeq ($(VMLINUX_H),)
ifeq ($(VMLINUX_BTF),)
$(error Missing bpftool input for generating vmlinux.h)
endif
endif
$(SKEL_OUT)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL)
ifeq ($(VMLINUX_H),)
$(QUIET_GEN)$(BPFTOOL) btf dump file $< format c > $@
else
$(Q)cp "$(VMLINUX_H)" $@
endif
$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) $(SKEL_OUT)/vmlinux.h | $(SKEL_TMP_OUT)
$(QUIET_CLANG)$(CLANG) -g -O2 --target=bpf -Wall -Werror $(BPF_INCLUDE) $(TOOLS_UAPI_INCLUDE) \
-c $(filter util/bpf_skel/%.bpf.c,$^) -o $@
$(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)

View File

@@ -102,3 +102,23 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
}
}
}
int arch_evsel__open_strerror(struct evsel *evsel, char *msg, size_t size)
{
if (!x86__is_amd_cpu())
return 0;
if (!evsel->core.attr.precise_ip &&
!(evsel->pmu_name && !strncmp(evsel->pmu_name, "ibs", 3)))
return 0;
/* More verbose IBS errors. */
if (evsel->core.attr.exclude_kernel || evsel->core.attr.exclude_user ||
evsel->core.attr.exclude_hv || evsel->core.attr.exclude_idle ||
evsel->core.attr.exclude_host || evsel->core.attr.exclude_guest) {
return scnprintf(msg, size, "AMD IBS doesn't support privilege filtering. Try "
"again without the privilege modifiers (like 'k') at the end.");
}
return 0;
}

View File

@@ -1524,7 +1524,7 @@ int cmd_daemon(int argc, const char **argv)
if (argc) {
if (!strcmp(argv[0], "start"))
ret = __cmd_start(&__daemon, daemon_options, argc, argv);
if (!strcmp(argv[0], "signal"))
else if (!strcmp(argv[0], "signal"))
ret = __cmd_signal(&__daemon, daemon_options, argc, argv);
else if (!strcmp(argv[0], "stop"))
ret = __cmd_stop(&__daemon, daemon_options, argc, argv);

View File

@@ -62,7 +62,6 @@ int cmd_kallsyms(int argc, const char **argv)
if (argc < 1)
usage_with_options(kallsyms_usage, options);
symbol_conf.sort_by_name = true;
symbol_conf.try_vmlinux_path = (symbol_conf.vmlinux_name == NULL);
if (symbol__init(NULL) < 0)
return -1;

File diff suppressed because it is too large Load Diff

View File

@@ -1676,7 +1676,6 @@ repeat:
* See symbol__browser_index.
*/
symbol_conf.priv_size += sizeof(u32);
symbol_conf.sort_by_name = true;
}
annotation_config__init(&report.annotation_opts);
}

View File

@@ -92,28 +92,28 @@
},
{
"BriefDescription": "Percentage of cycles in aborted transactions.",
"MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
"MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_aborted_cycles",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
"MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
"MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_cycles_per_elision",
"ScaleUnit": "1cycles / elision"
},
{
"BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
"MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
"MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_cycles_per_transaction",
"ScaleUnit": "1cycles / transaction"
},
{
"BriefDescription": "Percentage of cycles within a transaction region.",
"MetricExpr": "cpu@cycles\\-t@ / cycles",
"MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_transactional_cycles",
"ScaleUnit": "100%"

View File

@@ -1830,28 +1830,28 @@
},
{
"BriefDescription": "Percentage of cycles in aborted transactions.",
"MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
"MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_aborted_cycles",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
"MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
"MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_cycles_per_elision",
"ScaleUnit": "1cycles / elision"
},
{
"BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
"MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
"MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_cycles_per_transaction",
"ScaleUnit": "1cycles / transaction"
},
{
"BriefDescription": "Percentage of cycles within a transaction region.",
"MetricExpr": "cpu@cycles\\-t@ / cycles",
"MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_transactional_cycles",
"ScaleUnit": "100%"

View File

@@ -7,6 +7,14 @@
"SampleAfterValue": "100003",
"UMask": "0x1"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
"EventCode": "0x87",
"EventName": "DECODE.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
{
"BriefDescription": "Decode Stream Buffer (DSB)-to-MITE switches",
"EventCode": "0xAB",
@@ -245,27 +253,34 @@
"UMask": "0x2"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"EventCode": "0x83",
"EventName": "ICACHE_64B.IFTAG_STALL",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"EventCode": "0x83",
"EventName": "ICACHE_TAG.STALLS",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.DSB_CYCLES_OK]",
"CounterMask": "4",
"EventCode": "0x79",
"EventName": "IDQ.ALL_DSB_CYCLES_4_UOPS",
"PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
"PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_OK]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.DSB_CYCLES_ANY]",
"CounterMask": "1",
"EventCode": "0x79",
"EventName": "IDQ.ALL_DSB_CYCLES_ANY_UOPS",
"PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
"PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_ANY]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
@@ -296,6 +311,24 @@
"SampleAfterValue": "2000003",
"UMask": "0x8"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
"CounterMask": "1",
"EventCode": "0x79",
"EventName": "IDQ.DSB_CYCLES_ANY",
"PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
"CounterMask": "4",
"EventCode": "0x79",
"EventName": "IDQ.DSB_CYCLES_OK",
"PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
{
"BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path",
"EventCode": "0x79",

View File

@@ -361,10 +361,10 @@
"UMask": "0x1"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction.",
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]",
"EventCode": "0x87",
"EventName": "ILD_STALL.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
@@ -488,11 +488,11 @@
"UMask": "0x1"
},
{
"BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder.",
"BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_OK]",
"CounterMask": "4",
"EventCode": "0xA8",
"EventName": "LSD.CYCLES_4_UOPS",
"PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector).",
"PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_OK]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
@@ -505,6 +505,15 @@
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
{
"BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_4_UOPS]",
"CounterMask": "4",
"EventCode": "0xA8",
"EventName": "LSD.CYCLES_OK",
"PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_4_UOPS]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
{
"BriefDescription": "Number of Uops delivered by the LSD.",
"EventCode": "0xA8",

View File

@@ -6606,7 +6606,7 @@
"EventCode": "0x52",
"EventName": "UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS",
"PerPkg": "1",
"PublicDescription": "ad and bl messages were actually slotted into the same flit in paralle",
"PublicDescription": "ad and bl messages were actually slotted into the same flit in parallel",
"UMask": "0x8",
"Unit": "M3UPI"
},

View File

@@ -2735,7 +2735,7 @@
"EventCode": "0x81",
"EventName": "UNC_M_WPQ_OCCUPANCY",
"PerPkg": "1",
"PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts. Is there a filter of sorts?",
"PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts.",
"Unit": "iMC"
},
{

View File

@@ -155,18 +155,18 @@
"UMask": "0x21"
},
{
"BriefDescription": "All requests that miss L2 cache. This event is not supported on ICL and ICX products, only supported on RKL products.",
"BriefDescription": "This event is deprecated.",
"Deprecated": "1",
"EventCode": "0x24",
"EventName": "L2_RQSTS.MISS",
"PublicDescription": "Counts all requests that miss L2 cache. This event is not supported on ICL and ICX products, only supported on RKL products.",
"SampleAfterValue": "200003",
"UMask": "0x3f"
},
{
"BriefDescription": "All L2 requests. This event is not supported on ICL and ICX products, only supported on RKL products.",
"BriefDescription": "This event is deprecated.",
"Deprecated": "1",
"EventCode": "0x24",
"EventName": "L2_RQSTS.REFERENCES",
"PublicDescription": "Counts all L2 requests. This event is not supported on ICL and ICX products, only supported on RKL products.",
"SampleAfterValue": "200003",
"UMask": "0xff"
},

View File

@@ -7,6 +7,14 @@
"SampleAfterValue": "100003",
"UMask": "0x1"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
"EventCode": "0x87",
"EventName": "DECODE.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
"SampleAfterValue": "500009",
"UMask": "0x1"
},
{
"BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.",
"CounterMask": "1",
@@ -213,10 +221,10 @@
"UMask": "0x1"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]",
"EventCode": "0x80",
"EventName": "ICACHE_16B.IFDATA_STALL",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity.",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]",
"SampleAfterValue": "500009",
"UMask": "0x4"
},
@@ -237,10 +245,26 @@
"UMask": "0x2"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"EventCode": "0x83",
"EventName": "ICACHE_64B.IFTAG_STALL",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]",
"EventCode": "0x80",
"EventName": "ICACHE_DATA.STALLS",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]",
"SampleAfterValue": "500009",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"EventCode": "0x83",
"EventName": "ICACHE_TAG.STALLS",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"SampleAfterValue": "200003",
"UMask": "0x4"
},

View File

@@ -1516,28 +1516,28 @@
},
{
"BriefDescription": "Percentage of cycles in aborted transactions.",
"MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
"MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_aborted_cycles",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
"MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
"MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_cycles_per_elision",
"ScaleUnit": "1cycles / elision"
},
{
"BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
"MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
"MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_cycles_per_transaction",
"ScaleUnit": "1cycles / transaction"
},
{
"BriefDescription": "Percentage of cycles within a transaction region.",
"MetricExpr": "cpu@cycles\\-t@ / cycles",
"MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
"MetricGroup": "transaction",
"MetricName": "tsx_transactional_cycles",
"ScaleUnit": "100%"

View File

@@ -318,10 +318,10 @@
"UMask": "0x40"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction.",
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]",
"EventCode": "0x87",
"EventName": "ILD_STALL.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]",
"SampleAfterValue": "500009",
"UMask": "0x1"
},
@@ -556,7 +556,7 @@
"BriefDescription": "TMA slots wasted due to incorrect speculation by branch mispredictions",
"EventCode": "0xa4",
"EventName": "TOPDOWN.BR_MISPREDICT_SLOTS",
"PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by branch mispredictions. This event estimates number of operations that were issued but not retired from the specualtive path as well as the out-of-order engine recovery past a branch misprediction.",
"PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by branch mispredictions. This event estimates number of operations that were issued but not retired from the speculative path as well as the out-of-order engine recovery past a branch misprediction.",
"SampleAfterValue": "10000003",
"UMask": "0x8"
},

View File

@@ -7,6 +7,14 @@
"SampleAfterValue": "100003",
"UMask": "0x1"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
"EventCode": "0x87",
"EventName": "DECODE.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
"SampleAfterValue": "500009",
"UMask": "0x1"
},
{
"BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.",
"CounterMask": "1",
@@ -213,10 +221,10 @@
"UMask": "0x1"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]",
"EventCode": "0x80",
"EventName": "ICACHE_16B.IFDATA_STALL",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity.",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]",
"SampleAfterValue": "500009",
"UMask": "0x4"
},
@@ -237,10 +245,26 @@
"UMask": "0x2"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"EventCode": "0x83",
"EventName": "ICACHE_64B.IFTAG_STALL",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]",
"EventCode": "0x80",
"EventName": "ICACHE_DATA.STALLS",
"PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]",
"SampleAfterValue": "500009",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"EventCode": "0x83",
"EventName": "ICACHE_TAG.STALLS",
"PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"SampleAfterValue": "200003",
"UMask": "0x4"
},

Some files were not shown because too many files have changed in this diff Show More