Now that printing metric-value and metric-unit is optional,
print_running_json() shouldn't add the comma in case it becomes
trailing.
Replace all manual JSON comma stuff with a json_out() function that uses
the existing os->first tracking and auto inserts a comma if it's needed.
Update the test to handle that two of the fields can be missing.
This fixes the following test failure on Cortex A57 where the branch
misses metric is missing a required event:
$ perf test -vvv "json output"
106: perf stat JSON output linter:
--- start ---
test child forked, pid 665682
Checking json output: no args Test failed for input:
{"counter-value" : "3112.000000", "unit" : "",
"event" : "armv8_pmuv3_1/branch-misses/",
"event-runtime" : 20699340, "pcnt-running" : 100.00, }
...
json.decoder.JSONDecodeError: Expecting property name enclosed in
double quotes: line 12 column 144 (char 2109)
---- end(-1) ----
106: perf stat JSON output linter : FAILED!
Fixes: e1cc918b6c ("perf stat: Drop metric-unit if unit is NULL")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241112160048.951213-2-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On my system, perf list is very slow to print the whole events. I think
there's a performance issue in SDT and uprobes event listing. I noticed
this issue while running perf test on x86 but it takes long to check
some CoreSight event which should be skipped quickly.
Anyway, some test uses perf list to check whether the required event is
available before running the test. The perf list command can take an
argument to specify event class or (glob) pattern. But glob pattern is
only to suppress output for unmatched ones after checking all events.
In this case, specifying event class is better to reduce the number of
events it checks and to avoid buggy subsystems entirely.
No functional changes intended.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20241016065654.269994-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The getname_flags() routine changed recently and thus the place where we
were getting the pathname is not probeable anymore, albeit still
present, so use the next line for that, before:
root@number:/home/acme/git/perf-tools-next# perf test vfs_getname
91: Add vfs_getname probe to get syscall args filenames : FAILED!
93: Use vfs_getname probe to get syscall args filenames : FAILED!
126: Check open filename arg using perf trace + vfs_getname : FAILED!
root@number:/home/acme/git/perf-tools-next#
Now tests 91 and 126 are passing, some more investigation is needed for
test 93, that continues to fail.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Currently parsing and building cluster
topology have been supported since [1].
perf stat has already supported aggregation for other topologies like
die or socket, etc. It'll be useful to aggregate per-cluster to find
problems like L3T bandwidth contention.
This patch add support for "--per-cluster" option for per-cluster
aggregation. Also update the docs and related test. The output will
be like:
[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
Performance counter stats for 'system wide':
S56-D0-CLS158 4 1,321,521,570 LLC-load
S56-D0-CLS594 4 794,211,453 LLC-load
S56-D0-CLS1030 4 41,623 LLC-load
S56-D0-CLS1466 4 41,646 LLC-load
S56-D0-CLS1902 4 16,863 LLC-load
S56-D0-CLS2338 4 15,721 LLC-load
S56-D0-CLS2774 4 22,671 LLC-load
[...]
On a legacy system without cluster or cluster support, the output will
be look like:
[root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
Performance counter stats for 'system wide':
S56-D0-CLS0 64 18,011,485 cycles
S7182-D0-CLS0 64 16,548,835 cycles
Note that this patch doesn't mix the cluster information in the outputs
of --per-core to avoid breaking any tools/scripts using it.
Note that perf recently supports "--per-cache" aggregation, but it's not
the same with the cluster although cluster CPUs may share some cache
resources. For example on my machine all clusters within a die share the
same L3 cache:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-31
$ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3
[1] commit c5e22feffd ("topology: Represent clusters of CPUs within a die")
Tested-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: james.clark@arm.com
Cc: 21cnbao@gmail.com
Cc: prime.zeng@hisilicon.com
Cc: Jonathan.Cameron@huawei.com
Cc: fanghao11@huawei.com
Cc: linuxarm@huawei.com
Cc: tim.c.chen@intel.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
Some shell tests depend on finding symbols for perf itself, and fail if
perf has been stripped and no debug object is available. Add helper
functions to check if perf has a needed symbol. This is preparation for
amending the tests themselves to be skipped if a needed symbol is not
found.
The functions make use of the "Symbols" test which reads and checks symbols
from a dso, perf itself by default. Note the "Symbols" test will find
symbols using the same method as other perf tests, including, for example,
looking in the buildid cache.
An alternative would be to prevent the needed symbols from being stripped,
which seems to work with gcc's externally_visible attribute, but that
attribute is not supported by clang.
Another alternative would be to use option -Wl,-E (which is already used
when perf is built with perl support) which causes the linker to add all
(global) symbols to the dynamic symbol table. Then the required symbols
need only be made global in scope to avoid being strippable. However that
goes beyond what is needed.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231123075848.9652-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On AMD machines, the perf stat STD output test failed like below:
$ sudo ./perf test -v 98
98: perf stat STD output linter :
--- start ---
test child forked, pid 1841901
Checking STD output: no argswrong event metric.
expected 'GHz' in 108,121 stalled-cycles-frontend # 10.88% frontend cycles idle
test child finished with -1
---- end ----
perf stat STD output linter: FAILED!
This is because there are stalled-cycles-{frontend,backend} events are
used by default. The current logic checks the event_name array to find
which event it's running. But 'cycles' event comes before those stalled
cycles event and it matches first. So it tries to find 'GHz' metric
in the output (which is for the 'cycles') and fails.
Move the stalled-cycles-{frontend,backend} events before 'cycles' so
that it can find the stalled cycles events first.
Also add a space after 'no args' test name for consistency.
Fixes: 99a04a48f2 ("perf test: Add test case for the standard 'perf stat' output")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230623230139.985594-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The commit fc51fc87b1 factored out the helper functions to a library
but the new file had execute permission. Due to the way it detects
the shell test scripts, it showed up in the perf test list unexpectedly.
$ ./perf test list 2>&1 | grep 86
76: x86 bp modify
77: x86 Sample parsing
78: x86 hybrid
86: <---- (here)
$ ./perf test -v 86
86: :
--- start ---
test child forked, pid 1932207
test child finished with 0
---- end ----
: Ok
As it's a collection of library functions, it should not run as is.
Let's remove the execute permission.
Fixes: fc51fc87b1 ("perf test: Move all the check functions of stat CSV output to lib")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230622055832.83476-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>