Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Pull arm64 updates from Will Deacon:
"We've got a little less than normal thanks to the holidays in
December, but there's the usual summary below. The highlight is
probably the 52-bit physical addressing (LPA2) clean-up from Ard.
Confidential Computing:
- Register a platform device when running in CCA realm mode to enable
automatic loading of dependent modules
CPU Features:
- Update a bunch of system register definitions to pick up new field
encodings from the architectural documentation
- Add hwcaps and selftests for the new (2024) dpISA extensions
Documentation:
- Update EL3 (firmware) requirements for booting Linux on modern
arm64 designs
- Remove stale information about the kernel virtual memory map
Miscellaneous:
- Minor cleanups and typo fixes
Memory management:
- Fix vmemmap_check_pmd() to look at the PMD type bits
- LPA2 (52-bit physical addressing) cleanups and minor fixes
- Adjust physical address space depending upon whether or not LPA2 is
enabled
Perf and PMUs:
- Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU
- Extend AXI filtering support for the DDR PMU on NXP IMX SoCs
- Fix Designware PCIe PMU event numbering
- Add generic branch events for the Apple M1 CPU PMU
- Add support for Marvell Odyssey DDR and LLC-TAD PMUs
- Cleanups to the Hisilicon DDRC and Uncore PMU code
- Advertise discard mode for the SPE PMU
- Add the perf users mailing list to our MAINTAINERS entry"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
Documentation: arm64: Remove stale and redundant virtual memory diagrams
perf docs: arm_spe: Document new discard mode
perf: arm_spe: Add format option for discard mode
MAINTAINERS: Add perf list for drivers/perf/
arm64: Remove duplicate included header
drivers/perf: apple_m1: Map generic branch events
arm64: rsi: Add automatic arm-cca-guest module loading
kselftest/arm64: Add 2024 dpISA extensions to hwcap test
KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
arm64/hwcap: Describe 2024 dpISA extensions to userspace
arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
arm64/mm: Replace open encodings with PXD_TABLE_BIT
arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
...
FEAT_SPEv1p2 (optional from Armv8.6) adds a discard mode that allows all
SPE data to be discarded rather than written to memory. Add a format
bit for this mode.
If the mode isn't supported, the format bit isn't published and attempts
to use it will result in -EOPNOTSUPP. Allocating an aux buffer is still
allowed even though it won't be written to so that old tools continue to
work, but updated tools can choose to skip this step.
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewd-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250108142904.401139-2-james.clark@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Atish Patra <atishp@rivosinc.com> says:
Here are two minor improvement/fixes in the PMU event path. The first patch
was part of the series[1]. The 2nd patch was suggested during the series
review.
While the series can only be merged once SBI v3.0 is frozen, these two
patches can be independent of SBI v3.0 and can be merged sooner. Hence, these
two patches are sent as a separate series.
* b4-shazam-merge:
drivers/perf: riscv: Do not allow invalid raw event config
drivers/perf: riscv: Return error for default case
drivers/perf: riscv: Fix Platform firmware event data
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-0-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The SBI specification allows only lower 48bits of hpmeventX to be
configured via SBI PMU. Currently, the driver masks of the higher
bits but doesn't return an error. This will lead to an additional
SBI call for config matching which should return for an invalid
event error in most of the cases.
However, if a platform(i.e Rocket and sifive cores) implements a
bitmap of all bits in the event encoding this will lead to an
incorrect event being programmed leading to user confusion.
Report the error to the user if higher bits are set during the
event mapping itself to avoid the confusion and save an additional
SBI call.
Suggested-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-3-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Platform firmware event data field is allowed to be 62 bits for
Linux as uppper most two bits are reserved to indicate SBI fw or
platform specific firmware events.
However, the event data field is masked as per the hardware raw
event mask which is not correct.
Fix the platform firmware event data field with proper mask.
Fixes: f0c9363db2 ("perf/riscv-sbi: Add platform specific firmware event handling")
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-1-813e8a4f5962@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
For PMUs with no association, the hisi_pmu->on_cpu is initialized
according to the NUMA locality but use a wrong CPU for the interrupt
affinity. The CPU selected from cpumask_local_spread() can be different
from the CPU by the cpuhp callback. Fix this by setting the IRQ affinity
to hisi_pmu->on_cpu.
Fixes: 6cd137088f ("drivers/perf: hisi: Refactor the detection of associated CPUs")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20241223125134.57885-1-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
The imx93 is the first supported DDR PMU that supports read transaction,
write transaction and read beats events which corresponding respecitively
to counter 2, 3 and 4.
However, transaction-based AXI match has low accuracy when get total bits
compared to beats-based. And imx93 doesn't assign AXI_ID to each master.
So axi filter is not used widely on imx93. This could be regards as AXI
filter version 1.
To improve the AXI filter capability, imx95 supports 1 read beats and 3
write beats event which corresponding respecitively to counter 2-5. imx95
also detailed AXI_ID allocation so that most of the master could be count
individually. This could be regards as AXI filter version 2.
This will introduce AXI filter version to refactor the driver and support
better extension, such as coming imx943. This is also a potential fix on
imx91 when configure axi filter.
Fixes: 44798fe136 ("perf: imx_perf: add support for i.MX91 platform")
Cc: stable@vger.kernel.org
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20241212065708.1353513-1-xu.yang_2@nxp.com
Signed-off-by: Will Deacon <will@kernel.org>
The group validation logic still somewhat assumes the original CMN-600
case of events counting globally, such that if one tries to group 9
events where the first 8 target a single DTC domain, the 9th will be
rejected because *a* DTC domain is full, even though it might only
target other non-overlapping domains and thus still be schedulable.
Improve matters by only counting the DTCs that the new event actually
needs (as arm_cmn_val_add_event() was already clever enough to do).
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/bdfd1e58dac449e407c5cacfd6bf8577dc0a5899.1733943898.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
PCI Vendor-Specific (VSEC) Capabilities are defined by each vendor.
Devices from different vendors may advertise a VSEC Capability with the DWC
RAS DES functionality, but the vendors may assign different VSEC IDs.
Search for the DWC RAS DES Capability using the VSEC ID and VSEC Rev
chosen by the vendor.
This does not fix a current problem because Alibaba, Ampere, and Qualcomm
all assigned the same VSEC ID and VSEC Rev for the DWC RAS DES Capability.
The potential issue is that we may add support for a device from another
vendor, where the vendor has already assigned DWC_PCIE_VSEC_RAS_DES_ID
(0x02) for an unrelated VSEC. In that event, dwc_pcie_des_cap() would find
the unrelated VSEC and mistakenly assume it was a DWC RAS DES Capability.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-and-tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241209222938.3219364-1-helgaas@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Although the event of the uncore PMU can only be opened on a single
CPU, some PMU does have the affinity on a range of CPUs. For example
the L3C PMU is associated to the CPUs sharing the L3T it monitors.
Users may infer this affinity by the PMU name which may have SCCL ID
and CCL ID encoded (for L3C etc), but it's not that straightforward.
So export this information by adding an "associated_cpus" sysfs
attribute then user can get this directly.
Reviewed-by: Jonathan Cameron <Joanthan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-9-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Each type of HiSilicon Uncore PMU has the following sysfs attributes:
- format: bitmask in perf_event_attr::config[012] of corresponding
attribute
- event: events name and corresponding event code
- cpumask: range of CPUs the events can be opened on
- identifier: the version of this PMU
Different types of PMU have different implementations of the "format"
and "event" but all share the same implementation of the "cpumask"
and "identifier". Thus we can move cpumask and identifier to the
hisi_uncore_pmu framework and drivers can use the generic
implementation.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-8-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
HiSilicon Uncore PMUs are identified by the IDs of the topology element
on which the PMUs are located. Add a new separate struct hisi_pmu_toplogy
to encapsulate this information. Add additional documentation on the
meaning of each ID.
- make sccl_id and sicl_id into a union since they're exclusive. It can
also be accessed by scl_id if the SICL/SCCL distinction is not
relevant
- make index_id and sub_id signed so -1 may be used to indicate the PMU
doesn't have this topology element or it could not be retrieved.
This patch should have no functional changes.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-6-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
There are two type of PMUs supported currently:
1) PMUs locate on SCCL (Super CPU Cluster [1]), associated with certain
CCL (CPU cluster [1])(e.g. L3C PMU) or not (e.g. DDRC PMU)
2) PMUs locate on the SICL (Super IO Cluster [1]), which has no
association with certain CPU topology (e.g. CPA PMU)
Currently the associated CPUs of the PMU is detected in the cpuhp online
callback as below:
- for type 1) the CPUs match PMU's sccl_id and ccl_id
- for type 2) PMU's sccl_id is -1 and all online CPUs will be associated
Since uncore PMUs are not bound to certain CPU context and event could be
counting started by any online CPU, the associated CPUs are just a
preference. Below disadvantages are observed in current implementation:
- the PMU cannot be used if its associated CPUs are offline
- SICL PMUs are associated to all the online CPUs implicitly without
the consideration of locality
So refactor the way we detect the associated CPUs in below aspects:
- add a clear definition of hisi_pmu::associated_cpus
- initialize hisi_pmu::on_cpu based on locality if no associated CPU
found, otherwise update it from associated CPUs
- drop the detection with a sccl_id of -1 for SICL PMUs
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/perf/hisi-pmu.rst?h=v6.12-rc1
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-5-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
If the selected CPU hisi_pmu::on_cpu goes offline, driver will select
a new online CPU from hisi_pmu::associated_cpus, or if no online CPU
found the PMU context won't be migrated. However for uncore PMUs the
associated CPUs are just a peference and it also works to schedule
the events on any online CPUs. So add a fallback to choose an online
CPU if no associated CPUs found.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Event will be scheduled on CPU of hisi_pmu::on_cpu which is selected
from the intersection of hisi_pmu::associated_cpus and online CPUs.
So the associated_cpus don't need to be maintained with online CPUs.
This will save one update operation if one associated CPU is offlined.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Each TAD provides eight 64-bit counters for monitoring
cache behavior.The driver always configures the same counter for
all the TADs. The user would end up effectively reserving one of
eight counters in every TAD to look across all TADs.
The occurrences of events are aggregated and presented to the user
at the end of running the workload. The driver does not provide a
way for the user to partition TADs so that different TADs are used for
different applications.
The performance events reflect various internal or interface activities.
By combining the values from multiple performance counters, cache
performance can be measured in terms such as: cache miss rate, cache
allocations, interface retry rate, internal resource occupancy, etc.
Each supported counter's event and formatting information is exposed
to sysfs at /sys/devices/tad/. Use perf tool stat command to measure
the pmu events. For instance:
perf stat -e tad_hit_ltg,tad_hit_dtg <workload>
Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-6-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
Odyssey DRAM Subsystem supports eight counters for monitoring performance
and software can program those counters to monitor any of the defined
performance events. Supported performance events include those counted
at the interface between the DDR controller and the PHY, interface between
the DDR Controller and the CHI interconnect, or within the DDR Controller.
Additionally DSS also supports two fixed performance event counters, one
for ddr reads and the other for ddr writes.
Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-4-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>