2041 Commits

Author SHA1 Message Date
Komal Bajaj
c158647c10 EDAC/qcom: Correct interrupt enable register configuration
The previous implementation incorrectly configured the cmn_interrupt_2_enable
register for interrupt handling. Using cmn_interrupt_2_enable to configure
Tag, Data RAM ECC interrupts would lead to issues like double handling of the
interrupts (EL1 and EL3) as cmn_interrupt_2_enable is meant to be configured
for interrupts which needs to be handled by EL3.

EL1 LLCC EDAC driver needs to use cmn_interrupt_0_enable register to configure
Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable.

Fixes: 27450653f1 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Signed-off-by: Komal Bajaj <quic_kbajaj@quicinc.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20241119064608.12326-1-quic_kbajaj@quicinc.com
2025-02-14 20:36:11 +01:00
Linus Torvalds
b9d8a295ed Merge tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:

 - The first part of a restructuring of AMD's representation of a
   northbridge which is legacy now, and the creation of the new AMD node
   concept which represents the Zen architecture of having a collection
   of I/O devices within an SoC. Those nodes comprise the so-called data
   fabric on Zen.

   This has at least one practical advantage of not having to add a PCI
   ID each time a new data fabric PCI device releases. Eventually, the
   lot more uniform provider of data fabric functionality amd_node.c
   will be used by all the drivers which need it

 - Smaller cleanups

* tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd_node: Use defines for SMN register offsets
  x86/amd_node: Remove dependency on AMD_NB
  x86/amd_node: Update __amd_smn_rw() error paths
  x86/amd_nb: Move SMN access code to a new amd_node driver
  x86/amd_nb, hwmon: (k10temp): Simplify amd_pci_dev_to_node_id()
  x86/amd_nb: Simplify function 3 search
  x86/amd_nb: Use topology info to get AMD node count
  x86/amd_nb: Simplify root device search
  x86/amd_nb: Simplify function 4 search
  x86: Start moving AMD node functionality out of AMD_NB
  x86/amd_nb: Clean up early_is_amd_nb()
  x86/amd_nb: Restrict init function to AMD-based systems
  x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
2025-01-21 09:38:52 -08:00
Linus Torvalds
48795f90cb Merge tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:

 - Remove the less generic CPU matching infra around struct x86_cpu_desc
   and use the generic struct x86_cpu_id thing

 - Remove magic naked numbers for CPUID functions and use proper defines
   of the prefix CPUID_LEAF_*. Consolidate some of the crazy use around
   the tree

 - Smaller cleanups and improvements

* tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Make all all CPUID leaf names consistent
  x86/fpu: Remove unnecessary CPUID level check
  x86/fpu: Move CPUID leaf definitions to common code
  x86/tsc: Remove CPUID "frequency" leaf magic numbers.
  x86/tsc: Move away from TSC leaf magic numbers
  x86/cpu: Move TSC CPUID leaf definition
  x86/cpu: Refresh DCA leaf reading code
  x86/cpu: Remove unnecessary MwAIT leaf checks
  x86/cpu: Use MWAIT leaf definition
  x86/cpu: Move MWAIT leaf definition to common header
  x86/cpu: Remove 'x86_cpu_desc' infrastructure
  x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'
  x86/cpu: Replace PEBS use of 'x86_cpu_desc' use with 'x86_cpu_id'
  x86/cpu: Expose only stepping min/max interface
  x86/cpu: Introduce new microcode matching helper
  x86/cpufeature: Document cpu_feature_enabled() as the default to use
  x86/paravirt: Remove the WBINVD callback
  x86/cpufeatures: Free up unused feature bits
2025-01-21 09:30:59 -08:00
Linus Torvalds
0763dd8928 Merge tag 'edac_updates_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:

 - Remove the EDAC PowerPC Cell driver due to the removal of the IBM
   Cell blades support

 - Add a new EDAC driver for Loongson SoCs which reports single-bit
   correctable errors

 - Extend the SKX and i10NM EDAC drivers to support UV systems which can
   have more than 8 nodes

 - Add Intel Clearwater Forest server support to i10nm_edac

 - Minor fix

* tag 'edac_updates_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/cell: Remove powerpc Cell driver
  EDAC: Add an EDAC driver for the Loongson memory controller
  EDAC: Fix typos in comments
  EDAC/{i10nm,skx,skx_common}: Support UV systems
  EDAC/i10nm: Add Intel Clearwater Forest server support
2025-01-21 08:21:12 -08:00
Borislav Petkov (AMD)
368736db4d Merge remote-tracking branches 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates
* ras/edac-drivers:
  EDAC/cell: Remove powerpc Cell driver
  EDAC: Add an EDAC driver for the Loongson memory controller
  EDAC/{i10nm,skx,skx_common}: Support UV systems
  EDAC/i10nm: Add Intel Clearwater Forest server support

* ras/edac-misc:
  EDAC: Fix typos in comments

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-01-17 19:36:27 +01:00
Michael Ellerman
6696037a56 EDAC/cell: Remove powerpc Cell driver
This driver can no longer be built since support for IBM Cell Blades was
removed, in particular PPC_CELL_COMMON.

Remove the driver.

  [ bp: Remove EDAC_CELL from Cell's defconfig too. ]

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241218105523.416573-23-mpe@ellerman.id.au
2025-01-16 17:07:50 +01:00
Mario Limonciello
d6caeafaa3 x86/amd_nb: Move SMN access code to a new amd_node driver
SMN access was bolted into amd_nb mostly as convenience.  This has
limitations though that require incurring tech debt to keep it working.

Move SMN access to the newly introduced AMD Node driver.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> # pdx86
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> # PMF, PMC
Link: https://lore.kernel.org/r/20241206161210.163701-11-yazen.ghannam@amd.com
2025-01-08 10:59:44 +01:00
Zhao Qunqin
558aff7a63 EDAC: Add an EDAC driver for the Loongson memory controller
Add ECC support for Loongson SoC DDR controller. This driver reports single
bit errors (CE) only.

Only ACPI firmware is supported.

  [ bp: Document what last_ce_count is for. ]

Signed-off-by: Zhao Qunqin <zhaoqunqin@loongson.cn>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/r/20241219124846.1876-1-zhaoqunqin@loongson.cn
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-01-04 12:02:04 +01:00
Dave Hansen
85b08180df x86/cpu: Expose only stepping min/max interface
The x86_match_cpu() infrastructure can match CPU steppings. Since
there are only 16 possible steppings, the matching infrastructure goes
all out and stores the stepping match as a bitmap. That means it can
match any possible steppings in a single list entry. Fun.

But it exposes this bitmap to each of the X86_MATCH_*() helpers when
none of them really need a bitmap. It makes up for this by exporting a
helper (X86_STEPPINGS()) which converts a contiguous stepping range
into the bitmap which every single user leverages.

Instead of a bitmap, have the main helper for this sort of thing
(X86_MATCH_VFM_STEPS()) just take a stepping range. This ends up
actually being even more compact than before.

Leave the helper in place (renamed to __X86_STEPPINGS()) to make it
more clear what is going on instead of just having a random GENMASK()
in the middle of an already complicated macro.

One oddity that I hit was this macro:

       X86_MATCH_VFM_STEPS(vfm, X86_STEPPING_MIN, max_stepping, issues)

It *could* have been converted over to take a min/max stepping value
for each entry. But that would have been a bit too verbose and would
prevent the one oddball in the list (INTEL_COMETLAKE_L stepping 0)
from sticking out.

Instead, just have it take a *maximum* stepping and imply that the match
is from 0=>max_stepping. This is functional for all the cases now and
also retains the nice property of having INTEL_COMETLAKE_L stepping 0
stick out like a sore thumb.

skx_cpuids[] is goofy. It uses the stepping match but encodes all
possible steppings. Just use a normal, non-stepping match helper.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185129.65527B2A%40davehans-spike.ostc.intel.com
2024-12-17 16:14:49 -08:00
Yan Zhen
586e62fe38 EDAC: Fix typos in comments
Fix the following typos:

'Alocate' ==> 'Allocate',
'specifed' ==> 'specified',
'Technlogy' ==> 'Technology',
'Brnach' ==> 'Branch',
'branchs' ==> 'branches'.

Signed-off-by: Yan Zhen <yanzhen@vivo.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240930074023.618110-1-yanzhen@vivo.com
2024-12-15 22:17:34 +01:00
Kyle Meyer
584e09743d EDAC/{i10nm,skx,skx_common}: Support UV systems
The 3-bit source IDs in PCI configuration space registers, used to map
devices to sockets, are limited to 8 unique IDs, and each ID is local to
a UPI/QPI domain.

Source IDs cannot be used to map devices to sockets on UV systems
because they can exceed 8 sockets and have multiple UPI/QPI domains with
identical, repeating source IDs.

Use NUMA information to get package IDs instead of source IDs on UV
systems, and use package/source IDs to name IMC information structures.

Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/all/20241213012549.43099-1-kyle.meyer@hpe.com/
2024-12-13 11:10:31 -08:00
Borislav Petkov (AMD)
747367340c EDAC/amd64: Simplify ECC check on unified memory controllers
The intent of the check is to see whether at least one UMC has ECC
enabled. So do that instead of tracking which ones are enabled in masks
which are too small in size anyway and lead to not loading the driver on
Zen4 machines with UMCs enabled over UMC8.

Fixes: e2be5955a8 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh")
Reported-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Avadhut Naik <avadhut.naik@amd.com>
Reviewed-by: Avadhut Naik <avadhut.naik@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20241210212054.3895697-1-avadhut.naik@amd.com
2024-12-11 21:47:33 +01:00
Qiuxu Zhuo
2e55bb9b71 EDAC/i10nm: Add Intel Clearwater Forest server support
Clearwater Forest is the successor to Sierra Forest. Add Clearwater
Forest CPU model ID for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Link: https://lore.kernel.org/r/20241203022038.72873-1-qiuxu.zhuo@intel.com
2024-12-09 11:18:23 -08:00
Linus Torvalds
e70140ba0d Get rid of 'remove_new' relic from platform driver struct
The continual trickle of small conversion patches is grating on me, and
is really not helping.  Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:

  /*
   * .remove_new() is a relic from a prototype conversion of .remove().
   * New drivers are supposed to implement .remove(). Once all drivers are
   * converted to not use .remove_new any more, it will be dropped.
   */

This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.

I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.

Then I just removed the old (sic) .remove_new member function, and this
is the end result.  No more unnecessary conversion noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01 15:12:43 -08:00
Linus Torvalds
42d9e8b7cc Merge tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:

 - Rework kfence support for the HPT MMU to work on systems with >= 16TB
   of RAM.

 - Remove the powerpc "maple" platform, used by the "Yellow Dog
   Powerstation".

 - Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
   DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.

 - Add support for running KVM nested guests on Power11.

 - Other small features, cleanups and fixes.

Thanks to Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa
Shulyupin, David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert
Uytterhoeven, Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard,
Lukas Bulwahn, Madhavan Srinivasan, Markus Elfring, Michal Suchanek,
Ming Lei, Mukesh Kumar Chaurasiya, Nathan Chancellor, Naveen N Rao,
Nicholas Piggin, Nysal Jan K.A, Paulo Miguel Almeida, Pavithra Prakash,
Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P Bappalige, Shen
Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten Blum,
Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun, and zhang jiao.

* tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (89 commits)
  EDAC/powerpc: Remove PPC_MAPLE drivers
  powerpc/perf: Add per-task/process monitoring to vpa_pmu driver
  powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch
  docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu
  powerpc/perf: Add perf interface to expose vpa counters
  MAINTAINERS: powerpc: Mark Maddy as "M"
  powerpc/Makefile: Allow overriding CPP
  powerpc-km82xx.c: replace of_node_put() with __free
  ps3: Correct some typos in comments
  powerpc/kexec: Fix return of uninitialized variable
  macintosh: Use common error handling code in via_pmu_led_init()
  powerpc/powermac: Use of_property_match_string() in pmac_has_backlight_type()
  powerpc: remove dead config options for MPC85xx platform support
  powerpc/xive: Use cpumask_intersects()
  selftests/powerpc: Remove the path after initialization.
  powerpc/xmon: symbol lookup length fixed
  powerpc/ep8248e: Use %pa to format resource_size_t
  powerpc/ps3: Reorganize kerneldoc parameter names
  KVM: PPC: Book3S HV: Fix kmv -> kvm typo
  powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static
  ...
2024-11-23 10:44:31 -08:00
Linus Torvalds
c1f2ffe207 Merge tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:

 - Log and handle twp new AMD-specific MCA registers: SYND1 and SYND2
   and report the Field Replaceable Unit text info reported through them

 - Add support for handling variable-sized SMCA BERT records

 - Add the capability for reporting vendor-specific RAS error info
   without adding vendor-specific fields to struct mce

 - Cleanups

* tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Add support for FRU text in MCA
  x86/mce/apei: Handle variable SMCA BERT record size
  x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
  tracing: Add __print_dynamic_array() helper
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce/intel: Use MCG_BANKCNT_MASK instead of 0xff
  x86/mce/mcelog: Use xchg() to get and clear the flags
2024-11-19 12:04:51 -08:00
Linus Torvalds
77286b868f Merge tag 'edac_updates_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:

 - Add support for Bluefield-2 SOCs to bluefield_edac

 - Add support for Intel Panther Lake-H to igen6_edac

 - Add polling support to igen6_edac as some Intel M100 chips have
   trouble with error interrupts

 - Add Kaby Lake-S support to ie31200_edac

 - Fix memory source detection in the SKX common module which is used by
   a couple of Intel EDAC drivers

 - Add support for the NXP i.MX9 memory controller to fsl_edac

 - The usual fixes and cleanups all over the place

* tag 'edac_updates_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/igen6: Add polling support
  EDAC/igen6: Initialize edac_op_state according to the configuration data
  EDAC/igen6: Avoid segmentation fault on module unload
  EDAC/ie31200: Add Kaby Lake-S dual-core host bridge ID
  MAINTAINERS: Change FSL DDR EDAC maintainership
  EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
  EDAC/skx_common: Differentiate memory error sources
  EDAC/fsl_ddr: Add support for i.MX9 DDR controller
  dt-bindings: memory: fsl: Add compatible string nxp,imx9-memory-controller
  EDAC/fsl_ddr: Fix bad bit shift operations
  EDAC/fsl_ddr: Move global variables into struct fsl_mc_pdata
  EDAC/fsl_ddr: Pass down fsl_mc_pdata in ddr_in32() and ddr_out32()
  RAS/AMD/ATL: Add debug prints for DF register reads
  EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2
  EDAC/bluefield: Fix potential integer overflow
  EDAC/igen6: Add Intel Panther Lake-H SoCs support
2024-11-19 12:00:10 -08:00
Michael Ellerman
3c592ce799 EDAC/powerpc: Remove PPC_MAPLE drivers
These two drivers are only buildable for the powerpc "maple" platform
(CONFIG_PPC_MAPLE), which has now been removed, see
commit 62f8f307c8 ("powerpc/64: Remove maple platform").

Remove the drivers.

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241112084134.411964-1-mpe@ellerman.id.au
2024-11-19 16:41:16 +11:00
Borislav Petkov (AMD)
1b38da0115 Merge branch 'edac-misc' into edac-updates
* edac-misc:
  MAINTAINERS: Change FSL DDR EDAC maintainership
  RAS/AMD/ATL: Add debug prints for DF register reads
  EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2
  EDAC/bluefield: Fix potential integer overflow
  EDAC/igen6: Add Intel Panther Lake-H SoCs support

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-11-18 11:33:23 +01:00
Orange Kao
e14232afa9 EDAC/igen6: Add polling support
Some PCs with Intel N100 (with PCI device 8086:461c, DID_ADL_N_SKU4)
experienced issues with error interrupts not working, even with the
following configuration in the BIOS.

    In-Band ECC Support: Enabled
    In-Band ECC Operation Mode: 2 (make all requests protected and
                                   ignore range checks)
    IBECC Error Injection Control: Inject Correctable Error on insertion
                                   counter
    Error Injection Insertion Count: 251658240 (0xf000000)

Add polling mode support for these machines to ensure that memory error
events are handled.

Signed-off-by: Orange Kao <orange@aiven.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/all/20241106114024.941659-3-orange@aiven.io
2024-11-08 13:36:55 -08:00
Qiuxu Zhuo
1d512b1aa5 EDAC/igen6: Initialize edac_op_state according to the configuration data
Currently, igen6_edac sets edac_op_state to EDAC_OPSTATE_NMI, while the
driver also supports memory errors reported from Machine Check. Initialize
edac_op_state to the correct value according to the configuration data
that the driver probed.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20241106114024.941659-2-orange@aiven.io
2024-11-08 13:35:21 -08:00
Orange Kao
fefaae9039 EDAC/igen6: Avoid segmentation fault on module unload
The segmentation fault happens because:

During modprobe:
1. In igen6_probe(), igen6_pvt will be allocated with kzalloc()
2. In igen6_register_mci(), mci->pvt_info will point to
   &igen6_pvt->imc[mc]

During rmmod:
1. In mci_release() in edac_mc.c, it will kfree(mci->pvt_info)
2. In igen6_remove(), it will kfree(igen6_pvt);

Fix this issue by setting mci->pvt_info to NULL to avoid the double
kfree.

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360
Signed-off-by: Orange Kao <orange@aiven.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io
2024-11-04 12:09:45 -08:00
James Ye
f12c946ee7 EDAC/ie31200: Add Kaby Lake-S dual-core host bridge ID
Add device ID for dual-core Kaby Lake-S processors e.g. i3-7100.

Signed-off-by: James Ye <jye836@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Jason Baron <jbaron@akamai.com>
Link: https://lore.kernel.org/r/20240824120622.46226-1-jye836@gmail.com
2024-11-04 17:40:22 +01:00
Yazen Ghannam
612c2addff EDAC/mce_amd: Add support for FRU text in MCA
A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).

The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
multiple devices, for example, a Unified Memory Controller (UMC) bank
that manages two DIMMs.

  [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
  [ bp: Do not expose MCA_CONFIG to userspace yet. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
2024-10-31 10:53:04 +01:00
Avadhut Naik
d4fca1358e x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers:
MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition to the
existing MCA_SYND register. The data within these registers is considered
valid if MCA_STATUS[SyndV] is set.

Userspace error decoding tools like rasdaemon gather related hardware error
information through the tracepoints.

Therefore, export these two registers through the mce_record tracepoint so
that tools like rasdaemon can parse them and output the supplemental error
information like FRU text contained in them.

  [ bp: Massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
2024-10-31 10:36:07 +01:00