Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add a FRU (Field Replaceable Unit) memory poison manager which
   collects and manages previously encountered hw errors in order to
   save them to persistent storage across reboots. Previously recorded
   errors are "replayed" upon reboot in order to poison memory which has
   caused said errors in the past.

   The main use case is stacked, on-chip memory which cannot simply be
   replaced so poisoning faulty areas of it and thus making them
   inaccessible is the only strategy to prolong its lifetime.

 - Add an AMD address translation library glue which converts the
   reported addresses of hw errors into system physical addresses in
   order to be used by other subsystems like memory failure, for
   example. Add support for MI300 accelerators to that library.

 - igen6: Add support for Alder Lake-N SoC

 - i10nm: Add Grand Ridge support

 - The usual fixlets and cleanups

* tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Convert to platform remove callback returning void
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()
This commit is contained in:
Linus Torvalds
2024-03-11 18:14:06 -07:00
33 changed files with 5165 additions and 336 deletions

View File

@@ -0,0 +1,24 @@
.. SPDX-License-Identifier: GPL-2.0
Address translation
===================
x86 AMD
-------
Zen-based AMD systems include a Data Fabric that manages the layout of
physical memory. Devices attached to the Fabric, like memory controllers,
I/O, etc., may not have a complete view of the system physical memory map.
These devices may provide a "normalized", i.e. device physical, address
when reporting memory errors. Normalized addresses must be translated to
a system physical address for the kernel to action on the memory.
AMD Address Translation Library (CONFIG_AMD_ATL) provides translation for
this case.
Glossary of acronyms used in address translation for Zen-based systems
* CCM = Cache Coherent Moderator
* COD = Cluster-on-Die
* COH_ST = Coherent Station
* DF = Data Fabric

View File

@@ -1,15 +1,10 @@
.. SPDX-License-Identifier: GPL-2.0
Reliability, Availability and Serviceability features
=====================================================
This documents different aspects of the RAS functionality present in the
kernel.
Error decoding
---------------
==============
* x86
x86
---
Error decoding on AMD systems should be done using the rasdaemon tool:
https://github.com/mchehab/rasdaemon/

View File

@@ -0,0 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
.. toctree::
:maxdepth: 2
main
error-decoding
address-translation

View File

@@ -1,8 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
============================================
Reliability, Availability and Serviceability
============================================
==================================================
Reliability, Availability and Serviceability (RAS)
==================================================
This documents different aspects of the RAS functionality present in the
kernel.
RAS concepts
************

View File

@@ -122,7 +122,7 @@ configure specific aspects of kernel behavior to your liking.
pmf
pnp
rapidio
ras
RAS/index
rtc
serial-console
svga

View File

@@ -113,7 +113,6 @@ to ReStructured Text format, or are simply too old.
:maxdepth: 1
staging/index
RAS/ras
Translations

View File

@@ -897,6 +897,12 @@ Q: https://patchwork.kernel.org/project/linux-rdma/list/
F: drivers/infiniband/hw/efa/
F: include/uapi/rdma/efa-abi.h
AMD ADDRESS TRANSLATION LIBRARY (ATL)
M: Yazen Ghannam <Yazen.Ghannam@amd.com>
L: linux-edac@vger.kernel.org
S: Supported
F: drivers/ras/amd/atl/*
AMD AXI W1 DRIVER
M: Kris Chaplin <kris.chaplin@amd.com>
R: Thomas Delev <thomas.delev@amd.com>
@@ -7583,7 +7589,6 @@ R: Robert Richter <rric@kernel.org>
L: linux-edac@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
F: Documentation/admin-guide/ras.rst
F: Documentation/driver-api/edac.rst
F: drivers/edac/
F: include/linux/edac.h
@@ -18379,11 +18384,17 @@ M: Tony Luck <tony.luck@intel.com>
M: Borislav Petkov <bp@alien8.de>
L: linux-edac@vger.kernel.org
S: Maintained
F: Documentation/admin-guide/ras.rst
F: Documentation/admin-guide/RAS
F: drivers/ras/
F: include/linux/ras.h
F: include/ras/ras_event.h
RAS FRU MEMORY POISON MANAGER (FMPM)
M: Yazen Ghannam <Yazen.Ghannam@amd.com>
L: linux-edac@vger.kernel.org
S: Maintained
F: drivers/ras/amd/fmpm.c
RC-CORE / LIRC FRAMEWORK
M: Sean Young <sean@mess.org>
L: linux-media@vger.kernel.org

View File

@@ -224,7 +224,7 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
static inline int topology_max_smt_threads(void) { return 1; }
static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
static inline unsigned int topology_amd_nodes_per_pkg(void) { return 0; };
static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
#endif /* !CONFIG_SMP */
static inline void arch_fix_phys_package_id(int num, u32 slot)

View File

@@ -78,6 +78,7 @@ config EDAC_GHES
config EDAC_AMD64
tristate "AMD64 (Opteron, Athlon64)"
depends on AMD_NB && EDAC_DECODE_MCE
imply AMD_ATL
help
Support for error detection and correction of DRAM ECC errors on
the AMD64 families (>= K8) of memory controllers.

View File

@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/ras.h>
#include "amd64_edac.h"
#include <asm/amd_nb.h>
@@ -1051,281 +1052,6 @@ static int fixup_node_id(int node_id, struct mce *m)
return nid - gpu_node_map.base_node_id + 1;
}
/* Protect the PCI config register pairs used for DF indirect access. */
static DEFINE_MUTEX(df_indirect_mutex);
/*
* Data Fabric Indirect Access uses FICAA/FICAD.
*
* Fabric Indirect Configuration Access Address (FICAA): Constructed based
* on the device's Instance Id and the PCI function and register offset of
* the desired register.
*
* Fabric Indirect Configuration Access Data (FICAD): There are FICAD LO
* and FICAD HI registers but so far we only need the LO register.
*
* Use Instance Id 0xFF to indicate a broadcast read.
*/
#define DF_BROADCAST 0xFF
static int __df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
{
struct pci_dev *F4;
u32 ficaa;
int err = -ENODEV;
if (node >= amd_nb_num())
goto out;
F4 = node_to_amd_nb(node)->link;
if (!F4)
goto out;
ficaa = (instance_id == DF_BROADCAST) ? 0 : 1;
ficaa |= reg & 0x3FC;
ficaa |= (func & 0x7) << 11;
ficaa |= instance_id << 16;
mutex_lock(&df_indirect_mutex);
err = pci_write_config_dword(F4, 0x5C, ficaa);
if (err) {
pr_warn("Error writing DF Indirect FICAA, FICAA=0x%x\n", ficaa);
goto out_unlock;
}
err = pci_read_config_dword(F4, 0x98, lo);
if (err)
pr_warn("Error reading DF Indirect FICAD LO, FICAA=0x%x.\n", ficaa);
out_unlock:
mutex_unlock(&df_indirect_mutex);
out:
return err;
}
static int df_indirect_read_instance(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
{
return __df_indirect_read(node, func, reg, instance_id, lo);
}
static int df_indirect_read_broadcast(u16 node, u8 func, u16 reg, u32 *lo)
{
return __df_indirect_read(node, func, reg, DF_BROADCAST, lo);
}
struct addr_ctx {
u64 ret_addr;
u32 tmp;
u16 nid;
u8 inst_id;
};
static int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
{
u64 dram_base_addr, dram_limit_addr, dram_hole_base;
u8 die_id_shift, die_id_mask, socket_id_shift, socket_id_mask;
u8 intlv_num_dies, intlv_num_chan, intlv_num_sockets;
u8 intlv_addr_sel, intlv_addr_bit;
u8 num_intlv_bits, hashed_bit;
u8 lgcy_mmio_hole_en, base = 0;
u8 cs_mask, cs_id = 0;
bool hash_enabled = false;
struct addr_ctx ctx;
memset(&ctx, 0, sizeof(ctx));
/* Start from the normalized address */
ctx.ret_addr = norm_addr;
ctx.nid = nid;
ctx.inst_id = umc;
/* Read D18F0x1B4 (DramOffset), check if base 1 is used. */
if (df_indirect_read_instance(nid, 0, 0x1B4, umc, &ctx.tmp))
goto out_err;
/* Remove HiAddrOffset from normalized address, if enabled: */
if (ctx.tmp & BIT(0)) {
u64 hi_addr_offset = (ctx.tmp & GENMASK_ULL(31, 20)) << 8;
if (norm_addr >= hi_addr_offset) {
ctx.ret_addr -= hi_addr_offset;
base = 1;
}
}
/* Read D18F0x110 (DramBaseAddress). */
if (df_indirect_read_instance(nid, 0, 0x110 + (8 * base), umc, &ctx.tmp))
goto out_err;
/* Check if address range is valid. */
if (!(ctx.tmp & BIT(0))) {
pr_err("%s: Invalid DramBaseAddress range: 0x%x.\n",
__func__, ctx.tmp);
goto out_err;
}
lgcy_mmio_hole_en = ctx.tmp & BIT(1);
intlv_num_chan = (ctx.tmp >> 4) & 0xF;
intlv_addr_sel = (ctx.tmp >> 8) & 0x7;
dram_base_addr = (ctx.tmp & GENMASK_ULL(31, 12)) << 16;
/* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */
if (intlv_addr_sel > 3) {
pr_err("%s: Invalid interleave address select %d.\n",
__func__, intlv_addr_sel);
goto out_err;
}
/* Read D18F0x114 (DramLimitAddress). */
if (df_indirect_read_instance(nid, 0, 0x114 + (8 * base), umc, &ctx.tmp))
goto out_err;
intlv_num_sockets = (ctx.tmp >> 8) & 0x1;
intlv_num_dies = (ctx.tmp >> 10) & 0x3;
dram_limit_addr = ((ctx.tmp & GENMASK_ULL(31, 12)) << 16) | GENMASK_ULL(27, 0);
intlv_addr_bit = intlv_addr_sel + 8;
/* Re-use intlv_num_chan by setting it equal to log2(#channels) */
switch (intlv_num_chan) {
case 0: intlv_num_chan = 0; break;
case 1: intlv_num_chan = 1; break;
case 3: intlv_num_chan = 2; break;
case 5: intlv_num_chan = 3; break;
case 7: intlv_num_chan = 4; break;
case 8: intlv_num_chan = 1;
hash_enabled = true;
break;
default:
pr_err("%s: Invalid number of interleaved channels %d.\n",
__func__, intlv_num_chan);
goto out_err;
}
num_intlv_bits = intlv_num_chan;
if (intlv_num_dies > 2) {
pr_err("%s: Invalid number of interleaved nodes/dies %d.\n",
__func__, intlv_num_dies);
goto out_err;
}
num_intlv_bits += intlv_num_dies;
/* Add a bit if sockets are interleaved. */
num_intlv_bits += intlv_num_sockets;
/* Assert num_intlv_bits <= 4 */
if (num_intlv_bits > 4) {
pr_err("%s: Invalid interleave bits %d.\n",
__func__, num_intlv_bits);
goto out_err;
}
if (num_intlv_bits > 0) {
u64 temp_addr_x, temp_addr_i, temp_addr_y;
u8 die_id_bit, sock_id_bit, cs_fabric_id;
/*
* Read FabricBlockInstanceInformation3_CS[BlockFabricID].
* This is the fabric id for this coherent slave. Use
* umc/channel# as instance id of the coherent slave
* for FICAA.
*/
if (df_indirect_read_instance(nid, 0, 0x50, umc, &ctx.tmp))
goto out_err;
cs_fabric_id = (ctx.tmp >> 8) & 0xFF;
die_id_bit = 0;
/* If interleaved over more than 1 channel: */
if (intlv_num_chan) {
die_id_bit = intlv_num_chan;
cs_mask = (1 << die_id_bit) - 1;
cs_id = cs_fabric_id & cs_mask;
}
sock_id_bit = die_id_bit;
/* Read D18F1x208 (SystemFabricIdMask). */
if (intlv_num_dies || intlv_num_sockets)
if (df_indirect_read_broadcast(nid, 1, 0x208, &ctx.tmp))
goto out_err;
/* If interleaved over more than 1 die. */
if (intlv_num_dies) {
sock_id_bit = die_id_bit + intlv_num_dies;
die_id_shift = (ctx.tmp >> 24) & 0xF;
die_id_mask = (ctx.tmp >> 8) & 0xFF;
cs_id |= ((cs_fabric_id & die_id_mask) >> die_id_shift) << die_id_bit;
}
/* If interleaved over more than 1 socket. */
if (intlv_num_sockets) {
socket_id_shift = (ctx.tmp >> 28) & 0xF;
socket_id_mask = (ctx.tmp >> 16) & 0xFF;
cs_id |= ((cs_fabric_id & socket_id_mask) >> socket_id_shift) << sock_id_bit;
}
/*
* The pre-interleaved address consists of XXXXXXIIIYYYYY
* where III is the ID for this CS, and XXXXXXYYYYY are the
* address bits from the post-interleaved address.
* "num_intlv_bits" has been calculated to tell us how many "I"
* bits there are. "intlv_addr_bit" tells us how many "Y" bits
* there are (where "I" starts).
*/
temp_addr_y = ctx.ret_addr & GENMASK_ULL(intlv_addr_bit - 1, 0);
temp_addr_i = (cs_id << intlv_addr_bit);
temp_addr_x = (ctx.ret_addr & GENMASK_ULL(63, intlv_addr_bit)) << num_intlv_bits;
ctx.ret_addr = temp_addr_x | temp_addr_i | temp_addr_y;
}
/* Add dram base address */
ctx.ret_addr += dram_base_addr;
/* If legacy MMIO hole enabled */
if (lgcy_mmio_hole_en) {
if (df_indirect_read_broadcast(nid, 0, 0x104, &ctx.tmp))
goto out_err;
dram_hole_base = ctx.tmp & GENMASK(31, 24);
if (ctx.ret_addr >= dram_hole_base)
ctx.ret_addr += (BIT_ULL(32) - dram_hole_base);
}
if (hash_enabled) {
/* Save some parentheses and grab ls-bit at the end. */
hashed_bit = (ctx.ret_addr >> 12) ^
(ctx.ret_addr >> 18) ^
(ctx.ret_addr >> 21) ^
(ctx.ret_addr >> 30) ^
cs_id;
hashed_bit &= BIT(0);
if (hashed_bit != ((ctx.ret_addr >> intlv_addr_bit) & BIT(0)))
ctx.ret_addr ^= BIT(intlv_addr_bit);
}
/* Is calculated system address is above DRAM limit address? */
if (ctx.ret_addr > dram_limit_addr)
goto out_err;
*sys_addr = ctx.ret_addr;
return 0;
out_err:
return -EINVAL;
}
static int get_channel_from_ecc_syndrome(struct mem_ctl_info *, u16);
/*
@@ -3073,9 +2799,10 @@ static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
struct mem_ctl_info *mci;
unsigned long sys_addr;
struct amd64_pvt *pvt;
struct atl_err a_err;
struct err_info err;
u64 sys_addr;
node_id = fixup_node_id(node_id, m);
@@ -3106,7 +2833,12 @@ static void decode_umc_error(int node_id, struct mce *m)
pvt->ops->get_err_info(m, &err);
if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
a_err.addr = m->addr;
a_err.ipid = m->ipid;
a_err.cpu = m->extcpu;
sys_addr = amd_convert_umc_mca_addr_to_sys_addr(&a_err);
if (IS_ERR_VALUE(sys_addr)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
}

View File

@@ -951,6 +951,7 @@ static const struct x86_cpu_id i10nm_cpuids[] = {
X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS(EMERALDRAPIDS_X, X86_STEPPINGS(0x0, 0xf), &spr_cfg),
X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS(GRANITERAPIDS_X, X86_STEPPINGS(0x0, 0xf), &gnr_cfg),
X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS(ATOM_CRESTMONT_X, X86_STEPPINGS(0x0, 0xf), &gnr_cfg),
X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS(ATOM_CRESTMONT, X86_STEPPINGS(0x0, 0xf), &gnr_cfg),
{}
};
MODULE_DEVICE_TABLE(x86cpu, i10nm_cpuids);

View File

@@ -238,6 +238,7 @@ static struct work_struct ecclog_work;
#define DID_ADL_N_SKU9 0x4678
#define DID_ADL_N_SKU10 0x4679
#define DID_ADL_N_SKU11 0x467c
#define DID_ADL_N_SKU12 0x4632
/* Compute die IDs for Raptor Lake-P with IBECC */
#define DID_RPL_P_SKU1 0xa706
@@ -583,6 +584,7 @@ static const struct pci_device_id igen6_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, DID_ADL_N_SKU9), (kernel_ulong_t)&adl_n_cfg },
{ PCI_VDEVICE(INTEL, DID_ADL_N_SKU10), (kernel_ulong_t)&adl_n_cfg },
{ PCI_VDEVICE(INTEL, DID_ADL_N_SKU11), (kernel_ulong_t)&adl_n_cfg },
{ PCI_VDEVICE(INTEL, DID_ADL_N_SKU12), (kernel_ulong_t)&adl_n_cfg },
{ PCI_VDEVICE(INTEL, DID_RPL_P_SKU1), (kernel_ulong_t)&rpl_p_cfg },
{ PCI_VDEVICE(INTEL, DID_RPL_P_SKU2), (kernel_ulong_t)&rpl_p_cfg },
{ PCI_VDEVICE(INTEL, DID_RPL_P_SKU3), (kernel_ulong_t)&rpl_p_cfg },

View File

@@ -1324,11 +1324,9 @@ static int mc_probe(struct platform_device *pdev)
struct synps_edac_priv *priv;
struct mem_ctl_info *mci;
void __iomem *baseaddr;
struct resource *res;
int rc;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
baseaddr = devm_ioremap_resource(&pdev->dev, res);
baseaddr = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(baseaddr))
return PTR_ERR(baseaddr);

View File

@@ -42,8 +42,11 @@
#define ECCW0_FLIP_CTRL 0x109C
#define ECCW0_FLIP0_OFFSET 0x10A0
#define ECCW0_FLIP0_BITS 31
#define ECCW0_FLIP1_OFFSET 0x10A4
#define ECCW1_FLIP_CTRL 0x10AC
#define ECCW1_FLIP0_OFFSET 0x10B0
#define ECCW1_FLIP1_OFFSET 0x10B4
#define ECCR0_CERR_STAT_OFFSET 0x10BC
#define ECCR0_CE_ADDR_LO_OFFSET 0x10C0
#define ECCR0_CE_ADDR_HI_OFFSET 0x10C4
@@ -116,9 +119,6 @@
#define XDDR_BUS_WIDTH_32 1
#define XDDR_BUS_WIDTH_16 2
#define ECC_CEPOISON_MASK 0x1
#define ECC_UEPOISON_MASK 0x3
#define XDDR_MAX_ROW_CNT 18
#define XDDR_MAX_COL_CNT 10
#define XDDR_MAX_RANK_CNT 2
@@ -133,6 +133,7 @@
* https://docs.xilinx.com/r/en-US/am012-versal-register-reference/PCSR_LOCK-XRAM_SLCR-Register
*/
#define PCSR_UNLOCK_VAL 0xF9E8D7C6
#define PCSR_LOCK_VAL 1
#define XDDR_ERR_TYPE_CE 0
#define XDDR_ERR_TYPE_UE 1
@@ -142,6 +143,7 @@
#define XILINX_DRAM_SIZE_12G 3
#define XILINX_DRAM_SIZE_16G 4
#define XILINX_DRAM_SIZE_32G 5
#define NUM_UE_BITPOS 2
/**
* struct ecc_error_info - ECC error log information.
@@ -479,7 +481,7 @@ static void err_callback(const u32 *payload, void *data)
writel(regval, priv->ddrmc_baseaddr + XDDR_ISR_OFFSET);
/* Lock the PCSR registers */
writel(1, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
writel(PCSR_LOCK_VAL, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
edac_dbg(3, "Total error count CE %d UE %d\n",
priv->ce_cnt, priv->ue_cnt);
}
@@ -650,7 +652,7 @@ static void enable_intr(struct edac_priv *priv)
writel(XDDR_IRQ_UE_MASK,
priv->ddrmc_baseaddr + XDDR_IRQ1_EN_OFFSET);
/* Lock the PCSR registers */
writel(1, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
writel(PCSR_LOCK_VAL, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
}
static void disable_intr(struct edac_priv *priv)
@@ -663,7 +665,7 @@ static void disable_intr(struct edac_priv *priv)
priv->ddrmc_baseaddr + XDDR_IRQ_DIS_OFFSET);
/* Lock the PCSR registers */
writel(1, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
writel(PCSR_LOCK_VAL, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
}
#define to_mci(k) container_of(k, struct mem_ctl_info, dev)
@@ -734,38 +736,63 @@ static void poison_setup(struct edac_priv *priv)
writel(regval, priv->ddrmc_noc_baseaddr + XDDR_NOC_REG_ADEC15_OFFSET);
}
static ssize_t xddr_inject_data_poison_store(struct mem_ctl_info *mci,
const char __user *data)
static void xddr_inject_data_ce_store(struct mem_ctl_info *mci, u8 ce_bitpos)
{
u32 ecc0_flip0, ecc1_flip0, ecc0_flip1, ecc1_flip1;
struct edac_priv *priv = mci->pvt_info;
writel(0, priv->ddrmc_baseaddr + ECCW0_FLIP0_OFFSET);
writel(0, priv->ddrmc_baseaddr + ECCW1_FLIP0_OFFSET);
if (strncmp(data, "CE", 2) == 0) {
writel(ECC_CEPOISON_MASK, priv->ddrmc_baseaddr +
ECCW0_FLIP0_OFFSET);
writel(ECC_CEPOISON_MASK, priv->ddrmc_baseaddr +
ECCW1_FLIP0_OFFSET);
if (ce_bitpos < ECCW0_FLIP0_BITS) {
ecc0_flip0 = BIT(ce_bitpos);
ecc1_flip0 = BIT(ce_bitpos);
ecc0_flip1 = 0;
ecc1_flip1 = 0;
} else {
writel(ECC_UEPOISON_MASK, priv->ddrmc_baseaddr +
ECCW0_FLIP0_OFFSET);
writel(ECC_UEPOISON_MASK, priv->ddrmc_baseaddr +
ECCW1_FLIP0_OFFSET);
ce_bitpos = ce_bitpos - ECCW0_FLIP0_BITS;
ecc0_flip1 = BIT(ce_bitpos);
ecc1_flip1 = BIT(ce_bitpos);
ecc0_flip0 = 0;
ecc1_flip0 = 0;
}
/* Lock the PCSR registers */
writel(1, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
return 0;
writel(ecc0_flip0, priv->ddrmc_baseaddr + ECCW0_FLIP0_OFFSET);
writel(ecc1_flip0, priv->ddrmc_baseaddr + ECCW1_FLIP0_OFFSET);
writel(ecc0_flip1, priv->ddrmc_baseaddr + ECCW0_FLIP1_OFFSET);
writel(ecc1_flip1, priv->ddrmc_baseaddr + ECCW1_FLIP1_OFFSET);
}
static ssize_t inject_data_poison_store(struct file *file, const char __user *data,
size_t count, loff_t *ppos)
/*
* To inject a correctable error, the following steps are needed:
*
* - Write the correctable error bit position value:
* echo <bit_pos val> > /sys/kernel/debug/edac/<controller instance>/inject_ce
*
* poison_setup() derives the row, column, bank, group and rank and
* writes to the ADEC registers based on the address given by the user.
*
* The ADEC12 and ADEC13 are mask registers; write 0 to make sure default
* configuration is there and no addresses are masked.
*
* The row, column, bank, group and rank registers are written to the
* match ADEC bit to generate errors at the particular address. ADEC14
* and ADEC15 have the match bits.
*
* xddr_inject_data_ce_store() updates the ECC FLIP registers with the
* bits to be corrupted based on the bit position given by the user.
*
* Upon doing a read to the address the errors are injected.
*/
static ssize_t inject_data_ce_store(struct file *file, const char __user *data,
size_t count, loff_t *ppos)
{
struct device *dev = file->private_data;
struct mem_ctl_info *mci = to_mci(dev);
struct edac_priv *priv = mci->pvt_info;
u8 ce_bitpos;
int ret;
ret = kstrtou8_from_user(data, count, 0, &ce_bitpos);
if (ret)
return ret;
/* Unlock the PCSR registers */
writel(PCSR_UNLOCK_VAL, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
@@ -773,17 +800,110 @@ static ssize_t inject_data_poison_store(struct file *file, const char __user *da
poison_setup(priv);
xddr_inject_data_ce_store(mci, ce_bitpos);
ret = count;
/* Lock the PCSR registers */
writel(1, priv->ddrmc_noc_baseaddr + XDDR_PCSR_OFFSET);
writel(PCSR_LOCK_VAL, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
writel(PCSR_LOCK_VAL, priv->ddrmc_noc_baseaddr + XDDR_PCSR_OFFSET);
xddr_inject_data_poison_store(mci, data);
return ret;
}
static const struct file_operations xddr_inject_ce_fops = {
.open = simple_open,
.write = inject_data_ce_store,
.llseek = generic_file_llseek,
};
static void xddr_inject_data_ue_store(struct mem_ctl_info *mci, u32 val0, u32 val1)
{
struct edac_priv *priv = mci->pvt_info;
writel(val0, priv->ddrmc_baseaddr + ECCW0_FLIP0_OFFSET);
writel(val0, priv->ddrmc_baseaddr + ECCW0_FLIP1_OFFSET);
writel(val1, priv->ddrmc_baseaddr + ECCW1_FLIP1_OFFSET);
writel(val1, priv->ddrmc_baseaddr + ECCW1_FLIP1_OFFSET);
}
/*
* To inject an uncorrectable error, the following steps are needed:
* echo <bit_pos val> > /sys/kernel/debug/edac/<controller instance>/inject_ue
*
* poison_setup() derives the row, column, bank, group and rank and
* writes to the ADEC registers based on the address given by the user.
*
* The ADEC12 and ADEC13 are mask registers; write 0 so that none of the
* addresses are masked. The row, column, bank, group and rank registers
* are written to the match ADEC bit to generate errors at the
* particular address. ADEC14 and ADEC15 have the match bits.
*
* xddr_inject_data_ue_store() updates the ECC FLIP registers with the
* bits to be corrupted based on the bit position given by the user. For
* uncorrectable errors
* 2 bit errors are injected.
*
* Upon doing a read to the address the errors are injected.
*/
static ssize_t inject_data_ue_store(struct file *file, const char __user *data,
size_t count, loff_t *ppos)
{
struct device *dev = file->private_data;
struct mem_ctl_info *mci = to_mci(dev);
struct edac_priv *priv = mci->pvt_info;
char buf[6], *pbuf, *token[2];
u32 val0 = 0, val1 = 0;
u8 len, ue0, ue1;
int i, ret;
len = min_t(size_t, count, sizeof(buf));
if (copy_from_user(buf, data, len))
return -EFAULT;
buf[len] = '\0';
pbuf = &buf[0];
for (i = 0; i < NUM_UE_BITPOS; i++)
token[i] = strsep(&pbuf, ",");
ret = kstrtou8(token[0], 0, &ue0);
if (ret)
return ret;
ret = kstrtou8(token[1], 0, &ue1);
if (ret)
return ret;
if (ue0 < ECCW0_FLIP0_BITS) {
val0 = BIT(ue0);
} else {
ue0 = ue0 - ECCW0_FLIP0_BITS;
val1 = BIT(ue0);
}
if (ue1 < ECCW0_FLIP0_BITS) {
val0 |= BIT(ue1);
} else {
ue1 = ue1 - ECCW0_FLIP0_BITS;
val1 |= BIT(ue1);
}
/* Unlock the PCSR registers */
writel(PCSR_UNLOCK_VAL, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
writel(PCSR_UNLOCK_VAL, priv->ddrmc_noc_baseaddr + XDDR_PCSR_OFFSET);
poison_setup(priv);
xddr_inject_data_ue_store(mci, val0, val1);
/* Lock the PCSR registers */
writel(PCSR_LOCK_VAL, priv->ddrmc_noc_baseaddr + XDDR_PCSR_OFFSET);
writel(PCSR_LOCK_VAL, priv->ddrmc_baseaddr + XDDR_PCSR_OFFSET);
return count;
}
static const struct file_operations xddr_inject_enable_fops = {
static const struct file_operations xddr_inject_ue_fops = {
.open = simple_open,
.write = inject_data_poison_store,
.write = inject_data_ue_store,
.llseek = generic_file_llseek,
};
@@ -795,8 +915,17 @@ static void create_debugfs_attributes(struct mem_ctl_info *mci)
if (!priv->debugfs)
return;
edac_debugfs_create_file("inject_error", 0200, priv->debugfs,
&mci->dev, &xddr_inject_enable_fops);
if (!edac_debugfs_create_file("inject_ce", 0200, priv->debugfs,
&mci->dev, &xddr_inject_ce_fops)) {
debugfs_remove_recursive(priv->debugfs);
return;
}
if (!edac_debugfs_create_file("inject_ue", 0200, priv->debugfs,
&mci->dev, &xddr_inject_ue_fops)) {
debugfs_remove_recursive(priv->debugfs);
return;
}
debugfs_create_x64("address", 0600, priv->debugfs,
&priv->err_inject_addr);
mci->debugfs = priv->debugfs;
@@ -1031,7 +1160,7 @@ free_edac_mc:
return rc;
}
static int mc_remove(struct platform_device *pdev)
static void mc_remove(struct platform_device *pdev)
{
struct mem_ctl_info *mci = platform_get_drvdata(pdev);
struct edac_priv *priv = mci->pvt_info;
@@ -1049,8 +1178,6 @@ static int mc_remove(struct platform_device *pdev)
XPM_EVENT_ERROR_MASK_DDRMC_NCR, err_callback, mci);
edac_mc_del_mc(&pdev->dev);
edac_mc_free(mci);
return 0;
}
static struct platform_driver xilinx_ddr_edac_mc_driver = {
@@ -1059,7 +1186,7 @@ static struct platform_driver xilinx_ddr_edac_mc_driver = {
.of_match_table = xlnx_edac_match,
},
.probe = mc_probe,
.remove = mc_remove,
.remove_new = mc_remove,
};
module_platform_driver(xilinx_ddr_edac_mc_driver);

View File

@@ -32,5 +32,18 @@ menuconfig RAS
if RAS
source "arch/x86/ras/Kconfig"
source "drivers/ras/amd/atl/Kconfig"
config RAS_FMPM
tristate "FRU Memory Poison Manager"
default m
depends on AMD_ATL && ACPI_APEI
help
Support saving and restoring memory error information across reboot
using ACPI ERST as persistent storage. Error information is saved with
the UEFI CPER "FRU Memory Poison" section format.
Memory will be retired during boot time and run time depending on
platform-specific policies.
endif

View File

@@ -2,3 +2,6 @@
obj-$(CONFIG_RAS) += ras.o
obj-$(CONFIG_DEBUG_FS) += debugfs.o
obj-$(CONFIG_RAS_CEC) += cec.o
obj-$(CONFIG_RAS_FMPM) += amd/fmpm.o
obj-y += amd/atl/

View File

@@ -0,0 +1,21 @@
# SPDX-License-Identifier: GPL-2.0-or-later
#
# AMD Address Translation Library Kconfig
#
# Copyright (c) 2023, Advanced Micro Devices, Inc.
# All Rights Reserved.
#
# Author: Yazen Ghannam <Yazen.Ghannam@amd.com>
config AMD_ATL
tristate "AMD Address Translation Library"
depends on AMD_NB && X86_64 && RAS
depends on MEMORY_FAILURE
default N
help
This library includes support for implementation-specific
address translation procedures needed for various error
handling cases.
Enable this option if using DRAM ECC on Zen-based systems
and OS-based error handling.

View File

@@ -0,0 +1,18 @@
# SPDX-License-Identifier: GPL-2.0-or-later
#
# AMD Address Translation Library Makefile
#
# Copyright (c) 2023, Advanced Micro Devices, Inc.
# All Rights Reserved.
#
# Author: Yazen Ghannam <Yazen.Ghannam@amd.com>
amd_atl-y := access.o
amd_atl-y += core.o
amd_atl-y += dehash.o
amd_atl-y += denormalize.o
amd_atl-y += map.o
amd_atl-y += system.o
amd_atl-y += umc.o
obj-$(CONFIG_AMD_ATL) += amd_atl.o

View File

@@ -0,0 +1,133 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* AMD Address Translation Library
*
* access.c : DF Indirect Access functions
*
* Copyright (c) 2023, Advanced Micro Devices, Inc.
* All Rights Reserved.
*
* Author: Yazen Ghannam <Yazen.Ghannam@amd.com>
*/
#include "internal.h"
/* Protect the PCI config register pairs used for DF indirect access. */
static DEFINE_MUTEX(df_indirect_mutex);
/*
* Data Fabric Indirect Access uses FICAA/FICAD.
*
* Fabric Indirect Configuration Access Address (FICAA): constructed based
* on the device's Instance Id and the PCI function and register offset of
* the desired register.
*
* Fabric Indirect Configuration Access Data (FICAD): there are FICAD
* low and high registers but so far only the low register is needed.
*
* Use Instance Id 0xFF to indicate a broadcast read.
*/
#define DF_BROADCAST 0xFF
#define DF_FICAA_INST_EN BIT(0)
#define DF_FICAA_REG_NUM GENMASK(10, 1)
#define DF_FICAA_FUNC_NUM GENMASK(13, 11)
#define DF_FICAA_INST_ID GENMASK(23, 16)
#define DF_FICAA_REG_NUM_LEGACY GENMASK(10, 2)
static u16 get_accessible_node(u16 node)
{
/*
* On heterogeneous systems, not all AMD Nodes are accessible
* through software-visible registers. The Node ID needs to be
* adjusted for register accesses. But its value should not be
* changed for the translation methods.
*/
if (df_cfg.flags.heterogeneous) {
/* Only Node 0 is accessible on DF3.5 systems. */
if (df_cfg.rev == DF3p5)
node = 0;
/*
* Only the first Node in each Socket is accessible on
* DF4.5 systems, and this is visible to software as one
* Fabric per Socket. The Socket ID can be derived from
* the Node ID and global shift values.
*/
if (df_cfg.rev == DF4p5)
node >>= df_cfg.socket_id_shift - df_cfg.node_id_shift;
}
return node;
}
static int __df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
{
u32 ficaa_addr = 0x8C, ficad_addr = 0xB8;
struct pci_dev *F4;
int err = -ENODEV;
u32 ficaa = 0;
node = get_accessible_node(node);
if (node >= amd_nb_num())
goto out;
F4 = node_to_amd_nb(node)->link;
if (!F4)
goto out;
/* Enable instance-specific access. */
if (instance_id != DF_BROADCAST) {
ficaa |= FIELD_PREP(DF_FICAA_INST_EN, 1);
ficaa |= FIELD_PREP(DF_FICAA_INST_ID, instance_id);
}
/*
* The two least-significant bits are masked when inputing the
* register offset to FICAA.
*/
reg >>= 2;
if (df_cfg.flags.legacy_ficaa) {
ficaa_addr = 0x5C;
ficad_addr = 0x98;
ficaa |= FIELD_PREP(DF_FICAA_REG_NUM_LEGACY, reg);
} else {
ficaa |= FIELD_PREP(DF_FICAA_REG_NUM, reg);
}
ficaa |= FIELD_PREP(DF_FICAA_FUNC_NUM, func);
mutex_lock(&df_indirect_mutex);
err = pci_write_config_dword(F4, ficaa_addr, ficaa);
if (err) {
pr_warn("Error writing DF Indirect FICAA, FICAA=0x%x\n", ficaa);
goto out_unlock;
}
err = pci_read_config_dword(F4, ficad_addr, lo);
if (err)
pr_warn("Error reading DF Indirect FICAD LO, FICAA=0x%x.\n", ficaa);
pr_debug("node=%u inst=0x%x func=0x%x reg=0x%x val=0x%x",
node, instance_id, func, reg << 2, *lo);
out_unlock:
mutex_unlock(&df_indirect_mutex);
out:
return err;
}
int df_indirect_read_instance(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
{
return __df_indirect_read(node, func, reg, instance_id, lo);
}
int df_indirect_read_broadcast(u16 node, u8 func, u16 reg, u32 *lo)
{
return __df_indirect_read(node, func, reg, DF_BROADCAST, lo);
}

225
drivers/ras/amd/atl/core.c Normal file
View File

@@ -0,0 +1,225 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* AMD Address Translation Library
*
* core.c : Module init and base translation functions
*
* Copyright (c) 2023, Advanced Micro Devices, Inc.
* All Rights Reserved.
*
* Author: Yazen Ghannam <Yazen.Ghannam@amd.com>
*/
#include <linux/module.h>
#include <asm/cpu_device_id.h>
#include "internal.h"
struct df_config df_cfg __read_mostly;
static int addr_over_limit(struct addr_ctx *ctx)
{
u64 dram_limit_addr;
if (df_cfg.rev >= DF4)
dram_limit_addr = FIELD_GET(DF4_DRAM_LIMIT_ADDR, ctx->map.limit);
else
dram_limit_addr = FIELD_GET(DF2_DRAM_LIMIT_ADDR, ctx->map.limit);
dram_limit_addr <<= DF_DRAM_BASE_LIMIT_LSB;
dram_limit_addr |= GENMASK(DF_DRAM_BASE_LIMIT_LSB - 1, 0);
/* Is calculated system address above DRAM limit address? */
if (ctx->ret_addr > dram_limit_addr) {
atl_debug(ctx, "Calculated address (0x%016llx) > DRAM limit (0x%016llx)",
ctx->ret_addr, dram_limit_addr);
return -EINVAL;
}
return 0;
}
static bool legacy_hole_en(struct addr_ctx *ctx)
{
u32 reg = ctx->map.base;
if (df_cfg.rev >= DF4)
reg = ctx->map.ctl;
return FIELD_GET(DF_LEGACY_MMIO_HOLE_EN, reg);
}
static int add_legacy_hole(struct addr_ctx *ctx)
{
u32 dram_hole_base;
u8 func = 0;
if (!legacy_hole_en(ctx))
return 0;
if (df_cfg.rev >= DF4)
func = 7;
if (df_indirect_read_broadcast(ctx->node_id, func, 0x104, &dram_hole_base))
return -EINVAL;
dram_hole_base &= DF_DRAM_HOLE_BASE_MASK;
if (ctx->ret_addr >= dram_hole_base)
ctx->ret_addr += (BIT_ULL(32) - dram_hole_base);
return 0;
}
static u64 get_base_addr(struct addr_ctx *ctx)
{
u64 base_addr;
if (df_cfg.rev >= DF4)
base_addr = FIELD_GET(DF4_BASE_ADDR, ctx->map.base);
else
base_addr = FIELD_GET(DF2_BASE_ADDR, ctx->map.base);
return base_addr << DF_DRAM_BASE_LIMIT_LSB;
}
static int add_base_and_hole(struct addr_ctx *ctx)
{
ctx->ret_addr += get_base_addr(ctx);
if (add_legacy_hole(ctx))
return -EINVAL;
return 0;
}
static bool late_hole_remove(struct addr_ctx *ctx)
{
if (df_cfg.rev == DF3p5)
return true;
if (df_cfg.rev == DF4)
return true;
if (ctx->map.intlv_mode == DF3_6CHAN)
return true;
return false;
}
unsigned long norm_to_sys_addr(u8 socket_id, u8 die_id, u8 coh_st_inst_id, unsigned long addr)
{
struct addr_ctx ctx;
if (df_cfg.rev == UNKNOWN)
return -EINVAL;
memset(&ctx, 0, sizeof(ctx));
/* Start from the normalized address */
ctx.ret_addr = addr;
ctx.inst_id = coh_st_inst_id;
ctx.inputs.norm_addr = addr;
ctx.inputs.socket_id = socket_id;
ctx.inputs.die_id = die_id;
ctx.inputs.coh_st_inst_id = coh_st_inst_id;
if (determine_node_id(&ctx, socket_id, die_id))
return -EINVAL;
if (get_address_map(&ctx))
return -EINVAL;
if (denormalize_address(&ctx))
return -EINVAL;
if (!late_hole_remove(&ctx) && add_base_and_hole(&ctx))
return -EINVAL;
if (dehash_address(&ctx))
return -EINVAL;
if (late_hole_remove(&ctx) && add_base_and_hole(&ctx))
return -EINVAL;
if (addr_over_limit(&ctx))
return -EINVAL;
return ctx.ret_addr;
}
static void check_for_legacy_df_access(void)
{
/*
* All Zen-based systems before Family 19h use the legacy
* DF Indirect Access (FICAA/FICAD) offsets.
*/
if (boot_cpu_data.x86 < 0x19) {
df_cfg.flags.legacy_ficaa = true;
return;
}
/* All systems after Family 19h use the current offsets. */
if (boot_cpu_data.x86 > 0x19)
return;
/* Some Family 19h systems use the legacy offsets. */
switch (boot_cpu_data.x86_model) {
case 0x00 ... 0x0f:
case 0x20 ... 0x5f:
df_cfg.flags.legacy_ficaa = true;
}
}
/*
* This library provides functionality for AMD-based systems with a Data Fabric.
* The set of systems with a Data Fabric is equivalent to the set of Zen-based systems
* and the set of systems with the Scalable MCA feature at this time. However, these
* are technically independent things.
*
* It's possible to match on the PCI IDs of the Data Fabric devices, but this will be
* an ever expanding list. Instead, match on the SMCA and Zen features to cover all
* relevant systems.
*/
static const struct x86_cpu_id amd_atl_cpuids[] = {
X86_MATCH_FEATURE(X86_FEATURE_SMCA, NULL),
X86_MATCH_FEATURE(X86_FEATURE_ZEN, NULL),
{ }
};
MODULE_DEVICE_TABLE(x86cpu, amd_atl_cpuids);
static int __init amd_atl_init(void)
{
if (!x86_match_cpu(amd_atl_cpuids))
return -ENODEV;
if (!amd_nb_num())
return -ENODEV;
check_for_legacy_df_access();
if (get_df_system_info())
return -ENODEV;
/* Increment this module's recount so that it can't be easily unloaded. */
__module_get(THIS_MODULE);
amd_atl_register_decoder(convert_umc_mca_addr_to_sys_addr);
pr_info("AMD Address Translation Library initialized");
return 0;
}
/*
* Exit function is only needed for testing and debug. Module unload must be
* forced to override refcount check.
*/
static void __exit amd_atl_exit(void)
{
amd_atl_unregister_decoder();
}
module_init(amd_atl_init);
module_exit(amd_atl_exit);
MODULE_LICENSE("GPL");

Some files were not shown because too many files have changed in this diff Show More