Commit Graph

1369 Commits

Author SHA1 Message Date
Borislav Petkov 398443471f EDAC, mce_amd: Get rid of local var in amd_filter_mce()
... and use the macro for that.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21 17:59:38 +02:00
Borislav Petkov f3c0891c2f EDAC, mce_amd: Get rid of most struct cpuinfo_x86 uses
struct mce.cpuid contains CPUID(1).EAX which contains family, model and
stepping and thus has enough information for our purposes. Thus get rid
of some external dependencies which are not really needed.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21 17:54:57 +02:00
Borislav Petkov 4ab1784b48 EDAC, mce_amd: Rename decode_smca_errors() to decode_smca_error()
Singular fits better because it decodes a single error.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-21 17:44:09 +02:00
Bhumika Goyal b2b3e7362e EDAC: Make device_type const
Make these const as they are only stored in the type field of a device
structure, which is const.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-20 13:12:32 +02:00
Qiuxu Zhuo bc8f10babc EDAC, pnd2: Properly toggle hidden state for P2SB PCI device
Properly handle hidden state of P2SB PCI device (DEV:D, FUN:0) for
Apollo Lake.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170814154905.21707-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19 10:51:11 +02:00
Qiuxu Zhuo 5fd77cb3ba EDAC, pnd2: Conditionally unhide/hide the P2SB PCI device to read BAR
On Deverton server, the P2SB PCI device (DEV:1F, FUN:1) is used by multiple
device drivers.

If it's hidden by some device driver (e.g. with the i801 I2C driver,
the commit

  9424693035 ("i2c: i801: Create iTCO device on newer Intel PCHs")

unconditionally hid the P2SB PCI device wrongly) it will make the
pnd2_edac driver read out an invalid BAR value of 0xffffffff and then
fail on ioremap().

Therefore, store the presence state of P2SB PCI device before unhiding
it for reading BAR and restore the presence state after reading BAR.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-i2c@vger.kernel.org
Link: http://lkml.kernel.org/r/20170814154845.21663-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19 10:47:24 +02:00
Qiuxu Zhuo d84676a9e1 EDAC, pnd2: Mask off the lower four bits of a BAR
Bit[0] of BAR is always zero. Bit[2:1] and bit[3] of BAR contain the
information of 'type' and the 'prefetchable' accordingly. Therefore,
mask the lower four bits to retrieve the actual base address of a BAR.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170814154813.21619-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-19 10:33:30 +02:00
Christophe JAILLET 3eaef0fa39 EDAC, thunderx: Fix error handling path in thunderx_lmc_probe()
Return the proper error value if ioremap() fails (and not 0).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20170816045821.14165-1-christophe.jaillet@wanadoo.fr
[ Massage commit message, remove newline. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-18 19:12:08 +02:00
Christophe JAILLET 8b073d945c EDAC, altera: Fix error handling path in altr_edac_device_probe()
Return the proper error value if devm_ioremap() fails (and not 0).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170816050506.14541-1-christophe.jaillet@wanadoo.fr
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-18 18:21:27 +02:00
Tony Luck 3e5d2bd191 EDAC, pnd2: Build in a minimal sideband driver for Apollo Lake
I've been waing a long time for the generic sideband driver to
appear. Patience has run out, so include the minimum here to
just read registers.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Patrick Geary <patrickg@supermicro.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170803210536.5662-1-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-04 05:58:23 +02:00
Qiuxu Zhuo 039d7af651 EDAC, sb_edac: Classify memory mirroring modes
Basically, there are full memory mirroring and address range partial
memory mirroring (supported by Haswell EX and Broadwell EX) modes.

a) In full memory mirroring, the memory behind each memory controller
   is mirrored, i.e. the memory is split into two identical mirrors
   (primary and secondary), half of the memory is reserved for redundancy.

b) In address range partial memory mirroring, the memory size (range)
   of primary and secondary behind each memory controller can be user
   defined by the TAD0 register. The rest of memory ranges defined by
   TAD1/TAD2/... in that memory controller are non-mirrored.

For more detail on memory mirroring, see the following link written by Tony Luck:

  https://01.org/lkp/blogs/tonyluck/2016/address-range-partial-memory-mirroring-linux

Currently the sb_edac driver only supports address decoding in full
memory mirroring and non-mirroring modes. In address range partial
memory mirroring mode, it may fail to decode an address that falls in a
non-mirroring area (the following was one of this kind of failed logs).

  mce: Uncorrected hardware memory error in user-access at 566d53a400
  Memory failure: 0x566d53a: Killing einj_mem_uc:4647 due to hardware memory corruption
  Memory failure: 0x566d53a: recovery action for dirty LRU page: Recovered
  mce: [Hardware Error]: Machine check events logged
  EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
  EDAC sbridge MC1: CPU 48: Machine Check Event: 0 Bank 7: ec00000000010090
  EDAC sbridge MC1: TSC 4b914aa5a99dab
  EDAC sbridge MC1: ADDR 566d53a400
  EDAC sbridge MC1: MISC 1443a0c86
  EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1499712764 SOCKET 2 APIC 80
  EDAC MC1: 0 UE Can't discover the memory rank for ch addr 0x7fb54e900 on any memory ( page:0x0 offset:0x0 grain:32)
  mce: [Hardware Error]: Machine check events logged

Therefore, classify memory mirroring modes and make the address decoding
in address range partial memory mode correct.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170730180651.30060-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-02 05:40:11 +02:00
Rob Herring 2efdda4a41 EDAC, cpc925, ppc4xx: Convert to using %pOF instead of full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing of
the full path string for each node.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170718214339.7774-19-robh@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-19 07:42:41 +02:00
Borislav Petkov c54182ec0e EDAC: Get rid of mci->mod_ver
It is a write-only variable so get rid of it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Loc Ho <lho@apm.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
2017-07-17 13:42:48 +02:00
Arvind Yadav 1c18be5a4e EDAC: Constify attribute_group structures
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
CC: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/776cb8265509054abd01b0b551624cc0da3b88e7.1499078335.git.arvind.yadav.cs@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-17 10:20:25 +02:00
Yazen Ghannam fbe63acf62 EDAC, mce_amd: Use cpu_to_node() to find the node ID
Using the homegrown amd_get_nb_id() to find a node ID on AMD was fine
while the L3 to node mapping was 1:1. And Zen topology broke this. So
let's start slowly moving away from it and use the topology interfaces
instead.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1490041614-90057-2-git-send-email-Yazen.Ghannam@amd.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-17 07:01:08 +02:00
Tony Luck 164c29244d EDAC, pnd2: Fix Apollo Lake DIMM detection
Non-existent or empty DIMM slots result in error return from
RD_REGP(). But we shouldn't give up on failure.

So long as we find at least one DIMM we can continue.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170628234407.21521-1-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29 10:37:50 +02:00
Jérémy Lefaure a8c8261425 EDAC, i5000, i5400: Fix definition of NRECMEMB register
In the i5000 and i5400 drivers, the NRECMEMB register is defined as a
16-bit value, which results in wrong shifts in the code, as reported by
sparse.

In the datasheets ([1], section 3.9.22.20 and [2], section 3.9.22.21),
this register is a 32-bit register. A u32 value for the register fixes
the wrong shifts warnings and matches the datasheet.

Also fix the mask to access to the CAS bits [27:16] in the i5000 driver.

[1]: https://www.intel.com/content/dam/doc/datasheet/5000p-5000v-5000z-chipset-memory-controller-hub-datasheet.pdf
[2]: https://www.intel.se/content/dam/doc/datasheet/5400-chipset-memory-controller-hub-datasheet.pdf

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170629005729.8478-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29 10:33:13 +02:00
Colin Ian King 77641dacea EDAC, pnd2: Make function sbi_send() static
The function sbi_send() is local to just pnd2_edac.c and does not need
to be in global scope, so make it static.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170623084855.9197-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-26 16:13:25 +02:00
Gustavo A. R. Silva ee514c7a23 EDAC, pnd2: Return proper error value from apl_rd_reg()
Add code comment to make it clear that the fall-through is intentional
and, OR ret with its previous value to avoid overwriting it so that
callers can check the correct return value.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170622220535.GA4896@embeddedgus
[ Massage a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-23 09:48:50 +02:00
Chris Packham ff0abed492 EDAC, altera: Simplify calculation of total memory
Use of_address_to_resource() and resource_size() instead of manually
parsing the "reg" property from the "memory" node(s).

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Tested-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170606235500.22772-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-14 13:49:25 +02:00
Qiuxu Zhuo 133e4455c9 EDAC, sb_edac: Avoid creating SOCK memory controller
Xiaolong Ye reported the following failure on Broadwell D server:

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sbridge_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

Broadwell D (only IMC0 per socket) and Broadwell X (IMC0 and IMC1 per
socket) use the same PCI device IDs for IMC0 per socket, then they
share pci_dev_descr_broadwell_table (n_imcs_per_sock=2). In this case,
Broadwell D wrongly creates the nonexistent SOCK EDAC memory controller
and reports above error messages, since it has no IMC1 per socket.

Avoid creating the nonexistent SOCK memory controller.

Reported-and-tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170608113351.25323-1-qiuxu.zhuo@intel.com
[ Massage. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-14 11:53:39 +02:00
Yazen Ghannam bdf1bf1744 EDAC, mce_amd: Fix typo in SMCA error description
Fix typo in "poison consumption" error description.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1497286703-62853-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-12 19:03:55 +02:00
Chris Packham 3b405e30cb EDAC, mv64x60: Sanity check edac_op_state before registering
edac_op_state is a module parameter which affects the behaviour of
the driver probe which can potentially be invoked as soon as the
platform driver registration happens. Because of this we need to
ensure that we sanity check the module parameter before calling
platform_register_drivers().

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170607215530.8604-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-09 11:55:55 +02:00
Vadim Lomovtsev cf97825862 EDAC, thunderx: Fix a warning during l2c debugfs node creation
Compare the number of debugfs entries created by
thunderx_create_debugfs_nodes() with the requested number of entries to
properly determine whether to print a warning.

Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20170531155157.93583-1-stemerkhanov@cavium.com
Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-01 20:08:17 +02:00
Chris Packham 7d2fdaa694 EDAC, mv64x60: Check driver registration success
Check the return status of platform_driver_register() in
mv64x60_edac_init(). Only output messages and initialise the
edac_op_state if the registration is successful.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170529212142.25572-2-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-30 09:55:51 +02:00