Commit Graph

201 Commits

Author SHA1 Message Date
Aravind Gopalakrishnan 85a8885bd0 amd64_edac: Add support for newer F16h models
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
turned on using mce_amd_inj module and the patch works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Tested-by: Arindam Nath <Arindam.Nath@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-27 18:03:16 +01:00
Aravind Gopalakrishnan 9d0e8d8348 amd64_edac: Fix logic to determine channel for F15 M30h processors
Update current channel selection logic to include F15h, M30h memory
controllers.

Refer F15 M30h BKDG D18F2x110[7:6] (DRAM Controller Select Low)
(Link:http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf)

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1390338216-3873-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07 15:01:19 +01:00
Borislav Petkov d1ea71cdc9 amd64_edac: Remove "amd64" prefix from static functions
No need for the namespace tagging there. Cleanup setup_pci_device while
at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:54:27 +01:00
Borislav Petkov df781d0386 amd64_edac: Simplify code around decode_bus_error
Drop wrapper function and prefixes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:29:44 +01:00
Rashika Kheria 79db57cef9 amd64_edac: Mark amd64_decode_bus_error as static
This patch marks the function amd64_decode_bus_error() as static because
it is not used outside of amd64_edac.c.

It also eliminates the following warning:
drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:16:59 +01:00
Jingoo Han ba935f4097 EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
Currently, there is no other bus that has something like this macro for
their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com
[ Boris: swap commit message with better one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06 10:23:41 +01:00
Aravind Gopalakrishnan 7f3f5240ce amd64_edac: Fix condition to verify max channels allowed for F15 M30h
The value returned from 'f15_m30h_determine_channel' will
always be 0x3 max. The condition

	(channel > 4 || channel < 0)

works as hardware never returns a value of 4, but
it leads to static checker analysis errors like
http://marc.info/?l=linux-edac&m=138607615131951&w=2.

Fix that.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain
[ Boris: massage commit message a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06 10:04:17 +01:00
Chen, Gong 10ef6b0dff bitops: Introduce a more generic BITMASK macro
GENMASK is used to create a contiguous bitmask([hi:lo]). It is
implemented twice in current kernel. One is in EDAC driver, the other
is in SiS/XGI FB driver. Move it to a more generic place for other
usage.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Winischhofer <thomas@winischhofer.net>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-21 15:12:01 -07:00
Aravind Gopalakrishnan 4fc06b3171 amd64_edac: Fix incorrect wraparounds
dct_base and dct_limit obtain 32 bit register values when they read
their respective pci config space registers. A left shift beyond 32 bits
will cause them to wrap around. Similar case for chan_addr as can be
seen from the bug report (link below). In the patch, we rectify this by
casting chan_addr to u64 and by comparing dct_base and dct_limit against
properly shifted sys_addr in order to compare the correct bits.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-27 15:00:22 +02:00
Borislav Petkov 3f0aba4fc0 amd64_edac: Correct erratum 505 range
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later.

Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-27 14:25:08 +02:00
Ingo Molnar c874b6ba55 Merge tag 'edac_for_3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS/EDAC updates from Boris Petkov:

 "An amd64_edac fix for single channel configurations + trivial cleanups
  courtesy of Jingoo Han."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:07:20 +02:00
Borislav Petkov a4b4bedce8 amd64_edac: Get rid of boot_cpu_data accesses
Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 16:01:56 +02:00
Aravind Gopalakrishnan 18b94f66f9 amd64_edac: Add ECC decoding support for newer F15h models
On newer models, support has been included for upto 4 DCT's, however,
only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
Also, the routing DRAM Requests algorithm is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.

Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.

While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:

      * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
      * Erratum 637: not fixed.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 16:00:10 +02:00
Borislav Petkov f0a56c4801 amd64_edac: Fix single-channel setups
It can happen that configurations are running in a single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on the one channel and leaving the other empty. This causes a
problem in init_csrows which implicitly assumes that when the second
channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
present. Which is not.

So always allocate two channels unconditionally.

This provides for the nice side effect that the data structures are
initialized so some day, when memory hotplug is supported, it should
just work out of the box when all of a sudden a second channel appears.

Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-29 17:22:41 +02:00
Aravind Gopalakrishnan 94c1acf2c8 amd64_edac: Add Family 16h support
Add code to handle DRAM ECC errors decoding for Fam16h.

Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-04-19 12:46:50 +02:00
Mauro Carvalho Chehab 9713faecff EDAC: Merge mci.mem_is_per_rank with mci.csbased
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:30 +01:00
Mauro Carvalho Chehab 1eef128254 amd64_edac: Correct DIMM sizes
We were filling the csrow size with a wrong value. 16a528ee39 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:02 +01:00
Linus Torvalds 55529fa576 Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
 "Mostly AMD's side of EDAC.  It is basically a new family enablement
  stuff: AMD F16h MCE decoding enablement from Jacob Shin.  The rest is
  trivial cleanups."

* tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  mpc85xx_edac: Fix typo
  EDAC, MCE, AMD: Remove unneeded exports
  EDAC, MCE, AMD: Add MCE decoding support for Family 16h
  EDAC, MCE, AMD: Make MC2 decoding per-family
  amd64_edac: Remove dead code
2013-02-20 11:28:50 -08:00
Borislav Petkov acc7fcb400 amd64_edac: Remove dead code
5e2af0c09e ("edac: Don't initialize csrow's first_page & friends when
not needed") removed useless initialization of variables but left in the
functions which did that. They're unused now so drop them.

Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-22 22:39:41 +01:00
Daniel J Blueman c7e5301a1b amd64_edac: Fix type usage in NB IDs and memory ranges
Use appropriate types for northbridge IDs and memory ranges. Mark
immutable data const and keep within compilation unit on related
structures.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com
[Boris: Drop arg change to node_to_amd_nb]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:18:00 +01:00
Daniel J Blueman e2c0bffea2 amd64_edac: Fix PCI function lookup
Fix locating sibling memory controller PCI functions by using the
correct PCI domain and use a northbridge descriptor only if found. We
need to at least warn if it wasn't found so that it gets fixed and we
don't go off with wrong results.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com
[Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:59 +01:00
Daniel J Blueman 8b84c8df38 x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
Change amd_get_nb_id to return u16 to support >255 memory controllers,
and related consistency fixes.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:58 +01:00
Daniel J Blueman 772c3ff385 x86, AMD, NB: Add multi-domain support
Fix get_node_id to match northbridge IDs from the array of detected
ones, allowing multi-server support such as with Numascale's
NumaConnect, renaming to 'amd_get_node_id' for consistency.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com
[Boris: shorten lines to fit 80 cols]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:58 +01:00
Greg Kroah-Hartman 9b3c6e85c2 Drivers: edac: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, and __devexit
from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:03 -08:00
Borislav Petkov 16a528ee39 EDAC: Fix csrow size reported in sysfs
On csrow-based memory controllers, we combine the csrow size from both
channels and there's no need to do that again in csrow_size_show which
leads to double the size of a csrow.

Fix it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:40 +01:00