Add the ability to detect the specific data line or ECC line which failed
when printing out SDRAM single-bit errors. An example of a single-bit
SDRAM ECC error is below:
EDAC MPC85xx MC1: Err Detect Register: 0x80000004
EDAC MPC85xx MC1: Faulty data bit: 59
EDAC MPC85xx MC1: Expected Data / ECC: 0x7f80d000_409effa0 / 0x6d
EDAC MPC85xx MC1: Captured Data / ECC: 0x7780d000_409effa0 / 0x6d
EDAC MPC85xx MC1: Err addr: 0x00031ca0
EDAC MPC85xx MC1: PFN: 0x00000031
Knowning which specific data or ECC line caused an error can be useful in
tracking down hardware issues such as improperly terminated signals, loose
pins, etc.
Note that this feature is only currently enabled for 64-bit wide data
buses, 32-bit wide bus support should be added.
I don't have any 32-bit wide systems to test on. If someone has one and
is willing to give this patch a shot with the check for a 64-bit data bus
removed it would be much appreciated and I can re-submit with both 32 and
64 bit buses supported.
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With a 64-bit wide data bus only the lowest 8-bits of the ECC syndrome are
relevant. With a 32-bit wide data bus only the lowest 16-bits are
relevant on most architectures.
Without this change, the ECC syndrome displayed can be mildly confusing,
eg:
EDAC MPC85xx MC1: syndrome: 0x25252525
When in reality the ECC syndrome is 0x25.
A variety of Freescale manuals say a variety of different things about how
to decode the CAPTURE_ECC (syndrome) register. I don't have a system with
a 32-bit bus to test on, but I believe the change is correct. It'd be
good to get an ACK from someone at Freescale about this change though.
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No need for clearing ecc_enable_override and checking it in two places.
Instead, simply check it during probing and act accordingly. Also,
rename the flag bitfields according to the functionality they actually
represent. What is more, make sure original BIOS ECC settings are
restored when the module is unloaded.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add __percpu sparse annotations to places which didn't make it in one
of the previous patches. All converions are trivial.
These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Some unused, unsupported debug code existed in the mpc85xx EDAC driver
that resulted in a build failure when CONFIG_EDAC_DEBUG was defined:
drivers/edac/mpc85xx_edac.c: In function 'mpc85xx_mc_err_probe':
drivers/edac/mpc85xx_edac.c:1031: error: implicit declaration of function 'edac_mc_register_mcidev_debug'
drivers/edac/mpc85xx_edac.c:1031: error: 'debug_attr' undeclared (first use in this function)
drivers/edac/mpc85xx_edac.c:1031: error: (Each undeclared identifier is reported only once
drivers/edac/mpc85xx_edac.c:1031: error: for each function it appears in.)
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b484625172 ("edac: mpc85xx add
mpc83xx support") accidentally broke how a chip select's first and last
page addresses are calculated. The page addresses are being shifted too
far right by PAGE_SHIFT. This results in errors such as:
EDAC MPC85xx MC1: Err addr: 0x003075c0
EDAC MPC85xx MC1: PFN: 0x00000307
EDAC MPC85xx MC1: PFN out of range!
EDAC MC1: INTERNAL ERROR: row out of range (4 >= 4)
EDAC MC1: CE - no information available: INTERNAL ERROR
The vaule of PAGE_SHIFT is already being taken into consideration during
the calculation of the 'start' and 'end' variables, thus it is not
necessary to account for it again when setting a chip select's first and
last page address.
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Ira W. Snyder <iws@ovro.caltech.edu>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Kernel panic - not syncing: EDAC MC0: Uncorrected Error (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
This happens because FERR_NF_FBD bit 28 is not updated on i5000. Due to
that, both bits 28 and 29 may be equal to one, returning channel = 3. As
this value is invalid, EDAC core generates the panic.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14568
Signed-off-by: Tamas Vincze <tom@vincze.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a missing iterator variable thus fixing the conditional of the
for-loop in amd64_get_scrub_rate().
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Do not spam the logs needlessly with the sole info that
edac_pci_dev_parity_clear is being called.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Currently, the module does not initialize fully when the DIMMs aren't
ECC but remains still loaded. Propagate the error when no instance of
the driver is properly initialized and prevent further loading.
Reorganize and polish error handling in amd64_edac_init() while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Fix use-after-free errors by pushing all memory-freeing calls to the end
of amd64_remove_one_instance().
Reported-by: Darren Jenkins <darrenrjenkins@gmail.com>
LKML-Reference: <1261370306.11354.52.camel@ICE-BOX>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Although reporting of benign GART TLB errors is disabled in
__mcheck_cpu_apply_quirks, those are still being logged, and, as a
result, trip up amd64_edac. Pull up reporting check so that machines
with loaded edac module bail out early and don't spit fragments into
dmesg.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add support for 6 ranks per channel to the i5100 chipset. I have tested
the patch as far as possible with correctible errors and things appear
good. The DIMM mapping is correct for our board, but boards may differ.
Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Acked-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>