When the Overflow MCi_STATUS bit is set, EDAC reports the lost error
with a "no information available" message which often puzzles users
parsing the dmesg. This doesn't make much sense since this error has
been lost anyway so no need for reporting it separately. Thus, report
the overflow bit setting in the MCE dump instead. While at it, remove
reporting of MiscV and ErrorEnable (en) which are superfluous.
Now it looks like this:
[ 1501.650024] MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
[ 1501.666887] Northbridge Error, node 2
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
of_device is just an alias for platform_device, so remove it entirely. Also
replace to_of_device() with to_platform_device() and update comment blocks.
This patch was initially generated from the following semantic patch, and then
edited by hand to pick up the bits that coccinelle didn't catch.
@@
@@
-struct of_device
+struct platform_device
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: David S. Miller <davem@davemloft.net>
EDAC MC3: CE page 0xc32281, offset 0x8a0, grain 0, syndrome 0x1, row 2, channel 1, label "": amd64_edac
EDAC MC3: CE - no information available: amd64_edacError Overflow
Add the missing space before "Error Overflow" on the second line.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Fortify the interface to not accept negative values, remove
memctrl_int_store() as a result. Also, sanitize bandwidth setting by
making the argument a simple u32 instead of strange u32 pointer being
passed around for no obvious reason. Then, fix error handling and teach
it to return proper error values. Finally, make code more readable,
simplify debug messages.
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
The correct check is to verify whether in high range we're below 4GB
and not to extract the DctSelBaseAddr again. See "2.8.5 Routing DRAM
Requests" in the F10h BKDG.
Cc: <stable@kernel.org> # .32.x .33.x .34.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Switch to reusing the mcheck core's machine check polling mechanism
instead of duplicating functionality by using the EDAC polling routine.
Correct formatting while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
All F2x110-related bit defines are used at only one place so replace
them with simple BIT() macros.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
This option differs from EDAC_DEBUG only by printing the file and
line of where the debug statement is placed, which contains unneeded
information. So remove it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Remove the two syndrome extraction macros and add a single function
which does the same thing but with proper typechecking. While at it,
make sure to cache ECC syndrome size and dump it in debug output.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Don't print failure to detect Core i7 EDAC facilities to the console at
boot time, most often occurring on Core i7 desktops and laptops.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core:
MAINTAINERS: Add an entry for i7core_edac
i7core_edac: Avoid doing multiple probes for the same card
i7core_edac: Properly discover the first QPI device
As Nehalem/Nehalem-EP/Westmere devices uses several devices for the same
functionality (memory controller), the default way of proping devices doesn't
work. So, instead of a per-device probe, all devices should be probed at once.
This means that we should block any new attempt of probe, otherwise, it will
try to register the same device several times.
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus.
The last bus is generally at 0x3f or 0xff, but there are also other systems
using different setups. For example, HP Z800 has 0x7f as the last bus.
This patch adds a logic to discover the last bus, dynamically detecting it
at runtime.
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
When calculating the DCT channel from the syndrome we need to know the
syndrome type (x4 vs x8). On F10h, this is read out from extended PCI
cfg space register F3x180 while on K8 we only support x4 syndromes and
don't have extended PCI config space anyway.
Make the code accessing F3x180 F10h only and fall back to x4 syndromes
on everything else.
Cc: <stable@kernel.org> # .33.x .34.x
Reported-by: Jeffrey Merkey <jeffmerkey@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>