Commit Graph

336039 Commits

Author SHA1 Message Date
Borislav Petkov d824c7718b MCE, AMD: Report decoded error type first
Instead of starting with the error details, report the decoded, readable
error type first.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:56:17 +01:00
Borislav Petkov f89f8388cd MCE, AMD: Dump CPU f/m/s triple with the error
It is very useful to have the family/model/stepping with the reported
error so dump it. This saves us asking the bug reporter about it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:57 +01:00
Borislav Petkov f05c41a9c6 MCE, AMD: Remove functional unit references
Having the functional unit names in each bank decode is only misleading
as this code supports multiple families and there's no guarantee the
mapping between FUs and MCE banks will stay the same.

And also, knowing the functional unit name doesn't help much since you
end up looking at the respective BKDG anyway.

So drop all FU references and use the MC bank numbers instead.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:44 +01:00
Wei Yongjun db7312a295 EDAC: Convert to use simple_open()
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:22 +01:00
Wei Yongjun f35d852e80 EDAC, Calxeda highbank: Convert to use simple_open()
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:55:07 +01:00
Josh Hunt 3c0622760a EDAC: Fix mc size reported in sysfs
This is the complement to previous commit "EDAC: Fix csrow size
reported in sysfs". This fixes the memory controller size reporting on
csrow-based memory controllers. The csrow size is already combined for
both channels. Without this patch memory size is reported doubled.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:50 +01:00
Borislav Petkov 16a528ee39 EDAC: Fix csrow size reported in sysfs
On csrow-based memory controllers, we combine the csrow size from both
channels and there's no need to do that again in csrow_size_show which
leads to double the size of a csrow.

Fix it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:40 +01:00
Borislav Petkov 921a689965 EDAC: Pass mci parent
Initialize the mem_ctl_info descriptor of a csrow properly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:54:23 +01:00
Borislav Petkov 1165276917 EDAC: Add memory controller flags
The first flag is ->csbased and will be used in common EDAC code later.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:48:04 +01:00
Borislav Petkov 10de6497a5 amd64_edac: Fix csrows size and pages computation
Make sure code pays attention to K8 having only one DCT, reformat and
cleanup code, correct debug messages, remove unused code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:47:36 +01:00
Borislav Petkov 0a5dfc3140 amd64_edac: Use DBAM_DIMM macro
Instead of open-coding it, use the DBAM_DIMM macro in
amd64_csrow_nr_pages() which we have already.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:46:19 +01:00
Borislav Petkov bb89f5a054 amd64_edac: Fix K8 chip select reporting
This basically reverts 603adaf6b3 ("amd64_edac: fix K8 chip select
reporting") because it was a clumsy workaround for DIMM sizes reporting
on K8 which got superceded by a much more correct one with 41d8bfaba7
("amd64_edac: Improve DRAM address mapping") without removing the prior
one. Remove it now finally.

Reported-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:46 +01:00
Borislav Petkov 33ca0643c9 amd64_edac: Reorganize error reporting path
Rewrite CE/UE paths so that they use the same code and drop additional
code duplication in handle_ue. Add a struct err_info which collects
required info for the error reporting. This, in turn, helps slimming all
edac_mc_handle_error() calls down to one.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:34 +01:00
Borislav Petkov c8d1adf092 amd64_edac: Do not check whether error address is valid
All families report a valid error address when encountering a DRAM ECC
error so no need to check it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:11 +01:00
Borislav Petkov 66fed2d464 amd64_edac: Improve error injection
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine
does this by injecting the error in the next non-cached access. This
takes relatively long time on a normal system so that in order for us to
expedite it, we disable the caches around the injection.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:01 +01:00
Borislav Petkov 6e71a870b8 amd64_edac: Cleanup error injection code
Invert kstrtoul return value testing and win one indentation level.
Also, shorten up macro names so that the lines can fit into 80 cols. No
functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:44:35 +01:00
Borislav Petkov 1f31677e0d amd64_edac: Small fixlets and cleanups
amd64_get_dram_hole_info: remove local variable 'base'.
sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize
constants formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:44:12 +01:00
Borislav Petkov f430d5707a EDAC: Handle empty msg strings when reporting errors
A reported error could look like this

[  226.178315] EDAC MC0: 1 CE  on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6)

with two spaces back-to-back due to the msg argument of
edac_mc_handle_error being passed on empty by the specific drivers.
Handle that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:24:12 +01:00
Borislav Petkov 4da1b7bfe7 EDAC: Remove useless assignment of error type
The tracepoint decodes the error type later anyway so remove a useless
assignment to the temporary p which gets overwritten later anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:23:50 +01:00
Borislav Petkov 37929874d4 EDAC: Boundary-check edac_debug_level
Only levels [0:4] are allowed so enforce that. Also, while at it,
massage Kconfig text and add valid debug levels range to the module
parameter description.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:23:32 +01:00
Borislav Petkov 876bb331e2 EDAC: Respect operational state in edac_pci.c
Currently, we unconditionally enable PCI polling and we don't look at
the edac_op_state module parameter. Make this dependent on the parameter
setting supplied on the command line.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:22:47 +01:00
Linus Torvalds 9489e9dcae Linux 3.7-rc7 2012-11-25 17:59:19 -08:00
Linus Torvalds 08e627b5ce Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc EEH bugfixes from Benjamin Herrenschmidt.

Two one-liner fixes for the new EEH code.

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/eeh: Do not invalidate PE properly
  powerpc/pseries: Fix oops with MSIs when missing EEH PEs
2012-11-25 17:57:01 -08:00
Linus Torvalds c2a65d3d85 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "Three issues fixed accross the field:

   - Some functions that were recently outlined as part of a preemption
     fix were causing problems with function tracing.
   - The recently merged in-kernel MPI library uses very outdated
     headers that contain MIPS-specific code which won't build on with
     gcc 4.4 or newer.
   - The MIPS non-NUMA memory initialization was making only a very
     half-baked attempt at merging adjacent memory ranges.  This kept
     the code simple enough but is now causing issues with kexec."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MPI: Fix compilation on MIPS with GCC 4.4 and newer
  MIPS: Fix crash that occurs when function tracing is enabled
  MIPS: Merge overlapping bootmem ranges
2012-11-25 17:55:04 -08:00
Gavin Shan e716e01438 powerpc/eeh: Do not invalidate PE properly
While the EEH does recovery on the specific PE that has PCI errors,
the PCI devices belonging to the PE will be removed and the PE will
be marked as invalid since we still need the information stored in
the PE. We only invalidate the PE when it doesn't have associated
EEH devices and valid child PEs. However, the code used to check
that is wrong. The patch fixes that.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-26 09:14:16 +11:00