Switch to reusing the mcheck core's machine check polling mechanism
instead of duplicating functionality by using the EDAC polling routine.
Correct formatting while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Remove the two syndrome extraction macros and add a single function
which does the same thing but with proper typechecking. While at it,
make sure to cache ECC syndrome size and dump it in debug output.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
When calculating the DCT channel from the syndrome we need to know the
syndrome type (x4 vs x8). On F10h, this is read out from extended PCI
cfg space register F3x180 while on K8 we only support x4 syndromes and
don't have extended PCI config space anyway.
Make the code accessing F3x180 F10h only and fall back to x4 syndromes
on everything else.
Cc: <stable@kernel.org> # .33.x .34.x
Reported-by: Jeffrey Merkey <jeffmerkey@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
No need for clearing ecc_enable_override and checking it in two places.
Instead, simply check it during probing and act accordingly. Also,
rename the flag bitfields according to the functionality they actually
represent. What is more, make sure original BIOS ECC settings are
restored when the module is unloaded.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add __percpu sparse annotations to places which didn't make it in one
of the previous patches. All converions are trivial.
These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Add a missing iterator variable thus fixing the conditional of the
for-loop in amd64_get_scrub_rate().
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Currently, the module does not initialize fully when the DIMMs aren't
ECC but remains still loaded. Propagate the error when no instance of
the driver is properly initialized and prevent further loading.
Reorganize and polish error handling in amd64_edac_init() while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Fix use-after-free errors by pushing all memory-freeing calls to the end
of amd64_remove_one_instance().
Reported-by: Darren Jenkins <darrenrjenkins@gmail.com>
LKML-Reference: <1261370306.11354.52.camel@ICE-BOX>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.
This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.
Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The routine does the reverse mapping of the error address of a CECC back
to the node id, DRAM controller and chip select of the DIMM which caused
the error. We should lookup the channel using the syndromes _only_ when
the DCTs are ganged so fix that.
Also, add an early exit when there's an error while scanning for the
csrow thus decreasing indentation levels for better readability.
Finally, fixup comments.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Instead of using the whole syndrome tables for channel decoding, use a
set of eigenvectors with which the tables can be generated to search for
the syndrome in error. The algorithm operates independently of symbol
size and can be used for both x4 and x8 syndromes.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The .probe_valid_hardware low_ops member checked whether the DCTs are in
DDR3 mode and bailed out if so. Now that all the needed changes for DDR3
support is in place, remove it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.
This is useful in cases where an edac driver supports multiple DRAM
types and is only defined in debug builds.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10
and all K8 flavors and remove klugdy table of pseudo values. Add a
low_ops->dbam_to_cs member which is family-specific and replaces
low_ops->dbam_map_to_pages since the pages calculation is a one liner
now.
Further cleanups, while at it:
- shorten family name defines
- align amd64_family_types struct members
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>