Commit Graph

52 Commits

Author SHA1 Message Date
Mauro Carvalho Chehab 4af91889e0 i7core_edac: Avoid printing a warning when debug is disabled
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab 4253868034 i7core_edac: We need to use list_for_each_entry_safe to avoid errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
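Note on 4253868034: the subject line gives no detail, so the following is only a userspace sketch of why a "safe" iteration is needed when entries are freed during the walk. The node type and helper names (dev_node, dev_push, dev_free_all) are hypothetical and are not taken from the driver. The kernel's list_for_each_entry_safe() applies the same idea by keeping a second cursor that holds the next entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the driver's real structures differ. */
struct dev_node {
    int socket;
    struct dev_node *next;
};

/* Push a new node at the head of the list and return the new head. */
static struct dev_node *dev_push(struct dev_node *head, int socket)
{
    struct dev_node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->socket = socket;
    n->next = head;
    return n;
}

/*
 * Free every node and return how many were freed. The successor is
 * saved in tmp before the current node is released, so the loop never
 * reads a link out of freed memory.
 */
static int dev_free_all(struct dev_node **head)
{
    struct dev_node *pos, *tmp;
    int freed = 0;

    for (pos = *head; pos != NULL; pos = tmp) {
        tmp = pos->next;    /* read the link while pos is still valid */
        free(pos);
        freed++;
    }
    *head = NULL;
    return freed;
}
```

A plain loop that advances with pos = pos->next after free(pos) would dereference freed memory; caching the successor first avoids that.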
Mauro Carvalho Chehab 22e6bcbdcf i7core_edac: change remove module strategy
The old remove-module strategy didn't work on devices with multiple
cores, since only one PCI device is used to open all mc's, due to
the way Nehalem works.

Also, it was based on the pdev value. However, this doesn't point to the
pci device used at mci->dev.

So, instead, it unregisters all devices at once, deleting them from the
device list.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab 0f062792b4 i7core_edac: remove static counter for max sockets
The number of sockets is now fully dynamic. Get rid of this obsolete
var.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab 13d6e9b653 i7core_edac: at remove, don't remove all pci devices at once
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab d88b85072f i7core_edac: Fix a bug when printing error counts with RDIMMs
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab d4c277957f i7core_edac: a few fixes for multiple mc's
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab 6c6aa3afdb i7core_edac: sanity check: print a warning if a mcelog is ignored
In theory, the other mc controller should handle it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab f47429494f i7core_edac: create one mc per socket/QPI
Instead of creating just one memory controller, create one per socket
(i.e. per QuickPath Interconnect).

This better reflects the Nehalem architecture.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab 66607706ce Dynamically allocate memory for PCI devices
Instead of using a static table assuming always 2 CPU sockets, allocate
space dynamically for Nehalem PCI devs.

This patch is part of a series of patches that changes i7core_edac to
allow more than 2 sockets and to properly report one memory controller
per socket.
2010-05-10 11:44:58 -03:00
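Note on 66607706ce: a minimal userspace sketch of replacing a fixed two-socket table with one that grows on demand. Every name here (socket_devs, socket_table, get_socket_devs, DEVS_PER_SOCKET) is hypothetical, and realloc() stands in for the kernel allocators; this is not the driver's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEVS_PER_SOCKET 4   /* hypothetical value, for illustration only */

/* Per-socket slot; void pointers stand in for struct pci_dev pointers. */
struct socket_devs {
    int socket;
    void *pdev[DEVS_PER_SOCKET];
};

/* Table that grows by one entry for each newly seen socket. */
struct socket_table {
    struct socket_devs *entries;
    size_t count;
};

/*
 * Return the entry for the given socket, allocating it on first use.
 * Returns NULL if the allocation fails. Pointers returned earlier may
 * be invalidated when the table grows.
 */
static struct socket_devs *get_socket_devs(struct socket_table *t, int socket)
{
    struct socket_devs *grown;
    size_t i;

    assert(t != NULL);
    for (i = 0; i < t->count; i++)
        if (t->entries[i].socket == socket)
            return &t->entries[i];

    grown = realloc(t->entries, (t->count + 1) * sizeof(*grown));
    if (grown == NULL)
        return NULL;

    t->entries = grown;
    memset(&grown[t->count], 0, sizeof(*grown));
    grown[t->count].socket = socket;
    return &grown[t->count++];
}
```

Because nothing in the table assumes a socket count, a machine with more than two sockets simply ends up with more entries.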
Mauro Carvalho Chehab a55456f344 i7core: temporary workaround to allow it to compile against 2.6.30
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab 3a3bb4a647 i7core_edac: Improve corrected_error_counts output for RDIMM
Just cosmetics. Instead of showing something like:

socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0

Show:

socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 1, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0

This is more compact and easier to parse.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Keith Mannthey bc2d7245ff i7core_edac: Probe on Xeons earlier
On the Xeon 55XX series cpus the pci devices are not exposed via acpi so
we must explicitly probe them to make them usable as Linux PCI devices.

This moves the detection of this state to before pci_register_driver is
called.  Its present position was not working on my systems, the driver
would complain about not finding a specific device.

This patch allows the driver to load on my systems.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab 14d2c08343 i7core: Use registered memories per processor
Instead of assuming that the entire machine has either registered or
unregistered memories, decide it per CPU socket.

While here, fix a bug at i7core_mce_output_error(), where we're
using m->cpu directly as if it represented a socket. Instead, the
proper socket_id is given by cpu_data[m->cpu].phys_proc_id.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:57 -03:00
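Note on 14d2c08343: a small sketch of the CPU-to-socket lookup the message describes. Only the field name phys_proc_id comes from the commit message; the struct cpu_info type, the socket_of_cpu() helper and the bounds check are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-CPU data. */
struct cpu_info {
    int phys_proc_id;   /* physical package (socket) the CPU belongs to */
};

/*
 * Map a logical CPU number to its socket id. Returns -1 when the CPU
 * number is out of range, instead of treating the CPU number itself
 * as a socket.
 */
static int socket_of_cpu(const struct cpu_info *cpu_data, size_t ncpus,
                         size_t cpu)
{
    assert(cpu_data != NULL);
    if (cpu >= ncpus)
        return -1;
    return cpu_data[cpu].phys_proc_id;
}
```

With four logical CPUs spread over two sockets, CPU 2 belongs to socket 1; using the CPU number directly would have pointed at a socket 2 that does not exist.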
Mauro Carvalho Chehab b4e8f0b6ea i7core_edac: Use Device 3 function 2 to report errors with RDIMM's
Nehalem and newer chipsets provide a special device that holds counters for
corrected memory errors detected on registered DIMMs. This device is only seen
if there are registered memories plugged in.

After this patch, on a machine fully equipped with RDIMMs, it will use
Device 3 function 2 to count corrected errors instead of relying on mcelog.

For unregistered DIMMs, it will keep the old behavior, counting errors
via mcelog.

This patch was developed together with Keith Mannthey <kmannth@us.ibm.com>

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Keith Mannthey 61053fdedb i7core_edac: Fix ecc enable shift
From: Keith Mannthey <kmannth@us.ibm.com>

Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c

This correctly identifies the ECC state of the machine.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
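Note on 61053fdedb: the message says ECC_ENABLED is bit 4 of MC_STATUS, so the test comes down to a shift and a mask. The sketch below assumes the register has already been read into a 32-bit value; the macro and function names are hypothetical and the driver's own definitions may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position given in the commit message: bit 4 of MC_STATUS. */
#define ECC_ENABLED_SHIFT 4

/* Return true when the ECC_ENABLED bit is set in a MC_STATUS value. */
static bool ecc_enabled(uint32_t mc_status)
{
    return ((mc_status >> ECC_ENABLED_SHIFT) & 1u) != 0;
}
```

A value of 0x10 has only bit 4 set and reports ECC as enabled, while 0x08 (bit 3) does not, which is the kind of off-by-one a wrong shift value produces.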
Mauro Carvalho Chehab 3ef288a983 i7core_edac: Print an error message if pci register fails
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab b990538a78 i7core_edac: CodingStyle fixes/cleanups
No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab 4157d9f554 i7core_edac: fix error injection
There were two stupid error injection bugs introduced by wrong
cut-and-paste: one at socket store, and another at the error inject
register. The latter was causing the code to not work at all.

While here, add debug messages to allow seeing what registers are being
set while sending error injection.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab 2068def56c i7core_edac: fix error codes for sysfs error injection interface
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab 276b824c30 i7core_edac: some fixes at error injection code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab 17cb7b0cf7 i7core_edac: Some cleanups at displayed info
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab 086271a037 i7core: remove some unneeded noisy debug messages
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab 3a7dde7fcd i7core: add socket info at the debug msg
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab ec6df24c15 i7core: better document i7core_get_active_channels()
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:53 -03:00