Commit Graph

77 Commits

Author SHA1 Message Date
Marcin Slusarz 64aab720bd i7core_edac: fix panic in udimm sysfs attributes registration
Array of udimm sysfs attributes was not ended with NULL marker, leading to
dereference of random memory.

  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2
  BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
  IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name
  RIP: 0010:[<ffffffff81330b36>]  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  (...)
  Call Trace:
   [<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1
   [<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2
   [<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557
   [<ffffffff81428901>] i7core_probe+0xccf/0xec0
  RIP  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  ---[ end trace 20de320855b81d78 ]---
  Kernel panic - not syncing: Attempted to kill init!

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:58 -07:00
Daniel J Blueman ab08937400 quiesce EDAC initialisation on desktop/mobile i7
Don't print failure to detect Core i7 EDAC facilities to the console at
boot time, most often occurring on Core i7 desktops and laptops.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-26 08:17:44 -07:00
Mauro Carvalho Chehab 2d95d8158b i7core_edac: Avoid doing multiple probes for the same card
As Nehalem/Nehalem-EP/Westmere devices uses several devices for the same
functionality (memory controller), the default way of proping devices doesn't
work. So, instead of a per-device probe, all devices should be probed at once.

This means that we should block any new attempt of probe, otherwise, it will
try to register the same device several times.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-07-02 18:04:29 -03:00
Mauro Carvalho Chehab bda142890e i7core_edac: Properly discover the first QPI device
On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus.
The last bus is generally at 0x3f or 0xff, but there are also other systems
using different setups. For example, HP Z800 has 0x7f as the last bus.

This patch adds a logic to discover the last bus, dynamically detecting it
at runtime.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-07-02 18:04:05 -03:00
Mauro Carvalho Chehab 52707f918c i7core_edac: Better describe the supported devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 20:43:52 -03:00
Vernon Mauery bd9e19ca46 Add support for Westmere to i7core_edac driver
This adds new PCI IDs for the Westmere's memory controller
devices and modifies the i7core_edac driver to be able to
probe both Nehalem and Westmere processors.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 20:23:56 -03:00
Tony Luck d4d1ef4515 i7core_edac: don't free on success
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 14:47:31 -03:00
Mauro Carvalho Chehab ac1ececea9 i7core_edac: Add support for X5670
As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a
different register for one of the uncore PCI devices. Add support for
it.

Those are the PCI ID's on this new chipset:

fe:00.0 0600: 8086:2c70 (rev 02)
fe:00.1 0600: 8086:2d81 (rev 02)
fe:02.0 0600: 8086:2d90 (rev 02)
fe:02.1 0600: 8086:2d91 (rev 02)
fe:02.2 0600: 8086:2d92 (rev 02)
fe:02.3 0600: 8086:2d93 (rev 02)
fe:02.4 0600: 8086:2d94 (rev 02)
fe:02.5 0600: 8086:2d95 (rev 02)
fe:03.0 0600: 8086:2d98 (rev 02)
fe:03.1 0600: 8086:2d99 (rev 02)
fe:03.2 0600: 8086:2d9a (rev 02)
fe:03.4 0600: 8086:2d9c (rev 02)
fe:04.0 0600: 8086:2da0 (rev 02)
fe:04.1 0600: 8086:2da1 (rev 02)
fe:04.2 0600: 8086:2da2 (rev 02)
fe:04.3 0600: 8086:2da3 (rev 02)
fe:05.0 0600: 8086:2da8 (rev 02)
fe:05.1 0600: 8086:2da9 (rev 02)
fe:05.2 0600: 8086:2daa (rev 02)
fe:05.3 0600: 8086:2dab (rev 02)
fe:06.0 0600: 8086:2db0 (rev 02)
fe:06.1 0600: 8086:2db1 (rev 02)
fe:06.2 0600: 8086:2db2 (rev 02)
fe:06.3 0600: 8086:2db3 (rev 02)
(as usual, the same PCI devices repeat at ff: bus)

The PCI device 8086:2c70 is shown as:

fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic
Non-core Registers (rev 02)

So, for this device to be recognized, it is only a matter of adding this
new PCI ID to the driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 13:15:42 -03:00
Vernon Mauery 8a311e179e Always call i7core_[ur]dimm_check_mc_ecc_err
This fixes an error in function i7core_check_error

In commit ca9c90ba09 which converts the
driver to use double buffering, there is a change in the logic.  Before,
if mce_count was zero, it skipped over a couple of statements and
finished out with a call to the *check_mc_ecc_err function.  The current
code checks to see if mce_count is 0 and then exits.

This change reverts the behavior back to the original where if there are
no errors to report, we skip to the end and call the *check_mc_ecc_err
function.

This fix allows the driver to work again on my Nehalem based blades
again.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 12:43:23 -03:00
Alexander Beregalov 2a6fae3267 i7core_edac: fix memory leak of i7core_dev
Free already allocated i7core_dev.

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 11:45:20 -03:00
Jiri Slaby 71753e0141 EDAC: add __init to i7core_xeon_pci_fixup
It's called only from an __init function and is the only user
of pcibios_scan_specific_bus which will be marked as __devinit in
the next patch.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 11:45:19 -03:00
Mauro Carvalho Chehab 508fa179f8 i7core_edac: Fix wrong device id for channel 1 devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:31 -03:00
Mauro Carvalho Chehab f05da2f785 i7core: add support for Lynnfield alternate address
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:29 -03:00
Mauro Carvalho Chehab 52a2e4fc37 i7core_edac: Add initial support for Lynnfield
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:28 -03:00
Randy Dunlap 3b918c12df edac: fix i7core build
Fix build warning (missing header file) and
build error when CONFIG_SMP=n.

drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:32 -03:00
Alan Cox 486dd09f12 edac: i7core_edac produces undefined behaviour on 32bit
Fix the shifts up

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:32 -03:00
Mauro Carvalho Chehab de06eeef58 i7core_edac: Use a more generic approach for probing PCI devices
Currently, only one PCI set of tables is allowed. This prevents using
the driver for other devices like Lynnfield, with have a different
set of PCI ID's.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab fd3826549d i7core_edac: PCI device is called NONCORE, instead of NOCORE
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab 321ece4dda i7core_edac: Fix ringbuffer maxsize
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab 6e103be1c7 i7core_edac: First store, then increment
Fix ringbuffer store logic.

While here, add a few comments to the code and remove the undesired
printk that could otherwise be called during NMI time.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab 4f87fad1d3 i7core_edac: Better parse "any" addrmask
Instead of accepting just "any", accept also "any\n"

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab ca9c90ba09 i7core_edac: Use a lockless ringbuffer
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab f338d73691 i7core_edac: Convert UDIMM error counters into a proper sysfs group
Instead of displaying 3 values at the same var, break it into 3
different sysfs nodes:

/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2

For registered dimms, however, the error counters are already being
displayed at:
	/sys/devices/system/edac/mc/mc0/csrow*/ce_count

So, there's no need to add any extra sysfs nodes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab cc301b3ae3 edac: store/show methods for device groups weren't working
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab a5538e531f i7core_edac: Add support for sysfs addrmatch group
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:01 -03:00