Commit Graph

960 Commits

Author SHA1 Message Date
Linus Torvalds fab5669d55 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:

 - SCI reporting for other error types not only correctable ones

 - GHES cleanups

 - Add the functionality to override error reporting agents as some
   machines are sporting a new extended error logging capability which,
   if done properly in the BIOS, makes a corresponding EDAC module
   redundant

 - PCIe AER tracepoint severity levels fix

 - Error path correction for the mce device init

 - MCE timer fix

 - Add more flexibility to the error injection (EINJ) debugfs interface

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mce: Fix mce_start_timer semantics
  ACPI, APEI, GHES: Cleanup ghes memory error handling
  ACPI, APEI: Cleanup alignment-aware accesses
  ACPI, APEI, GHES: Do not report only correctable errors with SCI
  ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface
  ACPI, eMCA: Combine eMCA/EDAC event reporting priority
  EDAC, sb_edac: Modify H/W event reporting policy
  EDAC: Add an edac_report parameter to EDAC
  PCI, AER: Fix severity usage in aer trace event
  x86, mce: Call put_device on device_register failure
2014-01-20 12:10:27 -08:00
Linus Torvalds 09854dde94 Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
 - mpc85xx PCIe error interrupt support
 - misc small enhancements/fixes all over the place.

* tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC: Don't try to cancel workqueue when it's never setup
  e752x_edac: Fix pci_dev usage count
  sb_edac: Mark get_mci_for_node_id as static
  EDAC: Mark edac_create_debug_nodes as static
  amd64_edac: Remove "amd64" prefix from static functions
  amd64_edac: Simplify code around decode_bus_error
  amd64_edac: Mark amd64_decode_bus_error as static
  EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
  amd64_edac: Fix condition to verify max channels allowed for F15 M30h
  edac/85xx: Add PCIe error interrupt edac support
2014-01-20 09:26:15 -08:00
Stephen Boyd 881f0fcef9 EDAC: Don't try to cancel workqueue when it's never setup
We only setup a workqueue for edac devices that use the polling
method. We still try to cancel the workqueue if an edac_device
uses the irq method though. This causes a warning from debug
objects when we remove an edac device:

WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0()
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28
Modules linked in: krait_edac(-)
CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69
(unwind_backtrace+0x0/0x144)
(show_stack+0x20/0x24)
(dump_stack+0x74/0xb4)
(warn_slowpath_common+0x78/0x9c)
(warn_slowpath_fmt+0x40/0x48)
(debug_print_object+0x98/0xc0)
(debug_object_assert_init+0xdc/0xec)
(del_timer+0x24/0x7c)
(try_to_grab_pending+0xc0/0x1b0)
(cancel_delayed_work+0x2c/0xa0)
(edac_device_workq_teardown+0x1c/0x38)
(edac_device_del_device+0xb8/0xe4)
(krait_edac_remove+0x50/0x70 [krait_edac])
(platform_drv_remove+0x24/0x28)
(__device_release_driver+0x68/0xc0)
(driver_detach+0xc4/0xc8)
(bus_remove_driver+0xac/0x114)
(driver_unregister+0x38/0x58)
(platform_driver_unregister+0x1c/0x20)
(krait_edac_driver_exit+0x14/0x1c [krait_edac])
(SyS_delete_module+0x178/0x2b4)

Fix it by skipping the workqueue teardown for such devices.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-01-10 15:57:36 +01:00
Aristeu Rozanski 90ed4988b8 e752x_edac: Fix pci_dev usage count
In case the device 0, function 1 is not found using pci_get_device(),
pci_scan_single_device() will be used but, differently than
pci_get_device(), it allocates a pci_dev but doesn't does bump the usage
count on the pci_dev and after few module removals and loads the pci_dev
will be freed.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: mark gross <mark.gross@intel.com>
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-21 13:26:57 +01:00
Ingo Molnar 014952270e Merge tag 'ras_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS updates from Borislav Petkov:

  * Add the functionality to override error reporting agents as some
  machines are sporting a new extended error logging capability which, if
  done properly in the BIOS, makes a corresponding EDAC module redundant,
  from Gong Chen.

  * PCIe AER tracepoint severity levels fix, from Rui Wang.

  * Error path correction for the mce device init, from Levente Kurusa.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16 14:33:17 +01:00
Rashika Kheria 8112c0cdf7 sb_edac: Mark get_mci_for_node_id as static
This patch marks the function get_mci_for_node_id() as static because it
is not used outside of sb_edac.c.

Thus, it also eliminates the following warning:
drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 18:02:13 +01:00
Rashika Kheria 95285933d0 EDAC: Mark edac_create_debug_nodes as static
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.

Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:59:59 +01:00
Borislav Petkov d1ea71cdc9 amd64_edac: Remove "amd64" prefix from static functions
No need for the namespace tagging there. Cleanup setup_pci_device while
at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:54:27 +01:00
Borislav Petkov df781d0386 amd64_edac: Simplify code around decode_bus_error
Drop wrapper function and prefixes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:29:44 +01:00
Rashika Kheria 79db57cef9 amd64_edac: Mark amd64_decode_bus_error as static
This patch marks the function amd64_decode_bus_error() as static because
it is not used outside of amd64_edac.c.

It also eliminates the following warning:
drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:16:59 +01:00
Chen, Gong fd52103966 EDAC, sb_edac: Modify H/W event reporting policy
Newer Intel platforms support more than one method to report H/W event.
On this kind of platform, H/W event report can adopt new method and
traditional EDAC method should be disabled. Moreover, if EDAC event
report method is set to *force*, it means event must be reported via
EDAC interface. IOW, it overrides the default event report policy.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit and error messages ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11 18:19:25 +01:00
Chen, Gong c700f013ad EDAC: Add an edac_report parameter to EDAC
This new parameter is used to control how to report HW error reporting,
especially for newer Intel platform, like Ivybridge-EX, which contains
an enhanced error decoding functionality in the firmware, i.e. eMCA.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11 18:06:47 +01:00
Jingoo Han ba935f4097 EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
Currently, there is no other bus that has something like this macro for
their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com
[ Boris: swap commit message with better one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06 10:23:41 +01:00
Aravind Gopalakrishnan 7f3f5240ce amd64_edac: Fix condition to verify max channels allowed for F15 M30h
The value returned from 'f15_m30h_determine_channel' will
always be 0x3 max. The condition

	(channel > 4 || channel < 0)

works as hardware never returns a value of 4, but
it leads to static checker analysis errors like
http://marc.info/?l=linux-edac&m=138607615131951&w=2.

Fix that.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain
[ Boris: massage commit message a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06 10:04:17 +01:00
Aristeu Rozanski bd4b968362 sb_edac: Shut up compiler warning when EDAC_DEBUG is enabled
Fix this:

In file included from drivers/edac/sb_edac.c:27:0:
drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  printk(level "EDAC " prefix ": " fmt, ##arg)
        ^
drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here
  u64   ch_addr, offset, limit, prv = 0;

Limit can be initialized to 0. The only way limit wouldn't be
initialized is if there are no DIMMs present (which would be a bug of
course) and it'd fail on the next test.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30 12:26:36 +01:00
Chunhe Lan c92132f598 edac/85xx: Add PCIe error interrupt edac support
Add pcie error interrupt edac support for mpc85xx, p3041, p4080, and
p5020. The mpc85xx uses the legacy interrupt report mechanism - the
error interrupts are reported directly to mpic. While the p3041/
p4080/p5020 attaches the most of error interrupts to interrupt zero. And
report error interrupts to mpic via interrupt 0.

This patch can handle both of them.

Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Link: http://lkml.kernel.org/r/1384712714-8826-3-git-send-email-morbidrsa@gmail.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Dave Jiang <dave.jiang@gmail.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-25 11:29:15 +01:00
Linus Torvalds cdd278db0e Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC driver updates from Mauro Carvalho Chehab:
 - sb_edac: add support for Ivy Bridge support
 - cell_edac: add a missing of_node_put() call

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  cell_edac: fix missing of_node_put
  sb_edac: add support for Ivy Bridge
  sb_edac: avoid decoding the same error multiple times
  sb_edac: rename mci_bind_devs()
  sb_edac: enable multiple PCI id tables to be used
  sb_edac: rework sad_pkg
  sb_edac: allow different interleave lists
  sb_edac: allow different dram_rule arrays
  sb_edac: isolate TOHM retrieval
  sb_edac: rename pci_br
  sb_edac: isolate TOLM retrieval
  sb_edac: make RANK_CFG_A value part of sbridge_info
2013-11-18 14:51:52 -08:00
Linus Torvalds 794e96e8ec Merge tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
 "Following up on last week's discussion, here's my part of the EDAC
  pile, highlights in the signed tag.

  The last two patches have a date from just now because I've just
  applied them to the tree after Johannes sent them to me earlier.  I
  decided to forward them now because they're trivial.

  There's a third one for MPC85xx which adds PCIe error interrupt
  support but since it is not so trivial and hasn't seen any linux-next
  time, I'm deferring it to 3.14

  EDAC update highlights:
   - Support for Calxeda ECX-2000 memory controller, from Robert Richter
   - Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob
     Herring and Robert Richter
   - New maintainer for Freescale's MPC85xx EDAC driver: Johannes
     Thumshirn"

* tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  edac/85xx: Remove mpc85xx_pci_err_remove
  EDAC: Add edac-mpc85xx driver to MAINTAINERS
  edac, highbank: Moving error injection to sysfs for edac
  edac, highbank: Add MAINTAINERS entry
  edac: Unify reporting of device info for device, mc and pci
  edac, highbank: Improve and unify naming
  edac, highbank: Add Calxeda ECX-2000 support
  ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi
  edac, highbank: Fix interrupt setup of mem and l2 controller
2013-11-18 14:50:17 -08:00
Johannes Thumshirn 0f1741c74a edac/85xx: Remove mpc85xx_pci_err_remove
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the
compiler warning which can be seen when building the driver either
statically or as a module.

Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com>
Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-17 20:05:50 +01:00
Libo Chen 3e45588825 cell_edac: fix missing of_node_put
Decrease device_node refcount np after task completion.

Signed-off-by: Libo Chen <libo.chen@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 17:13:43 -02:00
Aristeu Rozanski 4d715a805b sb_edac: add support for Ivy Bridge
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's
wiser to modify sb_edac to support both instead of creating another
driver.

[m.chehab@samsung.com: Fix CodingStyle]
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 17:13:22 -02:00
Aristeu Rozanski be3036d220 sb_edac: avoid decoding the same error multiple times
Whenever the extended error reporting is active, multiple MCEs will be
generated for the same event, which will lead to multiple repeated
errors to be reported. So check ADDRV and only decode the error if the
MCE address is valid.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:50:03 -02:00
Aristeu Rozanski ea779b5a09 sb_edac: rename mci_bind_devs()
This is in preparation for Ivy Bridge support

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:49:47 -02:00
Aristeu Rozanski 5153a0f94c sb_edac: enable multiple PCI id tables to be used
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy
Bridge.

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:49:27 -02:00
Aristeu Rozanski cc311991a7 sb_edac: rework sad_pkg
This is in preparation for Ivy Bridge support

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14 16:49:04 -02:00