pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.
This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.
Tested on an Intel S7000FC4UR system with a 7300 chipset.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.
In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.
This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.
On that second path we don't need to init the workqueue which has been
initialized already.
Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>
Pull x86 RAS changes from Ingo Molnar:
- SCI reporting for other error types not only correctable ones
- GHES cleanups
- Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which,
if done properly in the BIOS, makes a corresponding EDAC module
redundant
- PCIe AER tracepoint severity levels fix
- Error path correction for the mce device init
- MCE timer fix
- Add more flexibility to the error injection (EINJ) debugfs interface
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, mce: Fix mce_start_timer semantics
ACPI, APEI, GHES: Cleanup ghes memory error handling
ACPI, APEI: Cleanup alignment-aware accesses
ACPI, APEI, GHES: Do not report only correctable errors with SCI
ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface
ACPI, eMCA: Combine eMCA/EDAC event reporting priority
EDAC, sb_edac: Modify H/W event reporting policy
EDAC: Add an edac_report parameter to EDAC
PCI, AER: Fix severity usage in aer trace event
x86, mce: Call put_device on device_register failure
Pull EDAC updates from Borislav Petkov:
- mpc85xx PCIe error interrupt support
- misc small enhancements/fixes all over the place.
* tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Don't try to cancel workqueue when it's never setup
e752x_edac: Fix pci_dev usage count
sb_edac: Mark get_mci_for_node_id as static
EDAC: Mark edac_create_debug_nodes as static
amd64_edac: Remove "amd64" prefix from static functions
amd64_edac: Simplify code around decode_bus_error
amd64_edac: Mark amd64_decode_bus_error as static
EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
amd64_edac: Fix condition to verify max channels allowed for F15 M30h
edac/85xx: Add PCIe error interrupt edac support
We only setup a workqueue for edac devices that use the polling
method. We still try to cancel the workqueue if an edac_device
uses the irq method though. This causes a warning from debug
objects when we remove an edac device:
WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0()
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28
Modules linked in: krait_edac(-)
CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69
(unwind_backtrace+0x0/0x144)
(show_stack+0x20/0x24)
(dump_stack+0x74/0xb4)
(warn_slowpath_common+0x78/0x9c)
(warn_slowpath_fmt+0x40/0x48)
(debug_print_object+0x98/0xc0)
(debug_object_assert_init+0xdc/0xec)
(del_timer+0x24/0x7c)
(try_to_grab_pending+0xc0/0x1b0)
(cancel_delayed_work+0x2c/0xa0)
(edac_device_workq_teardown+0x1c/0x38)
(edac_device_del_device+0xb8/0xe4)
(krait_edac_remove+0x50/0x70 [krait_edac])
(platform_drv_remove+0x24/0x28)
(__device_release_driver+0x68/0xc0)
(driver_detach+0xc4/0xc8)
(bus_remove_driver+0xac/0x114)
(driver_unregister+0x38/0x58)
(platform_driver_unregister+0x1c/0x20)
(krait_edac_driver_exit+0x14/0x1c [krait_edac])
(SyS_delete_module+0x178/0x2b4)
Fix it by skipping the workqueue teardown for such devices.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org
Signed-off-by: Borislav Petkov <bp@suse.de>
In case the device 0, function 1 is not found using pci_get_device(),
pci_scan_single_device() will be used but, differently than
pci_get_device(), it allocates a pci_dev but doesn't does bump the usage
count on the pci_dev and after few module removals and loads the pci_dev
will be freed.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: mark gross <mark.gross@intel.com>
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull RAS updates from Borislav Petkov:
* Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which, if
done properly in the BIOS, makes a corresponding EDAC module redundant,
from Gong Chen.
* PCIe AER tracepoint severity levels fix, from Rui Wang.
* Error path correction for the mce device init, from Levente Kurusa.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Newer Intel platforms support more than one method to report H/W event.
On this kind of platform, H/W event report can adopt new method and
traditional EDAC method should be disabled. Moreover, if EDAC event
report method is set to *force*, it means event must be reported via
EDAC interface. IOW, it overrides the default event report policy.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit and error messages ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix this:
In file included from drivers/edac/sb_edac.c:27:0:
drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized]
printk(level "EDAC " prefix ": " fmt, ##arg)
^
drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here
u64 ch_addr, offset, limit, prv = 0;
Limit can be initialized to 0. The only way limit wouldn't be
initialized is if there are no DIMMs present (which would be a bug of
course) and it'd fail on the next test.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull EDAC driver updates from Mauro Carvalho Chehab:
- sb_edac: add support for Ivy Bridge support
- cell_edac: add a missing of_node_put() call
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
cell_edac: fix missing of_node_put
sb_edac: add support for Ivy Bridge
sb_edac: avoid decoding the same error multiple times
sb_edac: rename mci_bind_devs()
sb_edac: enable multiple PCI id tables to be used
sb_edac: rework sad_pkg
sb_edac: allow different interleave lists
sb_edac: allow different dram_rule arrays
sb_edac: isolate TOHM retrieval
sb_edac: rename pci_br
sb_edac: isolate TOLM retrieval
sb_edac: make RANK_CFG_A value part of sbridge_info
Pull EDAC updates from Borislav Petkov:
"Following up on last week's discussion, here's my part of the EDAC
pile, highlights in the signed tag.
The last two patches have a date from just now because I've just
applied them to the tree after Johannes sent them to me earlier. I
decided to forward them now because they're trivial.
There's a third one for MPC85xx which adds PCIe error interrupt
support but since it is not so trivial and hasn't seen any linux-next
time, I'm deferring it to 3.14
EDAC update highlights:
- Support for Calxeda ECX-2000 memory controller, from Robert Richter
- Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob
Herring and Robert Richter
- New maintainer for Freescale's MPC85xx EDAC driver: Johannes
Thumshirn"
* tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
edac/85xx: Remove mpc85xx_pci_err_remove
EDAC: Add edac-mpc85xx driver to MAINTAINERS
edac, highbank: Moving error injection to sysfs for edac
edac, highbank: Add MAINTAINERS entry
edac: Unify reporting of device info for device, mc and pci
edac, highbank: Improve and unify naming
edac, highbank: Add Calxeda ECX-2000 support
ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi
edac, highbank: Fix interrupt setup of mem and l2 controller