Commit Graph

1055 Commits

Author SHA1 Message Date
Linus Torvalds
e6b5be2be4 Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
 "Here's the set of driver core patches for 3.19-rc1.

  They are dominated by the removal of the .owner field in platform
  drivers.  They touch a lot of files, but they are "simple" changes,
  just removing a line in a structure.

  Other than that, a few minor driver core and debugfs changes.  There
  are some ath9k patches coming in through this tree that have been
  acked by the wireless maintainers as they relied on the debugfs
  changes.

  Everything has been in linux-next for a while"

* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
  Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
  fs: debugfs: add forward declaration for struct device type
  firmware class: Deletion of an unnecessary check before the function call "vunmap"
  firmware loader: fix hung task warning dump
  devcoredump: provide a one-way disable function
  device: Add dev_<level>_once variants
  ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
  ath: use seq_file api for ath9k debugfs files
  debugfs: add helper function to create device related seq_file
  drivers/base: cacheinfo: remove noisy error boot message
  Revert "core: platform: add warning if driver has no owner"
  drivers: base: support cpu cache information interface to userspace via sysfs
  drivers: base: add cpu_device_create to support per-cpu devices
  topology: replace custom attribute macros with standard DEVICE_ATTR*
  cpumask: factor out show_cpumap into separate helper function
  driver core: Fix unbalanced device reference in drivers_probe
  driver core: fix race with userland in device_add()
  sysfs/kernfs: make read requests on pre-alloc files use the buffer.
  sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
  fs: sysfs: return EGBIG on write if offset is larger than file size
  ...
2014-12-14 16:10:09 -08:00
Linus Torvalds
709d9f09b6 Merge tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac updates from Mauro Carvalho Chehab:
 - Broadwell-DE support on sb-edac driver
 - Some fixes at sb-edac driver

* tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  sb_edac: Fix typo computing number of banks
  sb_edac: Add support for Broadwell-DE processor
  sb_edac: Fix discovery of top-of-low-memory for Haswell
  sb_edac: Fix erroneous bytes->gigabytes conversion
  sb_edac: Fix off-by-one error in number of channels
2014-12-11 11:58:50 -08:00
Linus Torvalds
c9f861c772 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The biggest change in this cycle is better support for UCNA
  (UnCorrected No Action) events:

    "Handle all uncorrected error reports in the same way (soft
     offline the page). We used to only do that for SRAO
     (software recoverable action optional) machine checks, but
     it makes sense to also do it for UCNA (UnCorrected No
     Action) logs found by CMCI or polling."

  plus various x86 MCE handling updates and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Spell "panicked" correctly
  x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
  x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
  x86, MCE, AMD: Assign interrupt handler only when bank supports it
  x86, MCE, AMD: Drop software-defined bank in error thresholding
  x86, MCE, AMD: Move invariant code out from loop body
  x86, MCE, AMD: Correct thresholding error logging
  x86, MCE, AMD: Use macros to compute bank MSRs
  RAS, HWPOISON: Fix wrong error recovery status
  GHES: Make ghes_estatus_caches static
  APEI, GHES: Cleanup unnecessary function for lockless list
2014-12-10 14:20:10 -08:00
Linus Torvalds
0160928e79 Merge tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
 "EDAC updates all over the place:

   - Enablement for AMD F15h models 0x60 CPUs.  Most notably DDR4 RAM
     support.  Out of tree stuff is adding the required PCI IDs.  From
     Aravind Gopalakrishnan.

   - Enable amd64_edac for 32-bit due to popular demand.  From Tomasz
     Pala.

   - Convert the AMD MCE injection module to debugfs, where it belongs.

   - Misc EDAC cleanups"

* tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE, AMD: Correct formatting of decoded text
  EDAC, mce_amd_inj: Add an injector function
  EDAC, mce_amd_inj: Add hw-injection attributes
  EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
  EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
  EDAC: Delete unnecessary check before calling pci_dev_put()
  EDAC, pci_sysfs: remove unneccessary ifdef around entire file
  ghes_edac: Use snprintf() to silence a static checker warning
  amd64_edac: Build module on x86-32
  EDAC, MCE, AMD: Add decoding table for MC6 xec
  amd64_edac: Add F15h M60h support
  {mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED
  EDAC: Sync memory types and names
  EDAC: Add DDR3 LRDIMM entries to edac_mem_types
  x86, amd_nb: Add device IDs to NB tables for F15h M60h
  pci_ids: Add PCI device IDs for F15h M60h
2014-12-08 20:17:49 -08:00
Tony Luck
fec53af531 sb_edac: Fix typo computing number of banks
Code will always think there are 16 banks because of a typo

Reported-by: Misha
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 16:46:33 -02:00
Tony Luck
1f39581a9a sb_edac: Add support for Broadwell-DE processor
Broadwell-DE is the microserver version of next generation Xeon
processors.  A whole bunch of new PCIe device ids, but otherwise
pretty much the same as Haswell.

Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 16:46:08 -02:00
Tony Luck
f7cf2a22a2 sb_edac: Fix discovery of top-of-low-memory for Haswell
Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset.  There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).

This resulted in a bogus value for the top of low memory:

  EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)

which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523eac86
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
   MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)

With the fix we see the correct TOLM value:

   DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)

and we decode address 2000000 correctly:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523e1086
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
   DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
   DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
   DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
   DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
   DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
   MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:52 -02:00
Jim Snow
8c00910029 sb_edac: Fix erroneous bytes->gigabytes conversion
Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:51 -02:00
Jim Snow
50043e257c sb_edac: Fix off-by-one error in number of channels
This prevented edac sysfs code from properly handling 6 channels
per memory controller.

Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:51 -02:00
Borislav Petkov
50872ccd87 EDAC, MCE, AMD: Correct formatting of decoded text
Write out MCx_ADDR into the more humanly readable "MCx Error Address"
and remove double colon in the output.

Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:49 +01:00
Borislav Petkov
51756a5034 EDAC, mce_amd_inj: Add an injector function
Selectively inject either a real MCE or a sw-only version which
exercises the decoding path only. The hardware-injected MCE triggers a
machine check exception (#MC) so that the MCE handler can be bothered to
do something too.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:45 +01:00
Borislav Petkov
b18f3864bf EDAC, mce_amd_inj: Add hw-injection attributes
Expose struct mce->inject_flags.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:42 +01:00
Borislav Petkov
21690934d9 EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
Normally, writing those causes a #GP but HWCR[McStatusWrEn] controls
that. Provide a knob.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:37 +01:00
Borislav Petkov
fd19fcd632 EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
This module's interface belongs in debugfs, not in sysfs.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-25 13:09:33 +01:00
Chen Yucong
e3480271f5 x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:55:43 -08:00
Markus Elfring
0a98babd85 EDAC: Delete unnecessary check before calling pci_dev_put()
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test before the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/546CB20D.4070808@users.sourceforge.net
[ Boris: commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-19 16:33:48 +01:00
Andreas Ruprecht
19ca5a3cc4 EDAC, pci_sysfs: remove unneccessary ifdef around entire file
The file edac_pci_sysfs.c is dependent on CONFIG_PCI. This is already
modelled in the Makefile, but edac_pci_sysfs.o is still contained in
the list of files compiled even without CONFIG_PCI.

This change removes edac_pci_sysfs.o from the list of built objects
when not having CONFIG_PCI enabled and removes the then-unnecessary
ifdef from the source file.

Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Link: http://lkml.kernel.org/r/1407697803-3837-1-git-send-email-rupran@einserver.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-11 18:17:57 +01:00
Dan Carpenter
665aa8cdc4 ghes_edac: Use snprintf() to silence a static checker warning
My static checker complains because the "e->location" has up to 256
characters but we are copying it into the "pvt->detail_location" which
only has space for 240 characters.  That's not counting the surrounding
text and the "e->other_detail" string which can be over 80 characters
long.

I am not familiar with this code but presumably it normally works.
Let's add a limit though for safety.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/20140801082514.GD28869@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-11 18:08:56 +01:00
Tomasz Pala
f5b10c45ef amd64_edac: Build module on x86-32
By popular demand, enable amd64_edac on 32-bit too.

Boris:
 - update Kconfig text.
 - add a warning on load which states that 32-bit configurations are unsupported.

Signed-off-by: Tomasz Pala <gotar@polanet.pl>
Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-05 15:54:34 +01:00
Aravind Gopalakrishnan
bc4febe93c EDAC, MCE, AMD: Add decoding table for MC6 xec
Extended error code meanings are tabulated for other banks. Extend that
tradition for MC6 too.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1415122868-10969-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-04 18:49:20 +01:00
Greg Kroah-Hartman
a8a93c6f99 Merge branch 'platform/remove_owner' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into driver-core-next
Remove all .owner fields from platform drivers
2014-11-03 19:53:56 -08:00
Aravind Gopalakrishnan
a597d2a5d9 amd64_edac: Add F15h M60h support
This patch adds support for ECC error decoding for F15h M60h processor.
Aside from the usual changes, the patch adds support for some new features
in the processor:
 - DDR4(unbuffered, registered); LRDIMM DDR3 support
   - relevant debug messages have been modified/added to report these
     memory types
 - new dbam_to_cs mappers
   - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find
     cs_size. This multiplier value is obtained from the per-dimm
     DCSM register. So, change the interface to accept a 'cs_mask_nr'
     value to facilitate this calculation
 - switch-casing determine_memory_type()
   - done to cleanse the function of too many if-else statements
     and improve readability
   - This is now called early in read_mc_regs() to cache dram_type

Misc cleanup:
 - amd64_pci_table[] is condensed by using PCI_VDEVICE macro.

Testing details:
Tested the patch by injecting 'ECC' type errors using mce_amd_inj
and error decoding works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: determine_memory_type() cleanups ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-30 13:42:48 +01:00
Jason Baron
8030122a9c e7xxx_edac: Report CE events properly
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-22 22:59:00 +02:00
Jason Baron
fa19ac4b92 cpc925_edac: Report UE events properly
Fix UE event being reported as HW_EVENT_ERR_CORRECTED.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-22 22:58:45 +02:00
Jason Baron
ab0543de6f i82860_edac: Report CE events properly
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-22 22:58:31 +02:00