Commit Graph

762 Commits

Author SHA1 Message Date
H. Peter Anvin
bbd771474e Merge branch 'x86/trampoline' into x86/urgent
x86/trampoline contains an urgent commit which is necessarily on a
newer baseline.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30 12:11:32 -07:00
Ingo Molnar
403e1c5b74 Merge branch 'x86/mce' into x86/urgent
Merge in these fixlets.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30 14:12:06 +02:00
Linus Torvalds
87a5af24e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC internal API changes from Mauro Carvalho Chehab:
 "This changeset is the first part of a series of patches that fixes the
  EDAC sybsystem.  On this set, it changes the Kernel EDAC API in order
  to properly represent the Intel i3/i5/i7, Xeon 3xxx/5xxx/7xxx, and
  Intel E5-xxxx memory controllers.

  The EDAC core used to assume that:

       - the DRAM chip select pin is directly accessed by the memory
         controller

       - when multiple channels are used, they're all filled with the
         same type of memory.

  None of the above premises is true on Intel memory controllers since
  2002, when RAMBUS and FB-DIMMs were introduced, and Advanced Memory
  Buffer or by some similar technologies hides the direct access to the
  DRAM pins.

  So, the existing drivers for those chipsets had to lie to the EDAC
  core, in general telling that just one channel is filled.  That
  produces some hard to understand error messages like:

       EDAC MC0: CE row 3, channel 0, label "DIMM1": 1 Unknown error(s): memory read error on FATAL area : cpu=0 Err=0008:00c2 (ch=2), addr = 0xad1f73480 => socket=0, Channel=0(mask=2), rank=1

  The location information there (row3 channel 0) is completely bogus:
  it has no physical meaning, and are just some random values that the
  driver uses to talk with the EDAC core.  The error actually happened
  at CPU socket 0, channel 0, slot 1, but this is not reported anywhere,
  as the EDAC core doesn't know anything about the memory layout.  So,
  only advanced users that know how the EDAC driver works and that tests
  their systems to see how DIMMs are mapped can actually benefit for
  such error logs.

  This patch series fixes the error report logic, in order to allow the
  EDAC to expose the memory architecture used by them to the EDAC core.
  So, as the EDAC core now understands how the memory is organized, it
  can provide an useful report:

       EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4)

  The location of the DIMM where the error happened is reported by "MC0"
  (cpu socket #0), at "channel:0 slot:1" location, and matches the
  physical location of the DIMM.

  There are two remaining issues not covered by this patch series:

       - The EDAC sysfs API will still report bogus values.  So,
         userspace tools like edac-utils will still use the bogus data;

       - Add a new tracepoint-based way to get the binary information
         about the errors.

  Those are on a second series of patches (also at -next), but will
  probably miss the train for 3.5, due to the slow review process."

Fix up trivial conflict (due to spelling correction of removed code) in
drivers/edac/edac_device.c

* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (42 commits)
  i7core: fix ranks information at the per-channel struct
  i5000: Fix the fatal error handling
  i5100_edac: Fix a warning when compiled with 32 bits
  i82975x_edac: Test nr_pages earlier to save a few CPU cycles
  e752x_edac: provide more info about how DIMMS/ranks are mapped
  i5000_edac: Fix the logic that retrieves memory information
  i5400_edac: improve debug messages to better represent the filled memory
  edac: Cleanup the logs for i7core and sb edac drivers
  edac: Initialize the dimm label with the known information
  edac: Remove the legacy EDAC ABI
  x38_edac: convert driver to use the new edac ABI
  tile_edac: convert driver to use the new edac ABI
  sb_edac: convert driver to use the new edac ABI
  r82600_edac: convert driver to use the new edac ABI
  ppc4xx_edac: convert driver to use the new edac ABI
  pasemi_edac: convert driver to use the new edac ABI
  mv64x60_edac: convert driver to use the new edac ABI
  mpc85xx_edac: convert driver to use the new edac ABI
  i82975x_edac: convert driver to use the new edac ABI
  i82875p_edac: convert driver to use the new edac ABI
  ...
2012-05-29 18:32:37 -07:00
Mauro Carvalho Chehab
0bf09e829d i7core: fix ranks information at the per-channel struct
There is a flag at the per-channel struct that indicates if there are
any 4R dimm on it. The way the presence of this flag were reported
is not ok, as it might give the false idea that the channel were filled
with 2R memories:

[  580.588701] EDAC DEBUG: get_dimm_config: Ch1 phy rd1, wr1 (0x063f7431): 2 ranks, UDIMMs
[  580.588704] EDAC DEBUG: get_dimm_config: 	dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400

(in this case, just one 1R memory is filled on channel 1)

So, use a better way to represent the per-channel ranks information.
After the patch, it will show:

[ 2002.233978] EDAC DEBUG: get_dimm_config: Ch0 phy rd0, wr0 (0x063f7431): UDIMMs
[ 2002.233982] EDAC DEBUG: get_dimm_config: 	dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400
[ 2002.233988] EDAC DEBUG: get_dimm_config: 	dimm 1 1024 Mb offset: 4, bank: 8, rank: 1, row: 0x4000, col: 0x400

(in this case, there isn't any 4R memories)

Reported-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:55 -03:00
Mauro Carvalho Chehab
486dfb1638 i5000: Fix the fatal error handling
The fatal error channel bits point to a single channel, and not
to a range of channels. Fix the code to properly report it,
instead of printing messages like:
	kernel: EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:54 -03:00
Mauro Carvalho Chehab
9f70d08a4c i5100_edac: Fix a warning when compiled with 32 bits
drivers/edac/i5100_edac.c: In function ‘i5100_init_csrows’:
drivers/edac/i5100_edac.c:862:3: warning: format ‘%zd’ expects argument of type ‘signed size_t’, but argument 5 has type ‘long unsigned int’ [-Wformat]

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:54 -03:00
Mauro Carvalho Chehab
36683aab90 i82975x_edac: Test nr_pages earlier to save a few CPU cycles
Avoid test nr_pages twice, and initializing some data that won't
be used.

Cleanup patch only.

Reported-by: Aristeu Rozanski Filho <arozansk@redhat.com>
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:53 -03:00
Mauro Carvalho Chehab
805afb6997 e752x_edac: provide more info about how DIMMS/ranks are mapped
No funtional changes here. Only the comments got updated.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:53 -03:00
Mauro Carvalho Chehab
64e1fdaf55 i5000_edac: Fix the logic that retrieves memory information
The logic there is broken: it basically creates two csrows for
each DIMM and assumes that all DIMM's are dual rank. Only one of
the csrows will contain the entire DIMM size. If single rank
memories are found, they'll be marked with 0 bytes.

The check if the AMB is present were also wrong.

Yet, as the error reports don't use the memory size in order to
credit an error to the right DIMM, that part of the driver seems
to work. That's why probably nobody detected the issue yet.

After this patch, the memory layout is now properly reported,
when debug mode is enabled, and the number of ranks per dimm is
now shown:

calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot  3       0 MB   |    0 MB   |    0 MB   |    0 MB   |
calculate_dimm_size: slot  2       0 MB   |    0 MB   |    0 MB   |    0 MB   |
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot  1       0 MB   |    0 MB   |    0 MB   |    0 MB   |
calculate_dimm_size: slot  0     512 MB 1R|  512 MB 1R|  512 MB 1R|  512 MB 1R|
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size:            channel 0 | channel 1 | channel 2 | channel 3 |
calculate_dimm_size:                   branch 0       |        branch 1       |

(1R above means that all memories on my test machine are single-ranked)

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:52 -03:00
Mauro Carvalho Chehab
68d086f89b i5400_edac: improve debug messages to better represent the filled memory
Improves the debug output message, in order to better represent the
memory controller hierarchy, when outputing the debug messages.

No functional changes when debug is disabled.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:51 -03:00
Mauro Carvalho Chehab
e17a2f42a4 edac: Cleanup the logs for i7core and sb edac drivers
Remove some information that it is duplicated at the MCE log,
and don't have much usage for the error. Those data will be
added again, when creating a trace function that outputs both
memory errors and MCE fields.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:51 -03:00
Mauro Carvalho Chehab
5926ff502f edac: Initialize the dimm label with the known information
While userspace doesn't fill the dimm labels, add there the dimm location,
as described by the used memory model. This could eventually match what
is described at the dmidecode, making easier for people to identify the
memory.

For example, on an Intel motherboard where the DMI table is reliable,
the first memory stick is described as:

Memory Device
	Array Handle: 0x0029
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 2048 MB
	Form Factor: DIMM
	Set: 1
	Locator: A1_DIMM0
	Bank Locator: A1_Node0_Channel0_Dimm0
	Type: <OUT OF SPEC>
	Type Detail: Synchronous
	Speed: 800 MHz
	Manufacturer: A1_Manufacturer0
	Serial Number: A1_SerNum0
	Asset Tag: A1_AssetTagNum0
	Part Number: A1_PartNum0

The memory named as "A1_DIMM0" is physically located at the first
memory controller (node 0), at channel 0, dimm slot 0.

After this patch, the memory label will be filled with:
	/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

And (after the new EDAC API patches) as:
	/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

So, even if the memory label is not initialized on userspace, an useful
information with the error location is filled there, expecially since
several systems/motherboards are provided with enough info to map from
channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
EDAC core fill it by default is a good thing.

It should noticed that, as the label filling happens at the
edac_mc_alloc(), drivers can override it to better describe the memories
(and some actually do it).

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab
ca0907b9e4 edac: Remove the legacy EDAC ABI
Now that all drivers got converted to use the new ABI, we can
drop the old one.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab
e2acc357ee x38_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:49 -03:00
Mauro Carvalho Chehab
40467db770 tile_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:48 -03:00
Mauro Carvalho Chehab
c36e3e7768 sb_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:48 -03:00
Mauro Carvalho Chehab
63b5d1d9aa r82600_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:47 -03:00
Mauro Carvalho Chehab
94d9337459 ppc4xx_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:46 -03:00
Mauro Carvalho Chehab
f34575aca9 pasemi_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:46 -03:00
Mauro Carvalho Chehab
a583ac6ca8 mv64x60_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:45 -03:00
Mauro Carvalho Chehab
ad4d6e2311 mpc85xx_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:45 -03:00
Mauro Carvalho Chehab
705213580b i82975x_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:44 -03:00
Mauro Carvalho Chehab
0a8a9ac9ca i82875p_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:43 -03:00
Mauro Carvalho Chehab
84c3a68408 i82860_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:43 -03:00
Mauro Carvalho Chehab
40f562b191 i82443bxgx_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:42 -03:00