linux/drivers/edac
Qiuxu Zhuo 15cc3ae001 EDAC, sb_edac: Don't create a second memory controller if HA1 is not present
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: e2f747b1f4 ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-27 12:15:43 +02:00
..
altera_edac.c EDAC, altera: Fix error handling path in altr_edac_device_probe() 2017-08-18 18:21:27 +02:00
altera_edac.h EDAC, altera: Rename device trigger to common name 2016-09-01 08:55:24 +02:00
amd64_edac_dbg.c
amd64_edac_inj.c
amd64_edac.c EDAC: Add owner check to the x86 platform drivers 2017-09-25 13:09:39 +02:00
amd64_edac.h EDAC, amd64: Bump driver version 2017-02-14 11:58:05 +01:00
amd76x_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
amd8111_edac.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
amd8111_edac.h
amd8131_edac.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
amd8131_edac.h
cell_edac.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
cpc925_edac.c EDAC, cpc925, ppc4xx: Convert to using %pOF instead of full_name 2017-07-19 07:42:41 +02:00
debugfs.c
e7xxx_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
e752x_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
edac_device_sysfs.c edac: move EDAC device definitions to drivers/edac/edac_device.h 2016-12-15 08:54:51 -02:00
edac_device.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
edac_device.h edac: move documentation from edac_device to edac_core.h 2016-12-15 08:54:51 -02:00
edac_mc_sysfs.c EDAC: Make device_type const 2017-08-20 13:12:32 +02:00
edac_mc.c EDAC: Add helper which returns the loaded platform driver 2017-09-25 12:55:59 +02:00
edac_mc.h EDAC: Add helper which returns the loaded platform driver 2017-09-25 12:55:59 +02:00
edac_module.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
edac_module.h edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
edac_pci_sysfs.c edac: move documentation from edac_pci*.c to edac_pci.h 2016-12-15 08:54:51 -02:00
edac_pci.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
edac_pci.h edac: move documentation from edac_pci*.c to edac_pci.h 2016-12-15 08:54:51 -02:00
fsl_ddr_edac.c EDAC, fsl_ddr: Make locally used symbols static 2017-02-09 17:40:54 +01:00
fsl_ddr_edac.h EDAC, fsl_ddr: Add missing DDR DRAM types 2016-09-01 10:28:01 +02:00
ghes_edac.c EDAC, ghes: Add platform check 2017-09-25 12:55:12 +02:00
highbank_l2_edac.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
highbank_mc_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i7core_edac.c EDAC: Handle return value of kasprintf() 2017-09-21 12:18:44 +02:00
i3000_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i3200_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i5000_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i5100_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i5400_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i7300_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i82443bxgx_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i82860_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i82875p_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
i82975x_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
ie31200_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
Kconfig EDAC, ghes: Do not enable it by default 2017-04-27 14:15:38 +02:00
layerscape_edac.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
Makefile EDAC: Delete edac_stub.c 2017-04-10 17:14:48 +02:00
mce_amd.c EDAC, mce_amd: Get rid of local var in amd_filter_mce() 2017-08-21 17:59:38 +02:00
mce_amd.h EDAC/mce/amd: Unexport amd_decode_mce() 2017-01-24 09:14:55 +01:00
mpc85xx_edac.c EDAC, mpc85xx: Add T2080 l2-cache support 2017-02-03 10:36:35 +01:00
mpc85xx_edac.h EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx 2016-09-01 10:28:00 +02:00
mv64x60_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
mv64x60_edac.h
octeon_edac-l2c.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
octeon_edac-lmc.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
octeon_edac-pc.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
octeon_edac-pci.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
pasemi_edac.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
pnd2_edac.c EDAC: Add owner check to the x86 platform drivers 2017-09-25 13:09:39 +02:00
pnd2_edac.h EDAC, pnd2_edac: Add new EDAC driver for Intel SoC platforms 2017-03-16 12:40:52 +01:00
ppc4xx_edac.c EDAC, cpc925, ppc4xx: Convert to using %pOF instead of full_name 2017-07-19 07:42:41 +02:00
ppc4xx_edac.h
r82600_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
sb_edac.c EDAC, sb_edac: Don't create a second memory controller if HA1 is not present 2017-09-27 12:15:43 +02:00
skx_edac.c EDAC: Add owner check to the x86 platform drivers 2017-09-25 13:09:39 +02:00
synopsys_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
thunderx_edac.c EDAC, thunderx: Fix error handling path in thunderx_lmc_probe() 2017-08-18 19:12:08 +02:00
tile_edac.c edac: rename edac_core.h to edac_mc.h 2016-12-15 08:54:51 -02:00
wq.c EDAC, wq: Remove deprecated create_singlethread_workqueue() 2016-08-15 07:21:29 +02:00
x38_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00
xgene_edac.c EDAC: Get rid of mci->mod_ver 2017-07-17 13:42:48 +02:00