Andreas Pflug
2016-Jan-22 09:09 UTC
[Pkg-xen-devel] Bug#810964: [Xen-devel] [BUG] EDAC infomation partially missing
On 21.01.16 at 17:41, Jan Beulich wrote:
>>>> On 20.01.16 at 16:01, <andreas.pflug at web.de> wrote:
>> Initially reported to Debian
>> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
>>
>> With AMD Opteron 6xxx processors, half of the memory controllers are
>> missing from /sys/devices/system/edac/mc
>> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
>> MC), other dual-module CPUs might be affected too.
>>
>> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
>> listed under /sys/devices/system/edac/mc as expected. The same happens
>> when Xen 4.1 is used: all MCs are present.
>>
>> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
>> machine) or mc2/mc3 (dual CPU machine) are present, although the full
>> system memory is accessible. Checked versions were 4.1.4 (Debian
>> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid).
> As already indicated by Ian in that bug, you should supply us with
> full kernel and hypervisor logs for both the good and bad cases
> (ideally with the same kernel version used in both runs, so that we
> can exclude kernel behavior differences).

Here are some dmesg excerpts, all taken with Linux 4.1.3.

When booting with Xen 4.1.4:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

When booting with Xen 4.4.1:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
 Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
 (Note that use of the override may cause unknown side effects.)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

Apparently Xen 4.4 doesn't report the BIOS flag correctly. I added
ecc_enable_override=1 to amd64_edac_mod, and then I get:

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
EDAC amd64: Forcing ECC on!
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

This restored both MCs, so the BIOS flag seems to be the culprit.

Regards,
Andreas
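To compare boots like the ones above without reading every line, a minimal sketch (not part of the original thread) can pull out which memory controllers actually registered. It assumes the dmesg text is available as a string and that the registration lines keep the exact "Giving out device" wording seen in these logs; the function names `registered_mcs`, `missing_mcs` and `sysfs_mcs` are hypothetical helpers.

```python
import os
import re

# Matches registration lines as they appear in the logs above, e.g.
# "EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)"
GIVING_OUT = re.compile(
    r"EDAC MC(\d+): Giving out device to module \S+ controller \S+: DEV (\S+)"
)


def registered_mcs(log_text):
    """Map each registered MC index to its PCI device, parsed from dmesg text."""
    found = {}
    for line in log_text.splitlines():
        match = GIVING_OUT.search(line)
        if match:
            found[int(match.group(1))] = match.group(2)
    return found


def missing_mcs(log_text, expected_count):
    """List MC indices in 0..expected_count-1 that never registered."""
    found = registered_mcs(log_text)
    return [i for i in range(expected_count) if i not in found]


def sysfs_mcs(root="/sys/devices/system/edac/mc"):
    """List the mcN directories currently present under the EDAC sysfs tree."""
    return sorted(
        int(name[2:])
        for name in os.listdir(root)
        if name.startswith("mc") and name[2:].isdigit()
    )
```

Fed the Xen 4.4.1 excerpt, `missing_mcs(log, 2)` reports `[0]`, i.e. node 0's controller (0000:00:18.2) never registered, while the Xen 4.1.4 and override runs report nothing missing. `sysfs_mcs()` gives the same answer from the live system instead of the log.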
Jan Beulich
2016-Jan-22 10:40 UTC
[Pkg-xen-devel] Bug#810964: [Xen-devel] [BUG] EDAC infomation partially missing
>>> On 22.01.16 at 10:09, <pgadmin at pse-consulting.de> wrote:
> When booting with Xen 4.4.1:
>
> AMD64 EDAC driver v3.4.0
> EDAC amd64: DRAM ECC enabled.
> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.

I wonder how valid this message is. We actually write this MSR with
all ones during boot.

However, considering that involved functions like
nb_mce_bank_enabled_on_node() or node_to_amd_nb() take node IDs as
inputs, and that PV guests (including Dom0) don't have a topology
matching that of the host, I doubt very much that this driver is even
remotely prepared to run under Xen. If it works on Xen 4.1.x, that
would then be by pure accident.

Jan
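The warning being questioned refers to bit 4 of MSR 0x17b. A minimal sketch of that check follows, assuming the Linux `msr` driver (`modprobe msr`, root access), where reading 8 bytes at file offset N of `/dev/cpu/<cpu>/msr` returns MSR N. The helper names are hypothetical, and under Xen a PV Dom0's MSR reads are handled by the hypervisor, so the value seen there need not match what the hardware holds.

```python
import os
import struct

MCG_CTL = 0x17B       # MSR number from the driver warning above
NB_MCE_BANK_BIT = 4   # bit index from the warning: 0x0000017b[4]


def nb_mce_bank_enabled(msr_value):
    """True if the northbridge MCE bank enable bit is set in a 64-bit MSR value."""
    return bool((msr_value >> NB_MCE_BANK_BIT) & 1)


def read_msr(cpu, msr=MCG_CTL):
    """Read one 64-bit MSR via the Linux msr driver (root and 'modprobe msr' needed)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)  # the file offset selects the MSR
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def cpus_with_nb_bank_disabled(cpus):
    """Return the CPUs whose view of the MSR has the NB MCE bank bit clear."""
    return [cpu for cpu in cpus if not nb_mce_bank_enabled(read_msr(cpu))]
```

An all-ones value, as written during boot according to the message above, passes `nb_mce_bank_enabled`. Running `cpus_with_nb_bank_disabled(range(os.cpu_count()))` on bare metal and again in Dom0 would show whether the two views of the register differ.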
Andreas Pflug
2016-Jan-22 11:33 UTC
[Pkg-xen-devel] Bug#810964: [Xen-devel] [BUG] EDAC infomation partially missing
On 22.01.16 at 11:40, Jan Beulich wrote:
>>>> On 22.01.16 at 10:09, <pgadmin at pse-consulting.de> wrote:
>> When booting with Xen 4.4.1:
>>
>> AMD64 EDAC driver v3.4.0
>> EDAC amd64: DRAM ECC enabled.
>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
> I wonder how valid this message is. We actually write this MSR with
> all ones during boot.
>
> However, considering that involved functions like
> nb_mce_bank_enabled_on_node() or node_to_amd_nb() take node IDs as
> inputs, and that PV guests (including Dom0) don't have a topology
> matching that of the host, I doubt very much that this driver is even
> remotely prepared to run under Xen. If it works on Xen 4.1.x, that
> would then be by pure accident.

The dmesg output is identical with and without Xen 4.1, so I'd guess the
driver does work if the flags are detected correctly.

Regards
Andreas
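The claim that the dmesg output is identical with and without Xen 4.1 can be checked mechanically. A minimal sketch, assuming both boot logs are saved as text: it keeps only lines mentioning EDAC, strips the leading `[ seconds.micro]` timestamps that differ between boots, and diffs the rest with `difflib`. The helper names `edac_lines` and `edac_diff` are hypothetical.

```python
import difflib
import re

# Leading dmesg timestamp such as "[    2.500000] "
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def edac_lines(log_text):
    """Keep only EDAC-related lines, with boot timestamps removed."""
    lines = []
    for line in log_text.splitlines():
        line = TIMESTAMP.sub("", line).strip()
        if "EDAC" in line:
            lines.append(line)
    return lines


def edac_diff(log_a, log_b):
    """Unified diff of the EDAC lines of two boots; empty if they match."""
    return list(
        difflib.unified_diff(
            edac_lines(log_a), edac_lines(log_b), "boot_a", "boot_b", lineterm=""
        )
    )
```

An empty result for a bare-metal boot versus a Xen 4.1 boot supports the observation above; against a Xen 4.4.1 boot, the diff would surface the "NB MCE bank disabled" lines and the missing MC0.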