Andreas Pflug
2016-Jan-20 15:01 UTC
[Pkg-xen-devel] Bug#810964: [BUG] EDAC infomation partially missing
Initially reported to Debian (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:

With AMD Opteron 6xxx processors, half of the memory controllers are missing from /sys/devices/system/edac/mc. Checked with a single 6120 (dual memory controller) and two 6344s (2x dual MC); other dual-module CPUs might be affected too.

Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are listed under /sys/devices/system/edac/mc as expected. The same happens when Xen 4.1 is used: all MCs are present.

Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single-CPU machine) or mc2/mc3 (on the dual-CPU machine) are present, although the full system memory is accessible. Checked versions were 4.1.4 (Debian Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid).
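To compare the bare-metal and Xen dom0 boots described in the report, one can list the mcN entries under /sys/devices/system/edac/mc. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes one mcN directory per registered memory controller, which matches what the report describes. It only flags gaps below the highest index it finds, so a missing top-numbered controller would go unnoticed.

```python
import os
import re
import sys

# sysfs directory named in the report.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def parse_mc_indices(names):
    """Return sorted N for every entry named exactly 'mcN'; other entries are ignored."""
    indices = []
    for name in names:
        m = re.fullmatch(r"mc(\d+)", name)
        if m:
            indices.append(int(m.group(1)))
    return sorted(indices)


def list_memory_controllers(root=EDAC_MC_ROOT):
    """Hypothetical helper: list mcN indices under root, or [] if root is unreadable."""
    try:
        return parse_mc_indices(os.listdir(root))
    except OSError:
        # Directory missing (EDAC not loaded) or not accessible.
        return []


def missing_controllers(indices):
    """Indices absent below the highest one present (cannot detect missing top entries)."""
    if not indices:
        return []
    present = set(indices)
    return [i for i in range(max(indices) + 1) if i not in present]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else EDAC_MC_ROOT
    found = list_memory_controllers(root)
    print("present:", ", ".join(f"mc{i}" for i in found) or "none")
    print("missing below highest:",
          ", ".join(f"mc{i}" for i in missing_controllers(found)) or "none")
```

For the dual-CPU machine under Xen 4.4, where only mc2 and mc3 appear, missing_controllers([2, 3]) returns [0, 1].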
Jan Beulich
2016-Jan-21 16:41 UTC
[Pkg-xen-devel] Bug#810964: [Xen-devel] [BUG] EDAC infomation partially missing
>>> On 20.01.16 at 16:01, <andreas.pflug at web.de> wrote:
> Initially reported to debian
> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
>
> With AMD Opteron 6xxx processors, half of the memory controllers are
> missing from /sys/devices/system/edac/mc
> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
> MC), other dual-module CPUs might be affected too.
>
> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
> listed under /sys/devices/system/edac/mc as expected. Same happens, when
> Xen 4.1 is used: all MCs present.
>
> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
> machine) or mc2/mc3 (dual CPU machine) are present, although the full
> system memory is accessible. Checked versions were 4.1.4 (Debian
> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)

As already indicated by Ian in that bug, you should supply us with
full kernel and hypervisor logs for both the good and bad cases
(ideally with the same kernel version used in both runs, so that we
can exclude kernel behavior differences).

Jan
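Jan's request boils down to comparing the EDAC-related messages of a good and a bad boot taken with the same kernel. A minimal sketch using Python's standard difflib module, assuming both logs were saved to text files (for example from dmesg, or from xl dmesg for the hypervisor side); the helper names and the timestamp-stripping rule are assumptions, not part of any Xen or kernel tooling.

```python
import difflib
import sys


def edac_lines(log_text):
    """Keep lines containing 'EDAC', stripping a leading '[ timestamp ]' prefix.

    Lines without 'EDAC' (including indented continuation lines) are dropped.
    """
    kept = []
    for line in log_text.splitlines():
        # dmesg-style lines start with "[    1.234567] "; drop it so runs compare cleanly.
        if line.startswith("[") and "]" in line:
            line = line.split("]", 1)[1].lstrip()
        if "EDAC" in line:
            kept.append(line)
    return kept


def diff_edac(good_text, bad_text):
    """Unified diff of the EDAC lines of a good and a bad boot log."""
    return list(difflib.unified_diff(
        edac_lines(good_text), edac_lines(bad_text),
        fromfile="good", tofile="bad", lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python edac_diff.py good.log bad.log
    with open(sys.argv[1]) as good, open(sys.argv[2]) as bad:
        print("\n".join(diff_edac(good.read(), bad.read())))
```

Filtering before diffing keeps unrelated boot noise out of the comparison; identical EDAC output yields an empty diff.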
Andreas Pflug
2016-Jan-22 09:09 UTC
[Pkg-xen-devel] Bug#810964: [Xen-devel] [BUG] EDAC infomation partially missing
On 21.01.16 at 17:41, Jan Beulich wrote:
>>>> On 20.01.16 at 16:01, <andreas.pflug at web.de> wrote:
>> Initially reported to debian
>> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
>>
>> With AMD Opteron 6xxx processors, half of the memory controllers are
>> missing from /sys/devices/system/edac/mc
>> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
>> MC), other dual-module CPUs might be affected too.
>>
>> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
>> listed under /sys/devices/system/edac/mc as expected. Same happens, when
>> Xen 4.1 is used: all MCs present.
>>
>> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
>> machine) or mc2/mc3 (dual CPU machine) are present, although the full
>> system memory is accessible. Checked versions were 4.1.4 (Debian
>> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)
> As already indicated by Ian in that bug, you should supply us with
> full kernel and hypervisor logs for both the good and bad cases
> (ideally with the same kernel version use in both runs, so that we
> can exclude kernel behavior differences).

Here are some dmesg excerpts, all taken with Linux 4.1.3.

When booting with Xen 4.1.4:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

When booting with Xen 4.4.1:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
 Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
 (Note that use of the override may cause unknown side effects.)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

Apparently Xen 4.4 doesn't report the BIOS flag correctly. I added ecc_enable_override=1 to amd64_edac_mod, and then got:

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
EDAC amd64: Forcing ECC on!
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 2048MB 3: 2048MB
EDAC amd64: MC: 4: 0MB 5: 0MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

This restored both MCs, so the BIOS flag seems to be the culprit.

Regards,
Andreas
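In the 4.4.1 excerpt, node 0 still prints "DRAM ECC enabled." immediately before the NB MCE bank warning, so the flag that appears to be misreported is the one the driver names: bit 4 of MSR 0x0000017b on node 0. In the amd64_edac source of this era, the "ECC disabled in the BIOS" message is printed when either the DRAM ECC check or the NB MCE bank check fails. The Python sketch below summarizes such excerpts (which MCs registered, which nodes hit the NB MCE bank warning) and includes a helper that tests bit 4 of a raw MSR value. The helper names are hypothetical, and the regular expressions are written against the message formats quoted in this thread; other kernel versions may word them differently.

```python
import re

# Bit index taken from the driver message "set MSR 0x0000017b[4] ... to enable".
NB_MCE_BANK_BIT = 4

# Patterns written against the amd64_edac messages quoted in this thread (assumption
# that other kernel versions use the same wording).
_REGISTERED = re.compile(r"EDAC MC(\d+): Giving out device .* DEV (\S+)")
_NB_DISABLED = re.compile(r"NB MCE bank disabled,.* on node (\d+)")


def nb_mce_bank_enabled(msr_value):
    """Return True if bit 4 of a raw MSR 0x17b value is set."""
    return bool((msr_value >> NB_MCE_BANK_BIT) & 1)


def summarize_edac_log(log_text):
    """Hypothetical helper: map registered MC index -> PCI device and
    list the nodes for which the NB MCE bank warning was printed."""
    registered = {}
    nb_disabled_nodes = []
    for line in log_text.splitlines():
        m = _REGISTERED.search(line)
        if m:
            registered[int(m.group(1))] = m.group(2)
            continue
        m = _NB_DISABLED.search(line)
        if m:
            nb_disabled_nodes.append(int(m.group(1)))
    return {"registered": registered, "nb_mce_disabled_nodes": nb_disabled_nodes}


if __name__ == "__main__":
    # Lines copied from the Xen 4.4.1 excerpt in this thread.
    sample = "\n".join([
        "EDAC amd64: DRAM ECC enabled.",
        "EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.",
        "EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.",
        "EDAC amd64: DRAM ECC enabled.",
        "EDAC amd64: F10h detected (node 1).",
        "EDAC MC1: Giving out device to module amd64_edac controller F10h: "
        "DEV 0000:00:19.2 (INTERRUPT)",
    ])
    print(summarize_edac_log(sample))
    # prints {'registered': {1: '0000:00:19.2'}, 'nb_mce_disabled_nodes': [0]}
```

Run against the Xen 4.1.4 excerpt, the same summary would list both MC0 and MC1 and no NB MCE bank warnings, which makes the difference between the two runs explicit.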