Anderson, Dave
2017-Apr-14 10:16 UTC
[CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.
List moderator: feel free to delete my previous large message with attachments that's in the moderation queue...it's now obsolete anyway.

I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 + Kernel 4.9.13:

Once I finally got serial output all the way through the boot process (xen+dom0) I discovered the stack trace:

[Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
installing Xen timer for CPU 8
[Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/cpu/common.c:997!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.13-22.el7.x86_64 #1
Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
random: fast init done
task: ffff880058a8c4c0 task.stack: ffffc900400b4000
RIP: e030:[<ffffffff8103e527>] [<ffffffff8103e527>] identify_secondary_cpu+0x57/0x80
RSP: e02b:ffffc900400b7f08 EFLAGS: 00010086
RAX: 00000000ffffffe4 RBX: ffff88005d80a020 RCX: ffffffff81c5be68
RDX: 0000000000000001 RSI: 0000000000000005 RDI: 0000000000000005
RBP: ffffc900400b7f18 R08: 00000000000000cb R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000008
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88005d800000(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c07000 CR4: 0000000000042660
Stack:
0000000000000008 0000000000000000 ffffc900400b7f28 ffffffff8104e94e
ffffc900400b7f40 ffffffff81029925 0000000000000000 ffffc900400b7f50
ffffffff810299a0 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8104e94e>] smp_store_cpu_info+0x3e/0x40
[<ffffffff81029925>] cpu_bringup+0x35/0x90
[<ffffffff810299a0>] cpu_bringup_and_idle+0x20/0x40
Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00 00 44 89 e6 e8 24 03 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 98 87 a6 81
RIP [<ffffffff8103e527>] identify_secondary_cpu+0x57/0x80
RSP <ffffc900400b7f08>
---[ end trace dc5563100443876e ]---

I surmised that reducing the number of dom0 vCPUs might solve this issue (they were unbounded).

In testing, adding "dom0_max_vcpus=4 dom0_vcpus_pin" to the GRUB_CMDLINE_XEN_DEFAULT line in /etc/default/grub and re-running grub2-mkconfig has made the system that never booted Xen 4.6.3-12 + Kernel 4.9.13 boot every single time across 5-10 tests.

So...I don't know if there's a race condition somewhere, or what...but...so far this workaround has not failed me.

Thanks,
-Dave

> On Fri, Apr 7, 2017 at 6:58 AM, PJ Welsh <pjwelsh at gmail.com> wrote:
>> I've not gotten any bites from my posting on the xen-devel mailing list.
>> Here is the only one to-date:
>> https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
>>
>> From that email, there needs to be some hypervisor messages.
>>
>> Does anyone know how to produce the hypervisor messages? I've already
>> removed the rhgb and quiet options from the boot.
>>
>> Thanks
>> PJ
>
> I spoke too soon. To get more information: Please see
> https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project
> and
> https://wiki.xenproject.org/wiki/Xen_Serial_Console
> or alternatively at least add "vga=keep".
>
> pjwelsh
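For anyone wanting to try the same workaround, the edit Dave describes could be sketched roughly as below. This is only an illustration, not what he actually ran: add_xen_opts is a hypothetical helper name, and it assumes GRUB_CMDLINE_XEN_DEFAULT (if already present) sits on a single double-quoted line in the stock /etc/default/grub. The real commands are left commented out so nothing touches a live system by accident.

```shell
#!/bin/sh
# Sketch only: append Xen hypervisor options to GRUB_CMDLINE_XEN_DEFAULT.
# add_xen_opts is a hypothetical helper, not part of grub2 or Xen.
# Assumption: the variable, if present, is on one line and double-quoted.
add_xen_opts() {
    file=$1
    opts=$2
    if grep -q '^GRUB_CMDLINE_XEN_DEFAULT="' "$file"; then
        # Keep the existing value and append the new options before the closing quote.
        sed "s|^\(GRUB_CMDLINE_XEN_DEFAULT=\"[^\"]*\)\"|\1 ${opts}\"|" "$file" > "$file.new" &&
            mv "$file.new" "$file"
    else
        # Variable not set yet: add it as a new line.
        printf 'GRUB_CMDLINE_XEN_DEFAULT="%s"\n' "$opts" >> "$file"
    fi
}

# Intended use on the affected host (as root), left commented out here:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_xen_opts /etc/default/grub "dom0_max_vcpus=4 dom0_vcpus_pin"
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # output path differs on EFI installs
```

After rebooting into Xen, something like `xl vcpu-list Domain-0` should show whether dom0 really came up with four pinned vCPUs.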
Johnny Hughes
2017-Apr-14 12:39 UTC
[CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.
Dave,

Take a look at this kernel, as it is the one I think we are going to release (or a slightly newer 4.9.2x from kernel.org LTS). This version has some newer settings that are closer to the Red Hat/Fedora/CentOS base kernel with respect to what is a module and what is built into the kernel, etc.

https://people.centos.org/hughesjr/4.9.x/

Thanks,
Johnny Hughes
PJ Welsh
2017-Apr-14 14:33 UTC
[CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.
I am on holiday until Sunday, but will download the kernel now and test it when I get back into work.

Thanks
PJ Welsh
2017-Apr-14 14:34 UTC
[CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.
Very nice on the sleuthing!

Thanks
Anderson, Dave
2017-Apr-14 20:26 UTC
[CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.
Sad to say that I already tested 4.9.20-26 from your repo yesterday...it does look a little cleaner before it dies, but still dies. I have not tested it with the dom0_max_vcpus=4 workaround, but I can tonight if you would like. Relevant bits below:

Loading Xen 4.6.3-12.el7 ...
Loading Linux 4.9.20-26.el7.x86_64 ...
Loading initial ramdisk ...
[ 0.000000] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26 CDT 2017
<snip>
[ 6.195089] smpboot: Max logical packages: 1
[ 6.199549] VPMU disabled by hypervisor.
[ 6.203663] Performance Events: SandyBridge events, PMU not available due to virtualization, using software events only.
[ 6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 6.222139] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 6.229165] installing Xen timer for CPU 1
[ 6.233849] installing Xen timer for CPU 2
[ 6.238504] installing Xen timer for CPU 3
[ 6.243139] installing Xen timer for CPU 4
[ 6.247836] installing Xen timer for CPU 5
[ 6.252478] installing Xen timer for CPU 6
[ 6.257155] installing Xen timer for CPU 7
[ 6.261795] installing Xen timer for CPU 8
[ 6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[ 6.272736] ------------[ cut here ]------------
[ 6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
[ 6.280104] random: fast init done
[ 6.286333] invalid opcode: 0000 [#1] SMP
[ 6.290343] Modules linked in:
[ 6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.20-26.el7.x86_64 #1
[ 6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
[ 6.307183] task: ffff880058a68000 task.stack: ffffc900400c0000
[ 6.313103] RIP: e030:[<ffffffff8103e7e7>] [<ffffffff8103e7e7>] identify_secondary_cpu+0x57/0x80
[ 6.322019] RSP: e02b:ffffc900400c3f08 EFLAGS: 00010086
[ 6.327333] RAX: 00000000ffffffe4 RBX: ffff88005d80a020 RCX: ffffffff81e5ffc8
[ 6.334473] RDX: 0000000000000001 RSI: 0000000000000005 RDI: 0000000000000005
[ 6.341607] RBP: ffffc900400c3f18 R08: 00000000000000ce R09: 0000000000000000
[ 6.348738] R10: 0000000000000005 R11: 0000000000000006 R12: 0000000000000008
[ 6.355873] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 6.363006] FS: 0000000000000000(0000) GS:ffff88005d800000(0000) knlGS:0000000000000000
[ 6.371090] CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[ 6.376837] CR2: 0000000000000000 CR3: 0000000001e07000 CR4: 0000000000042660
[ 6.383970] Stack:
[ 6.386004] 0000000000000008 0000000000000000 ffffc900400c3f28 ffffffff8104ebce
[ 6.393483] ffffc900400c3f40 ffffffff81029855 0000000000000000 ffffc900400c3f50
[ 6.400963] ffffffff810298d0 0000000000000000 0000000000000000 0000000000000000
[ 6.408450] Call Trace:
[ 6.410907] [<ffffffff8104ebce>] smp_store_cpu_info+0x3e/0x40
[ 6.416753] [<ffffffff81029855>] cpu_bringup+0x35/0x90
[ 6.421981] [<ffffffff810298d0>] cpu_bringup_and_idle+0x20/0x40
[ 6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81
[ 6.448249] RIP [<ffffffff8103e7e7>] identify_secondary_cpu+0x57/0x80
[ 6.454801] RSP <ffffc900400c3f08>
[ 6.458305] ---[ end trace 2f9b62c5c7050204 ]---

So basically, it removes the "[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 0 APIC: 1" lines, but otherwise dies the same way. I included a few extra lines up from the panic because the "[ 6.195089] smpboot: Max logical packages: 1" line could possibly be relevant. I need to go look at a clean boot to see if that was in there on this machine.

Even more strangely, in addition to the machine I'm talking about, which panics and reboots, I had a second, nearly identical machine (different CPU/RAM config, everything else the same) which booted but had some kind of hardware conflict with 4.9.x that I never had before. It appears to be between the Intel SCU and an Intel PCIe NVMe SSD (luckily I wasn't using the SCU, so I disabled that). Had that other machine not booted, I would have just assumed 4.9.x was totally broken and sat on 3.18...so I'm glad that one machine booted at least :)

Thanks,
-Dave
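Since the comparison Dave mentions (failing boot versus a clean boot) comes down to a handful of log lines, a small filter over a saved serial console capture may help. The sketch below is only an assumption about how one might do that: scan_boot_log is a hypothetical helper name, and the patterns are simply the messages that appear in the boot logs quoted in this thread.

```shell
#!/bin/sh
# Sketch only: pull the CPU-topology lines discussed in this thread out of a
# saved console log. scan_boot_log is a hypothetical helper name.
scan_boot_log() {
    log=$1
    # Strip a leading "[ 6.195089] " style timestamp (if any), then keep
    # only the messages of interest.
    sed 's/^\[ *[0-9][0-9.]*] *//' "$log" |
        grep -E 'Max logical packages|exceeds BIOS package data|APIC id mismatch|kernel BUG at'
}

# Example, assuming two captured logs with these (made-up) file names:
#   scan_boot_log failing-boot.log > failing.txt
#   scan_boot_log clean-boot.log   > clean.txt
#   diff failing.txt clean.txt
```

Stripping the timestamps first keeps the diff between two boots from being dominated by timing differences.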