George Dunlap
2010-Feb-24 18:47 UTC
[Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
I recently tried to boot my Debian lenny distro on a newer Intel Nehalem box using Debian Lenny's default Xen kernel (2.6.26-2-xen-686), and it crashed during boot. Full log attached, but the key info is here:

I tried doing a "binary search" to see where this was introduced, but it fails the same way all the way back to c/s 20000.

The same kernel/hypervisor combination boots fine on a different box I have.

 -George

(XEN) ----[ Xen-4.0.0-rc4 x86_64 debug=n Tainted: C ]----
(XEN) CPU: 15
(XEN) RIP: e008:[<ffff82c4801535a9>] pci_enable_msi+0x4c9/0x580
(XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
(XEN) rax: ffff82c3ffeb6141 rbx: 0000000000000000 rcx: ffff830828b0e690
(XEN) rdx: ffff830828b0e718 rsi: ffff8300bf6975b0 rdi: ffff830828b0e6e0
(XEN) rbp: ffff83082cd7fec8 rsp: ffff83082cd7fd88 r8: 0000000000000005
(XEN) r9: ffff830828b0e700 r10: 00000000000000a8 r11: 0000000000000040
(XEN) r12: ffff830828b0e710 r13: ffff830828b0e670 r14: ffff83082cd7fe38
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000043fdcf000 cr2: ffff82c3ffeb614d
(XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83082cd7fd88:
(XEN) 0000000000000246 ffff830828b0e6e0 0000000080000000 0000000100000000
(XEN) 3408000000000000 0000000200000000 000f5861e4a00100 0000000000000141
(XEN) ffff83082cd71ec8 ffff83082cd7fec8 0000000000000136 ffff83082cd60000
(XEN) 0000000000000038 00000000ffffffed 0000000000000000 ffff82c480154832
(XEN) 00000000000004d8 0000000000000038 00000000000000e0 ffff83083fd81c80
(XEN) ffff830828b0e670 00000000f5861e0c 0000000000000000 ffff83082cd60000
(XEN) ffff83082cd400f0 00000000f5861e0c 0000000000000136 0000000000000038
(XEN) ffff83082cd7fec8 ffff82c4801ede5f 7265646e776f206f ffff82c48024c080
(XEN) ffff83082cd60180 ffff83082cd7fe98 0000000000007ff0 ffffffffffffffff
(XEN) 0000000800000000 0000000100000000 ffff8308f5861e4a ffff82c4802eaa80
(XEN) 0000000800000000 0000000000000038 f5861e4a00000001 000000000000000f
(XEN) ffffffffffffffff ffff8300bf2e8000 0000000000000035 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4801ef863
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000035 000000000000000d 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000021 00000000f5861e0c
(XEN) 00000000c033741f 00000000ffffffff 0000000000000000 0000010000000000
(XEN) 00000000c0101427 0000000000000061 0000000000000286 00000000f5861e08
(XEN) 0000000000000069 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 000000000000000f ffff8300bf2e8000
(XEN) Xen call trace:
(XEN) [<ffff82c4801535a9>] pci_enable_msi+0x4c9/0x580
(XEN) [<ffff82c480154832>] map_domain_pirq+0x242/0x2f0
(XEN) [<ffff82c4801ede5f>] compat_physdev_op+0xc8f/0x1010
(XEN) [<ffff82c4801ef863>] compat_hypercall+0x83/0x90
(XEN)
(XEN) Pagetable walk from ffff82c3ffeb614d:
(XEN) L4[0x105] = 00000000bf4e2027 5555555555555555
(XEN) L3[0x10f] = 00000000bf698063 5555555555555555
(XEN) L2[0x1ff] = 00000000bf697063 5555555555555555
(XEN) L1[0x0b6] = f5861e4a00100173 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 15:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=000b]
(XEN) Faulting linear address: ffff82c3ffeb614d
(XEN) ****************************************

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Sander Eikelenboom
2010-Feb-24 19:08 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Wasn't pci=nomsi required for those kernels?

--
Best regards,
Sander mailto:linux@eikelenboom.it
Pasi Kärkkäinen
2010-Feb-24 20:20 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
On Wed, Feb 24, 2010 at 08:08:10PM +0100, Sander Eikelenboom wrote:
> Wasn't pci=nomsi required for those kernels?

Yeah, pci=nomsi has been the workaround for this bug. Many people have seen (and reported) this bug in the lenny kernel.

It would be good to hunt down the actual cause.

-- Pasi
George Dunlap
2010-Feb-24 23:57 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
pci=nomsi on the guest command line?

I realize dom0 is a privileged guest, but it still seems like we should try not to crash Xen as a result of guest input. :-)

Thanks for the work-around,
 -George
Jiang, Yunhong
2010-Feb-25 06:50 UTC
RE: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Agreed that Xen should not crash in such a situation.

Since it is enabling MSI, I checked the code in __pci_enable_msi() and didn't find anything suspicious. As 0xffff82c3ffeb614d is in the ioremap range, I suspect it is related to the MSI-X fixmap, but I still didn't find any hint.

The page table walk looks a bit strange to me: the L1 entry is suspicious, as if it had been clobbered.

Can you share the code around the faulting IP address?

--jyh

(XEN) Pagetable walk from ffff82c3ffeb614d:
(XEN) L4[0x105] = 00000000bf4e2027 5555555555555555
(XEN) L3[0x10f] = 00000000bf698063 5555555555555555
(XEN) L2[0x1ff] = 00000000bf697063 5555555555555555
(XEN) L1[0x0b6] = f5861e4a00100173 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 15:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=000b]
(XEN) Faulting linear address: ffff82c3ffeb614d
(XEN) ****************************************
Jan Beulich
2010-Feb-25 09:16 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
>>> George Dunlap <George.Dunlap@eu.citrix.com> 25.02.10 00:57 >>>
>I realize dom0 is a privileged guest, but it still seems like we
>should try not to crash Xen as a result on guest input. :-)

While I generally agree, I think in this case it is unavoidable: Xen could apply some sanity check, but passing a machine address from Dom0 to Xen implies that Dom0 knows what it is doing, and Xen trusts it. Specifically, according to the trace, struct physdev_map_pirq has these contents:

.domid      = 00007ff0
.type       = 00000000
.index      = ffffffff
.pirq       = ffffffff
.bus        = 00000000
.devfn      = 00000008
.entry_nr   = 00000000
.table_base = f5861e4a00000001

table_base would seem not to have been initialized at all. I would guess that they use the structure definition from before c/s 18323 (which had, instead of a table_base member, an int field indicating MSI vs. MSI-X). The original definition was added with c/s 17534 and 17535, but all of those changes happened during 3.3 development, so no-one should be using the old definition in released code.

Jan
Jiang, Yunhong
2010-Feb-25 09:28 UTC
RE: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
It seems table_base is not initialized; otherwise it should be 0x1 instead of 0xf5861e4a00000001. I checked libxc, and it seems the parameter needs to be cleared there. I haven't checked the kernel code yet. I suspect the following patch is needed (the patch has only been compiled, not tested).

--jyh

diff -r 89dfe955f1c3 tools/libxc/xc_physdev.c
--- a/tools/libxc/xc_physdev.c	Thu Feb 25 17:17:02 2010 +0800
+++ b/tools/libxc/xc_physdev.c	Thu Feb 25 17:27:10 2010 +0800
@@ -31,6 +31,7 @@ int xc_physdev_map_pirq(int xc_handle,
     if ( !pirq )
         return -EINVAL;
 
+    memset(&map, 0, sizeof(struct physdev_map_pirq));
     map.domid = domid;
     map.type = MAP_PIRQ_TYPE_GSI;
     map.index = index;
@@ -59,6 +60,7 @@ int xc_physdev_map_pirq_msi(int xc_handl
     if ( !pirq )
         return -EINVAL;
 
+    memset(&map, 0, sizeof(struct physdev_map_pirq));
     map.domid = domid;
     map.type = MAP_PIRQ_TYPE_MSI;
     map.index = index;
@@ -83,6 +85,7 @@ int xc_physdev_unmap_pirq(int xc_handle,
     int rc;
     struct physdev_unmap_pirq unmap;
 
+    memset(&unmap, 0, sizeof(struct physdev_unmap_pirq));
     unmap.domid = domid;
     unmap.pirq = pirq;
George Dunlap
2010-Feb-25 10:48 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
$ addr2line -e xen/xen-syms 0xffff82c4801535a9 /xensource/hg/open-source/xen-unstable.hg/xen/arch/x86/msi.c:588 Which in my code is: /* Mask interrupt here */ writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); And tracing back, ->mask_base is computed in part with table_base (which as Jan pointed out, seems borked). Hmm... Jan said according to the stack: .table_base = f5861e4a00000001 Is it possible that there''s actually a bug in the compat code, and that table_base actually *was* set to (uint32_t)1? If a reasonable number for table_base is "1", giving it 64 bits in the structure would seem a bit like overkill... -George On Thu, Feb 25, 2010 at 6:50 AM, Jiang, Yunhong <yunhong.jiang@intel.com> wrote:> Agree that Xen should not crash on such situation. > > Since it is enabling MSI, I try to check the code in __pci_enable_msi(), and didn''t find any suspicious code. As 0x ffff82c3ffeb614d is at ioremap range, I suspect it is about msix fixmap, but still didn''t find any hint. > The page table walk is a bit strange to me. The L1 entry is suspicious, seems it clobered. > Can you share the code around the fault IP address? 
> > --jyh > > > (XEN) Pagetable walk from ffff82c3ffeb614d: > (XEN) L4[0x105] = 00000000bf4e2027 5555555555555555 > (XEN) L3[0x10f] = 00000000bf698063 5555555555555555 > (XEN) L2[0x1ff] = 00000000bf697063 5555555555555555 > (XEN) L1[0x0b6] = f5861e4a00100173 ffffffffffffffff > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 15: > (XEN) FATAL PAGE FAULT > (XEN) [error_code=000b] > (XEN) Faulting linear address: ffff82c3ffeb614d > (XEN) **************************************** > > > >>-----Original Message----- >>From: xen-devel-bounces@lists.xensource.com >>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of George Dunlap >>Sent: Thursday, February 25, 2010 7:58 AM >>To: Pasi Kärkkäinen >>Cc: Sander Eikelenboom; xen-devel@lists.xensource.com >>Subject: Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel >>(2.6.26-2-xen-686) >> >>pci=nomsi in the guest command line? >> >>I realize dom0 is a privileged guest, but it still seems like we >>should try not to crash Xen as a result on guest input. :-) >> >>Thanks for the work-around, >> -George >> >>On Wed, Feb 24, 2010 at 8:20 PM, Pasi Kärkkäinen <pasik@iki.fi> wrote: >>> On Wed, Feb 24, 2010 at 08:08:10PM +0100, Sander Eikelenboom wrote: >>>> >>>> Wasn''t pci=nomsi required for those kernels ? >>>> >>> >>> Yeah, pci=nomsi has been the workaround for this bug.. >>> Many people have seen (and reported) this bug in the lenny kernel. >>> >>> Would be good to hunt down the actual reason.. >>> >>> -- Pasi >>> >>>> >>>> Wednesday, February 24, 2010, 7:47:09 PM, you wrote: >>>> >>>> > I recenty tried to boot my Debian lenny distro on a newer Intel >>>> > Nehalem box using Debian Lenny''s default xen kernel >>>> > (2.6.26-2-xen-686), and it crashed during boot. Full log attached, >>>> > but the key info is here: >>>> >>>> > I tried doing a "binary search" to see where this was introduced, but >>>> > it fails the same way all the way back to c/s 20000. 
>>>>
>>>> > The same kernel/hypervisor combination boots fine on a different box I have.
>>>>
>>>> > -George
>>>>
>>>> > (XEN) ----[ Xen-4.0.0-rc4 x86_64 debug=n Tainted: C ]----
>>>> > (XEN) CPU: 15
>>>> > (XEN) RIP: e008:[<ffff82c4801535a9>] pci_enable_msi+0x4c9/0x580
>>>> > (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
>>>> > (XEN) rax: ffff82c3ffeb6141 rbx: 0000000000000000 rcx: ffff830828b0e690
>>>> > (XEN) rdx: ffff830828b0e718 rsi: ffff8300bf6975b0 rdi: ffff830828b0e6e0
>>>> > (XEN) rbp: ffff83082cd7fec8 rsp: ffff83082cd7fd88 r8: 0000000000000005
>>>> > (XEN) r9: ffff830828b0e700 r10: 00000000000000a8 r11: 0000000000000040
>>>> > (XEN) r12: ffff830828b0e710 r13: ffff830828b0e670 r14: ffff83082cd7fe38
>>>> > (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
>>>> > (XEN) cr3: 000000043fdcf000 cr2: ffff82c3ffeb614d
>>>> > (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
>>>> > (XEN) Xen stack trace from rsp=ffff83082cd7fd88:
>>>> > (XEN) 0000000000000246 ffff830828b0e6e0 0000000080000000 0000000100000000
>>>> > (XEN) 3408000000000000 0000000200000000 000f5861e4a00100 0000000000000141
>>>> > (XEN) ffff83082cd71ec8 ffff83082cd7fec8 0000000000000136 ffff83082cd60000
>>>> > (XEN) 0000000000000038 00000000ffffffed 0000000000000000 ffff82c480154832
>>>> > (XEN) 00000000000004d8 0000000000000038 00000000000000e0 ffff83083fd81c80
>>>> > (XEN) ffff830828b0e670 00000000f5861e0c 0000000000000000 ffff83082cd60000
>>>> > (XEN) ffff83082cd400f0 00000000f5861e0c 0000000000000136 0000000000000038
>>>> > (XEN) ffff83082cd7fec8 ffff82c4801ede5f 7265646e776f206f ffff82c48024c080
>>>> > (XEN) ffff83082cd60180 ffff83082cd7fe98 0000000000007ff0 ffffffffffffffff
>>>> > (XEN) 0000000800000000 0000000100000000 ffff8308f5861e4a ffff82c4802eaa80
>>>> > (XEN) 0000000800000000 0000000000000038 f5861e4a00000001 000000000000000f
>>>> > (XEN) ffffffffffffffff ffff8300bf2e8000 0000000000000035 0000000000000000
>>>> > (XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4801ef863
>>>> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> > (XEN) 0000000000000035 000000000000000d 0000000000000000 0000000000000000
>>>> > (XEN) 0000000000000000 0000000000000000 0000000000000021 00000000f5861e0c
>>>> > (XEN) 00000000c033741f 00000000ffffffff 0000000000000000 0000010000000000
>>>> > (XEN) 00000000c0101427 0000000000000061 0000000000000286 00000000f5861e08
>>>> > (XEN) 0000000000000069 0000000000000000 0000000000000000 0000000000000000
>>>> > (XEN) 0000000000000000 000000000000000f ffff8300bf2e8000
>>>> > (XEN) Xen call trace:
>>>> > (XEN) [<ffff82c4801535a9>] pci_enable_msi+0x4c9/0x580
>>>> > (XEN) [<ffff82c480154832>] map_domain_pirq+0x242/0x2f0
>>>> > (XEN) [<ffff82c4801ede5f>] compat_physdev_op+0xc8f/0x1010
>>>> > (XEN) [<ffff82c4801ef863>] compat_hypercall+0x83/0x90
>>>> > (XEN)
>>>> > (XEN) Pagetable walk from ffff82c3ffeb614d:
>>>> > (XEN) L4[0x105] = 00000000bf4e2027 5555555555555555
>>>> > (XEN) L3[0x10f] = 00000000bf698063 5555555555555555
>>>> > (XEN) L2[0x1ff] = 00000000bf697063 5555555555555555
>>>> > (XEN) L1[0x0b6] = f5861e4a00100173 ffffffffffffffff
>>>> > (XEN)
>>>> > (XEN) ****************************************
>>>> > (XEN) Panic on CPU 15:
>>>> > (XEN) FATAL PAGE FAULT
>>>> > (XEN) [error_code=000b]
>>>> > (XEN) Faulting linear address: ffff82c3ffeb614d
>>>> > (XEN) ****************************************
>>>>
>>>> --
>>>> Best regards,
>>>> Sander mailto:linux@eikelenboom.it

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2010-Feb-25 10:56 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
>>> George Dunlap <George.Dunlap@eu.citrix.com> 25.02.10 11:48 >>>
>Is it possible that there's actually a bug in the compat code, and
>that table_base actually *was* set to (uint32_t)1? If a reasonable
>number for table_base is "1", giving it 64 bits in the structure would
>seem a bit like overkill...

"1" definitely is not a reasonable value here. And it's also not the compat code, I'm sure - it is the kernel using a bad structure definition.

Jan
George Dunlap
2010-Feb-25 11:46 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
I'm looking at the Debian source package for this kernel to see if I can sort out where it got the header from.

Given that this is already in a major distribution, is there any way we can fail gracefully if someone's running this kernel? I'm not familiar enough with the MSI to know if this is possible, or what a good set of "sanity checks" would be for failing the hypercall.

 -George
George Dunlap
2010-Feb-25 12:13 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
(Jeremy: Discussing the default Lenny dom0 package, 2.6.26-2-xen-686, crashing during boot if MSIs are available.)

Sure enough, the structure it's using looks like this:

struct physdev_map_pirq {
    domid_t domid;
    /* IN */
    int type;
    /* IN */
    int index;
    /* IN or OUT */
    int pirq;
    /* IN */
    struct {
        int bus, devfn, entry_nr;
        int msi;  /* 0 - MSIX 1 - MSI */
    } msi_info;
};

The code in question came from a patch called suse-20080808143035.patch; reading the numbers as the timestamp "2008 August 8" would seem to match up with the 3.3 dev lifecycle.

Any suggestions for a simple fix I can try to push upstream?

 -George
Jan Beulich
2010-Feb-25 13:07 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
>>> George Dunlap <George.Dunlap@eu.citrix.com> 25.02.10 13:13 >>>
>Any suggestions for a simple fix I can try to push upstream?

I'm afraid not (other than simply disabling at least the MSI-X part of the code), as it would require table_base to be initialized properly.

Jan
Jan Beulich
2010-Feb-25 13:10 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
>>> George Dunlap <George.Dunlap@eu.citrix.com> 25.02.10 12:46 >>>
>Given that this is already in a major distribution, is there any way
>we can fail gracefully if someone's running this kernel? I'm not
>familiar enough with the MSI to know if this is possible, or what a
>good set of "sanity checks" would be for failing the hypercall.

Checking that the address is suitably aligned and within the PADDR_BITS range would certainly be doable, but I question whether it makes sense. Such a check would prevent the crash, but it wouldn't guarantee the system would work (and may result in more subtle crashes).

Jan
Sander Eikelenboom
2010-Feb-25 13:19 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Does the 2.6.18.8-xen linux tree require pci=nomsi? Or was it something introduced after 2.6.18?
The Debian/Novell forward-ported kernels are all based on the 2.6.18.8 patches, plus some extra patches from the distributions.
So shouldn't it be some specific patch applied only to the Debian tree?

--
Best regards,
 Sander mailto:linux@eikelenboom.it
Pasi Kärkkäinen
2010-Feb-25 13:24 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
On Thu, Feb 25, 2010 at 02:19:30PM +0100, Sander Eikelenboom wrote:
> Does the 2.6.18.8-xen linux tree require pci=nomsi ?
> Or was is something introduced after 2.6.18 ?
>

This bug doesn't exist in 2.6.18-xen or in the current Novell forward-ports, afaik. It only exists in the snapshot that Debian uses in their 2.6.26-*-xen kernels.

> The debian/novell forward ported kernels are all based on the 2.6.18.8 patches, and some extra patches by the distributions.
> So shouldn't it be some specific patch applied to only the debian tree ?
>

Yes, the bugfix patch is only needed in the Debian 2.6.26 tree.

-- Pasi
George Dunlap
2010-Feb-25 13:28 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Forward porting the linux-2.6.18 patch has been straightforward so far... I'm going to give it a spin and see if it actually boots. :-)

It's linux-2.6.18.hg c/s 645:359b1e70d9eb, in case you're interested.

 -George
Jan Beulich
2010-Feb-25 13:43 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
>>> Sander Eikelenboom <linux@eikelenboom.it> 25.02.10 14:19 >>>
>So shouldn't it be some specific patch applied to only the debian tree ?

Based on the patch file name George reported, I think it is much more likely a patch that later wasn't updated as needed...

Jan
Pasi Kärkkäinen
2010-Feb-25 14:05 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
On Thu, Feb 25, 2010 at 01:43:53PM +0000, Jan Beulich wrote:
> >>> Sander Eikelenboom <linux@eikelenboom.it> 25.02.10 14:19 >>>
> >So shouldn't it be some specific patch applied to only the debian tree ?
>
> Based on the patch file name George reported, I think much rather a
> patch that later wasn't updated as needed...
>

Yeah, I don't think the Debian 2.6.26 kernel Xen patch has been updated since the initial drop in 2008.

-- Pasi
Jiang, Yunhong
2010-Feb-26 01:42 UTC
RE: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Hmm, this issue is caused by changeset 18323, which extended the physdev_map_pirq structure. IIRC, this was mainly for SR-IOV support, since Xen can't get the MMIO BAR from the virtual device.

However, digging into this further, I wonder whether we need to change the definition of 'struct physdev_op'. Currently there is no maximum length limit; should it have something like the "pad" in struct xen_platform_op? One possibility is that if a new type of physdev operation is added that extends the length of struct physdev_op, it may cause issues (such as copy_from_guest in do_physdev_op failing randomly). But I have no idea how to add the padding without breaking compatibility.

--jyh

@@ -136,10 +136,13 @@ struct physdev_map_pirq {
     /* IN or OUT */
     int pirq;
     /* IN */
-    struct {
-        int bus, devfn, entry_nr;
-        int msi;  /* 0 - MSIX 1 - MSI */
-    } msi_info;
+    int bus;
+    /* IN */
+    int devfn;
+    /* IN */
+    int entry_nr;
+    /* IN */
+    uint64_t table_base;
George Dunlap
2010-Feb-26 10:55 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Jiang, Yunhong wrote:
> Hmm, this issue is caused because of changeset 18323, which extend the physdev_map_pirq strucutre. IIRC, this is mainly for SR-IOV support, that Xen can't get the MMIO BAR from the virtual device.
>
> However, dig into futher, I suspect if we need to change the definition of 'struct physdev_op'. Currently there is no maxium length limit, should it have something like the "pad" in struct xen_platform_op?
>

The padding isn't the problem; the problem is that Xen is expecting an address in there, but it's getting "garbage + {0,1}". As Jan pointed out, how is Xen supposed to distinguish an address from garbage + incorrect parameter?

At any rate, I have a patch to the Debian kernel I'll post in a bit.

 -George
George Dunlap
2010-Feb-26 11:05 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
OK, the attached patch fits cleanly into the Debian source package infrastructure and produces a built .deb file that actually works.

Can those who know the system take a quick look to see if there's anything obviously broken? I'll file a bug report with Debian, including the patch, on Monday; hopefully it will be picked up quickly.

Thanks,
 -George
Sander Eikelenboom
2010-Feb-26 11:14 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
From what I remember, the Debian packages automatically put pci=nomsi in the grub configuration etc.
So perhaps it's wise to get that workaround removed as well.

--
Sander

--
Best regards,
 Sander mailto:linux@eikelenboom.it
George Dunlap
2010-Feb-26 11:21 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Do they? I've installed Lenny on 2 boxes recently (within the last 2 months), and neither has pci=nomsi in the grub config. I'm pretty sure I would have remembered removing it. It's also not in the "xenkopt=" automagic kernel configuration.

I did install from a local mirror, but I've updated since then...

 -George

Sander Eikelenboom wrote:
> For what I remember from the past, the debian packages automatically put pci=nomsi in grub etc.
> So perhaps it's wise to get that workaround removed as well.
Pasi Kärkkäinen
2010-Feb-26 12:04 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
On Fri, Feb 26, 2010 at 11:21:14AM +0000, George Dunlap wrote:
> Do they? I've installed Lenny in 2 boxes recently (within the last 2
> months), and neither has the pci=nomsi in the grub config. I'm pretty
> sure I would have remembered removing it. It's also not in the
> "xenkopt=" automagic kernel configuration.
>
> I did install from a local mirror, but I've updated since then...
>

Yeah, pci=nomsi is not there as a default in Lenny.

-- Pasi
Sander Eikelenboom
2010-Feb-26 12:26 UTC
Re: [Xen-devel] Crash during boot in Debian lenny default dom0 kernel (2.6.26-2-xen-686)
Hmm, seems I was wrong; perhaps I added it to the kopt myself, I just can't remember.

Friday, February 26, 2010, 1:04:21 PM, you wrote:

> Yeah, pci=nomsi is not there as a default in Lenny.

> -- Pasi

--
Best regards,
 Sander mailto:linux@eikelenboom.it