Hi,

Information: CentOS 5.5 x86_64
dom0: latest xen/stable-2.6.32.x pvops git commit 75cc13f5aa29b4f3227d269ca165dfa8937c94fe
xen version: xen-4.0.2-rc1-pre from xen-4.0-testing.hg changeset 21422

While taking an LVM snapshot for migration, I got the following:

Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
Dec 26 15:58:29 xen01 kernel: CPU 0
Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tpm_tis tpm tpm_bios button i2c_i801 i2c_core iTCO_wdt e1000e shpchp pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Dec 26 15:58:29 xen01 kernel: Pid: 27998, comm: udevd Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 S3420GP
Dec 26 15:58:29 xen01 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Dec 26 15:58:29 xen01 kernel: RSP: e02b:ffff88003bc3bdc8 EFLAGS: 00010282
Dec 26 15:58:29 xen01 kernel: RAX: 00000000ffffffea RBX: 0000000000017605 RCX: 00000000000000bb
Dec 26 15:58:29 xen01 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
Dec 26 15:58:29 xen01 kernel: RBP: ffff88003bc3bde8 R08: 0000000000000028 R09: ffff880000000000
Dec 26 15:58:29 xen01 kernel: R10: 00000000deadbeef R11: 00007fdb5665e600 R12: 0000000000000003
Dec 26 15:58:30 xen01 kernel: R13: 0000000000017605 R14: ffff880012ee0780 R15: 00007fdb56224268
Dec 26 15:58:30 xen01 kernel: FS: 00007fdb56fed710(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
Dec 26 15:58:30 xen01 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 26 15:58:30 xen01 kernel: CR2: 00007fdb56224268 CR3: 000000003addb000 CR4: 0000000000002660
Dec 26 15:58:30 xen01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 26 15:58:30 xen01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 26 15:58:30 xen01 kernel: Process udevd (pid: 27998, threadinfo ffff88003bc3a000, task ffff880012ee0780)
Dec 26 15:58:30 xen01 kernel: Stack:
Dec 26 15:58:30 xen01 kernel: 0000000000000000 0000000000424121 000000013f00ae20 0000000000017605
Dec 26 15:58:30 xen01 kernel: <0> ffff88003bc3be08 ffffffff8100e07c ffff88003a3c2580 ffff880034bb6588
Dec 26 15:58:30 xen01 kernel: <0> ffff88003bc3be18 ffffffff8100e0af ffff88003bc3be58 ffffffff810a402f
Dec 26 15:58:31 xen01 kernel: Call Trace:
Dec 26 15:58:31 xen01 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
Dec 26 15:58:31 xen01 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
Dec 26 15:58:31 xen01 kernel: [<ffffffff810a402f>] __pte_alloc+0x70/0xce
Dec 26 15:58:31 xen01 kernel: [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
Dec 26 15:58:31 xen01 kernel: [<ffffffff810d2ecc>] ? d_kill+0x3a/0x42
Dec 26 15:58:31 xen01 kernel: [<ffffffff810c4cd1>] ? __fput+0x1cb/0x1da
Dec 26 15:58:31 xen01 kernel: [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
Dec 26 15:58:31 xen01 kernel: [<ffffffff81319dd5>] page_fault+0x25/0x30
Dec 26 15:58:31 xen01 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
Dec 26 15:58:31 xen01 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Dec 26 15:58:31 xen01 kernel: RSP <ffff88003bc3bdc8>
Dec 26 15:58:31 xen01 kernel: ---[ end trace 540bcf6f0170242d ]---

Triggered BUG() in line 1860:

static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
{
        struct mmuext_op op;
        op.cmd = cmd;
        op.arg1.mfn = pfn_to_mfn(pfn);
        if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
                BUG(); <<THIS ONE?
}

Any idea?

Thanks.

Kindest regards,
Giam Teck Choon
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
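A note on reading the register dump above: RAX is 00000000ffffffea, and the Code: bytes just before the trapping <0f> 0b (ud2) are 85 c0 74 04, that is, test %eax,%eax followed by je. Assuming RAX still holds the return value of HYPERVISOR_mmuext_op() at that point, the hypercall returned -22, which is -EINVAL on Linux. The sketch below shows the conversion; reg_to_int() and errno_name() are illustrative helper names, not kernel functions.

```c
#include <errno.h>
#include <stdint.h>

/* Interpret the low 32 bits of a 64-bit register value as a signed int,
 * which is how `test %eax,%eax` sees the hypercall result.
 * (Illustrative helper, not part of the kernel.) */
static int reg_to_int(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;

    /* Portable two's-complement conversion without relying on
     * implementation-defined unsigned-to-signed casts. */
    if (low >= 0x80000000u)
        return (int)((int64_t)low - 0x100000000LL);
    return (int)low;
}

/* Name a few common negative return codes; anything else is "unknown".
 * (Illustrative helper; the list is deliberately short.) */
static const char *errno_name(int rc)
{
    switch (-rc) {
    case EPERM:  return "EPERM";
    case ENOMEM: return "ENOMEM";
    case EBUSY:  return "EBUSY";
    case EINVAL: return "EINVAL";
    default:     return "unknown";
    }
}
```

Calling reg_to_int(0x00000000ffffffeaULL) gives -22, and errno_name(-22) gives "EINVAL". This only says that Xen rejected the pin request as invalid; it does not say why.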
Konrad Rzeszutek Wilk
2010-Dec-27 15:53 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Sun, Dec 26, 2010 at 04:16:16PM +0800, Teck Choon Giam wrote:
> Triggered BUG() in line 1860:
>
> static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
> {
>         struct mmuext_op op;
>         op.cmd = cmd;
>         op.arg1.mfn = pfn_to_mfn(pfn);
>         if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
>                 BUG(); <<THIS ONE?

Yup.

> }
>
> Any idea?

Do you get to see this every time you do LVM migrate?
Teck Choon Giam
2010-Dec-27 22:14 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Mon, Dec 27, 2010 at 11:53 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Do you get to see this every time you do LVM migrate?

My method of migration does not use xen xm migrate. Instead I do the following:

1. LVM snapshot of domU
2. mount LVM snapshot domU
3. rsync over to the target host
4. umount LVM snapshot domU
5. remove LVM snapshot domU
6. shutdown domU
7. mount LVM domU
8. rsync mounted LVM domU over to target host
9. start domU on the new target host

Actually, the server also crashes when I take daily LVM snapshots to back up domUs, not just during migration. And this happens almost daily :(

Method of backup domU:

1. LVM snapshot domU
2. mount LVM snapshot domU
3. rsync to disk as backup
4. umount LVM snapshot domU
5. remove LVM snapshot domU

Even when I combine ionice and nice, it still crashes.

Maybe it is time to roll back to XenLinux 2.6.18.8...

Thanks.

Kindest regards,
Giam Teck Choon
Pasi Kärkkäinen
2010-Dec-28 10:42 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Dec 28, 2010 at 06:14:16AM +0800, Teck Choon Giam wrote:
> Maybe it is time to roll back to XenLinux 2.6.18.8... ...

It would be very good to track this down and get it fixed..
hopefully you're able to help a bit and try some things to debug it.

Konrad maybe has some ideas to try..

-- Pasi
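As one possible debugging step, the failing check could be changed to capture the hypercall's return code and the pfn/mfn pair before calling BUG(). The following is a userspace sketch of that idea only: mock_mmuext_op() and mock_pfn_to_mfn() are hypothetical stand-ins for HYPERVISOR_mmuext_op() and pfn_to_mfn(), struct mock_mmuext_op is a simplified stand-in for struct mmuext_op, and in the kernel the message would go through printk() rather than into a buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for Xen's struct mmuext_op (assumption: only the
 * two fields used by pin_pagetable_pfn() are modelled here). */
struct mock_mmuext_op {
    unsigned int cmd;
    struct { unsigned long mfn; } arg1;
};

/* Hypothetical stub for the hypercall: always fails with -22 (-EINVAL),
 * mimicking the value seen in RAX in the oops. */
static int mock_mmuext_op(const struct mock_mmuext_op *op)
{
    (void)op;
    return -22;
}

/* Hypothetical stub: identity mapping instead of the real p2m lookup. */
static unsigned long mock_pfn_to_mfn(unsigned long pfn)
{
    return pfn;
}

/* Same shape as pin_pagetable_pfn(), but keeps the return code and
 * formats a diagnostic into msg instead of calling BUG() blindly. */
static int pin_pagetable_pfn_debug(unsigned cmd, unsigned long pfn,
                                   char *msg, size_t len)
{
    struct mock_mmuext_op op;
    int rc;

    op.cmd = cmd;
    op.arg1.mfn = mock_pfn_to_mfn(pfn);

    rc = mock_mmuext_op(&op);
    if (rc)
        snprintf(msg, len, "pin failed: cmd=%u pfn=%#lx mfn=%#lx rc=%d",
                 cmd, pfn, op.arg1.mfn, rc);
    else if (len > 0)
        msg[0] = '\0';

    return rc;
}
```

With a 128-byte buffer, pin_pagetable_pfn_debug(0, 0x17605UL, buf, sizeof buf) returns -22 and leaves "pin failed: cmd=0 pfn=0x17605 mfn=0x17605 rc=-22" in buf. The value 0x17605 is simply what RBX and R13 hold in the first oops, used here as sample input.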
Teck Choon Giam
2010-Dec-28 18:01 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
> It would be very good to track this down and get it fixed..
> hopefully you're able to help a bit and try some things to debug it.
>
> Konrad maybe has some ideas to try..

I would love to track this down and have it fixed, or else this is a blocker for most of us deploying the pvops stable kernel 2.6.32.x in production. Well, at least for me...

Here are more related logs. It seems they are all related to LVM, since every one has a last sysfs file of /sys/block/dm-??/dev:

/var/log/messages.2:Dec 16 01:00:04 xen01 kernel: ------------[ cut here ]------------
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: invalid opcode: 0000 [#1] SMP
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: last sysfs file: /sys/block/dm-17/dev
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: CPU 5
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: Modules linked in: dm_snapshot ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_mirror dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tpm_tis tpm tpm_bios button i2c_i801 i2c_core iTCO_wdt e1000e shpchp pcspkr dm_region_hash dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: Pid: 26100, comm: udevd Not tainted 2.6.32.26-3.xen.pvops.choon.centos5 #1 S3420GP
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: RSP: e02b:ffff8800086abdc8 EFLAGS: 00010282
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: RAX: 00000000ffffffea RBX: 000000000003cf00 RCX: 00000000000001e7
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: RBP: ffff8800086abde8 R08: 0000000000000800 R09: ffff880000000000
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: R10: 00000000deadbeef R11: 00007fe3d2a72600 R12: 0000000000000003
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: R13: 000000000003cf00 R14: ffff88003384c140 R15: 00007fe3d2638268
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: FS: 00007fe3d3401710(0000) GS:ffff8800280e0000(0000) knlGS:0000000000000000
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: CR2: 00007fe3d2638268 CR3: 0000000014ff7000 CR4: 0000000000002660
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: Process udevd (pid: 26100, threadinfo ffff8800086aa000, task ffff88003384c140)
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: Stack:
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: 0000000000000000 000000000024f026 000000013e766520 000000000003cf00
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: <0> ffff8800086abe08 ffffffff8100e07c ffff88003ad77ac0 ffff88003ce03498
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: <0> ffff8800086abe18 ffffffff8100e0af ffff8800086abe58 ffffffff810a4013
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: Call Trace:
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: [<ffffffff810a4013>] __pte_alloc+0x70/0xce
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: [<ffffffff810a41b1>] handle_mm_fault+0x140/0x8b9
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: [<ffffffff810d2ea0>] ? d_kill+0x3a/0x42
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: [<ffffffff8131ba4d>] do_page_fault+0x252/0x2e2
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: [<ffffffff813199d5>] page_fault+0x25/0x30
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: RSP <ffff8800086abdc8>
/var/log/messages.2-Dec 16 01:00:04 xen01 kernel: ---[ end trace 6828227b20a6a7c6 ]---
/var/log/messages.1:Dec 25 21:45:20 xen01 kernel: ------------[ cut here ]------------
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: invalid opcode: 0000 [#1] SMP
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: CPU 7
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tpm_tis tpm tpm_bios button i2c_i801 i2c_core iTCO_wdt shpchp e1000e pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: Pid: 20985, comm: udevd Not tainted 2.6.32.26-3.xen.pvops.choon.centos5 #1 S3420GP
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: RSP: e02b:ffff880035c91dc8 EFLAGS: 00010282
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: RAX: 00000000ffffffea RBX: 0000000000036540 RCX: 00000000000001b2
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: RBP: ffff880035c91de8 R08: 0000000000000a00 R09: ffff880000000000
/var/log/messages.1-Dec 25 21:45:20 xen01 kernel: R10: 00000000deadbeef R11: 00007f8988e49600 R12: 0000000000000003
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: R13: 0000000000036540 R14: ffff880036768540 R15: 00007f8988a0f268
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: FS: 00007f89897d8710(0000) GS:ffff88002811a000(0000) knlGS:0000000000000000
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: CR2: 00007f8988a0f268 CR3: 00000000375b8000 CR4: 0000000000002660
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: Process udevd (pid: 20985, threadinfo ffff880035c90000, task ffff880036768540)
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: Stack:
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: 0000000000000000 00000000004431e6 000000013dba6e20 0000000000036540
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: <0> ffff880035c91e08 ffffffff8100e07c ffff8800375abac0 ffff88000665e228
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: <0> ffff880035c91e18 ffffffff8100e0af ffff880035c91e58 ffffffff810a4013
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: Call Trace:
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: [<ffffffff810a4013>] __pte_alloc+0x70/0xce
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: [<ffffffff810a41b1>] handle_mm_fault+0x140/0x8b9
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: [<ffffffff8131ba4d>] do_page_fault+0x252/0x2e2
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: [<ffffffff813199d5>] page_fault+0x25/0x30
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: RSP <ffff880035c91dc8>
/var/log/messages.1-Dec 25 21:45:21 xen01 kernel: ---[ end trace f394f31a52cfbac7 ]---
/var/log/messages:Dec 28 01:20:44 xen01 kernel: ------------[ cut here ]------------
/var/log/messages-Dec 28 01:20:44 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
/var/log/messages-Dec 28 01:20:44 xen01 kernel: invalid opcode: 0000 [#1] SMP
/var/log/messages-Dec 28 01:20:44 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
/var/log/messages-Dec 28 01:20:44 xen01 kernel: CPU 2
/var/log/messages-Dec 28 01:20:44 xen01 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tpm_tis tpm tpm_bios button i2c_i801 i2c_core iTCO_wdt e1000e shpchp pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
/var/log/messages-Dec 28 01:20:44 xen01 kernel: Pid: 30295, comm: sh Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 S3420GP
/var/log/messages-Dec 28 01:20:44 xen01 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages-Dec 28 01:20:44 xen01 kernel: RSP: e02b:ffff880036661dc8 EFLAGS: 00010282
/var/log/messages-Dec 28 01:20:44 xen01 kernel: RAX: 00000000ffffffea RBX: 000000000003be66 RCX: 00000000000001df
/var/log/messages-Dec 28 01:20:44 xen01 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
/var/log/messages-Dec 28 01:20:44 xen01 kernel: RBP: ffff880036661de8 R08: 0000000000000330 R09: ffff880000000000
/var/log/messages-Dec 28 01:20:44 xen01 kernel: R10: 00000000deadbeef R11: 0000000000000246 R12: 0000000000000003 /var/log/messages-Dec 28 01:20:44 xen01 kernel: R13: 000000000003be66 R14: ffff88001845e180 R15: 00000037e629a4d5 /var/log/messages-Dec 28 01:20:44 xen01 kernel: FS: 00007f240f1b96e0(0000) GS:ffff880028089000(0000) knlGS:0000000000000000 /var/log/messages-Dec 28 01:20:44 xen01 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b /var/log/messages-Dec 28 01:20:44 xen01 kernel: CR2: 00000037e629a4d5 CR3: 0000000031765000 CR4: 0000000000002660 /var/log/messages-Dec 28 01:20:44 xen01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 /var/log/messages-Dec 28 01:20:44 xen01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 /var/log/messages-Dec 28 01:20:44 xen01 kernel: Process sh (pid: 30295, threadinfo ffff880036660000, task ffff88001845e180) /var/log/messages-Dec 28 01:20:44 xen01 kernel: Stack: /var/log/messages-Dec 28 01:20:44 xen01 kernel: 0000000000000000 00000000004508c0 000000013dae4820 000000000003be66 /var/log/messages-Dec 28 01:20:44 xen01 kernel: <0> ffff880036661e08 ffffffff8100e07c ffff880035da1580 ffff880035c48988 /var/log/messages-Dec 28 01:20:44 xen01 kernel: <0> ffff880036661e18 ffffffff8100e0af ffff880036661e58 ffffffff810a402f /var/log/messages-Dec 28 01:20:44 xen01 kernel: Call Trace: /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69 /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10 /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff810a402f>] __pte_alloc+0x70/0xce /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9 /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff81319dd5>] ? 
page_fault+0x25/0x30 /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2 /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff8116dd7d>] ? __put_user_4+0x1d/0x30 /var/log/messages-Dec 28 01:20:44 xen01 kernel: [<ffffffff81319dd5>] page_fault+0x25/0x30 /var/log/messages-Dec 28 01:20:44 xen01 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48 /var/log/messages-Dec 28 01:20:45 xen01 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 /var/log/messages-Dec 28 01:20:45 xen01 kernel: RSP <ffff880036661dc8> /var/log/messages-Dec 28 01:20:45 xen01 kernel: ---[ end trace b65ec9b025b586cf ]--- /var/log/messages:Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------ /var/log/messages-Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860! /var/log/messages-Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP /var/log/messages-Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev /var/log/messages-Dec 26 15:58:29 xen01 kernel: CPU 0 /var/log/messages-Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tpm_tis tpm tpm_bios button i2c_i801 i2c_core iTCO_wdt e1000e shpchp pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] /var/log/messages-Dec 26 15:58:29 xen01 kernel: Pid: 27998, comm: udevd Not 
tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 S3420GP /var/log/messages-Dec 26 15:58:29 xen01 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 /var/log/messages-Dec 26 15:58:29 xen01 kernel: RSP: e02b:ffff88003bc3bdc8 EFLAGS: 00010282 /var/log/messages-Dec 26 15:58:29 xen01 kernel: RAX: 00000000ffffffea RBX: 0000000000017605 RCX: 00000000000000bb /var/log/messages-Dec 26 15:58:29 xen01 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef /var/log/messages-Dec 26 15:58:29 xen01 kernel: RBP: ffff88003bc3bde8 R08: 0000000000000028 R09: ffff880000000000 /var/log/messages-Dec 26 15:58:29 xen01 kernel: R10: 00000000deadbeef R11: 00007fdb5665e600 R12: 0000000000000003 /var/log/messages-Dec 26 15:58:30 xen01 kernel: R13: 0000000000017605 R14: ffff880012ee0780 R15: 00007fdb56224268 /var/log/messages-Dec 26 15:58:30 xen01 kernel: FS: 00007fdb56fed710(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000 /var/log/messages-Dec 26 15:58:30 xen01 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b /var/log/messages-Dec 26 15:58:30 xen01 kernel: CR2: 00007fdb56224268 CR3: 000000003addb000 CR4: 0000000000002660 /var/log/messages-Dec 26 15:58:30 xen01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 /var/log/messages-Dec 26 15:58:30 xen01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 /var/log/messages-Dec 26 15:58:30 xen01 kernel: Process udevd (pid: 27998, threadinfo ffff88003bc3a000, task ffff880012ee0780) /var/log/messages-Dec 26 15:58:30 xen01 kernel: Stack: /var/log/messages-Dec 26 15:58:30 xen01 kernel: 0000000000000000 0000000000424121 000000013f00ae20 0000000000017605 /var/log/messages-Dec 26 15:58:30 xen01 kernel: <0> ffff88003bc3be08 ffffffff8100e07c ffff88003a3c2580 ffff880034bb6588 /var/log/messages-Dec 26 15:58:30 xen01 kernel: <0> ffff88003bc3be18 ffffffff8100e0af ffff88003bc3be58 ffffffff810a402f /var/log/messages-Dec 26 15:58:31 xen01 kernel: Call 
Trace: /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69 /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10 /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff810a402f>] __pte_alloc+0x70/0xce /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9 /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff810d2ecc>] ? d_kill+0x3a/0x42 /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff810c4cd1>] ? __fput+0x1cb/0x1da /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2 /var/log/messages-Dec 26 15:58:31 xen01 kernel: [<ffffffff81319dd5>] page_fault+0x25/0x30 /var/log/messages-Dec 26 15:58:31 xen01 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48 /var/log/messages-Dec 26 15:58:31 xen01 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 /var/log/messages-Dec 26 15:58:31 xen01 kernel: RSP <ffff88003bc3bdc8> /var/log/messages-Dec 26 15:58:31 xen01 kernel: ---[ end trace 540bcf6f0170242d ]--- Thanks. Kindest regards, Giam Teck Choon _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Teck Choon Giam
2010-Dec-29 04:25 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Dec 29, 2010 at 2:01 AM, Teck Choon Giam <giamteckchoon@gmail.com> wrote:

>> It would be very good to track this down and get it fixed..
>> hopefully you're able to help a bit and try some things to debug it.
>>
>> Konrad maybe has some ideas to try..
>
> I would love to track this down and have it fixed or else this is a stopper
> for majority of us to deploy pvops stable kernel 2.6.32.x in production.
> Well at least for me... ...
>
> More related logs and it seems all is related to LVM since all will have
> the last sysfs file to /dev/block/dm-??/dev... ?

While waiting for others to offer any assistance, I created a script that basically tests LVM snapshot creation, mount, umount and snapshot removal in a loop. The result is quite sad: I am able to reproduce the crash easily with a loop count of 100 on two of my servers with different specs.

If anyone is willing to run the script to test whether they can reproduce the crash on their development Xen servers or any LVM host servers, that would be great. The script below assumes that the LVM snapshots can be mounted, as personally I am running all domUs. If your LV group name is different, just amend LVGroupName (for example LVGroupName=VolGroup) to suit your environment.

----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------
#!/bin/sh
#
# This script is to create lvm snapshot, mount it, umount it and remove in a
# specified number of loops to test whether it will crash the host server.
#
# Created by Giam Teck Choon
#

# The LV group name; for this case we are using XenGroup
LVGroupName=XenGroup

# return 1 if is mounted otherwise return 0
check_mount() {
    local checkdir=${1}
    if [ -n "$checkdir" ] ; then
        local check=`grep "$checkdir" /proc/mounts`
        if [ -n "$check" ] ; then
            return 1
        fi
    fi
    return 0
}

do_lvm_create_remove() {
    # number of loops default is 1
    local loopcountlimit=${1:-1}
    # snapshot size default is 1G
    local snapshotsize=${2:-1G}
    # implement a sleep between create, mount, umount and remove (default is 0 which is no pause)
    local pauseinterval=${3:-0}
    # We filter out control, snapshot and swap
    local count=0
    if [ -d "/dev/${LVGroupName}" ] ; then
        while [ "$count" -lt "$loopcountlimit" ]
        do
            count=`expr $count + 1`
            echo "${count} ... ... "
            for i in `ls /dev/${LVGroupName} | grep -Ev 'snapshot$' | grep -Ev 'swap$'`; do
                if [ -h "/dev/${LVGroupName}/${i}" ] ; then
                    echo -n "lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i} ... ... "
                    lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i}
                    echo "done."
                    sleep ${pauseinterval}
                    mkdir -p /mnt/testlvm/${i}
                    if [ -h "/dev/${LVGroupName}/${i}-snapshot" ] ; then
                        check_mount /mnt/testlvm/${i}
                        local ismount=$?
                        if [ "$ismount" -eq 0 ] ; then
                            echo -n "mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i} ... ... "
                            mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i}
                            echo "done."
                            sleep ${pauseinterval}
                        fi
                        check_mount /mnt/testlvm/${i}
                        local ismount2=$?
                        if [ "$ismount2" -eq 1 ] ; then
                            echo -n "umount /mnt/testlvm/${i} ... ... "
                            umount /mnt/testlvm/${i}
                            echo "done."
                            sleep ${pauseinterval}
                        fi
                    fi
                    rm -rf /mnt/testlvm/${i}
                    echo -n "lvremove -f /dev/${LVGroupName}/${i}-snapshot ... ... "
                    lvremove -f /dev/${LVGroupName}/${i}-snapshot
                    echo "done."
                    sleep ${pauseinterval}
                fi
            done
            rm -fr /mnt/testlvm
        done
    else
        echo "/dev/${LVGroupName} directory not found!"
        exit 1
    fi
}

case $1 in
    loop)
        shift
        do_lvm_create_remove "$@"
        ;;
    *)
        echo "Usage: $0 loop loopcountlimit(default is 1) snapshotsize(default is 1GB) pauseinterval (default is 0)"
        ;;
esac
----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------

I name the above script test.sh and then run it. Below is without pause:

# sh test.sh loop 100

Then I get something like the following printed to the terminal:

Message from syslogd@ at Wed Dec 29 12:13:19 2010 ...
xen03 kernel: ------------[ cut here ]------------

Message from syslogd@ at Wed Dec 29 12:13:19 2010 ...
xen03 kernel: invalid opcode: 0000 [#1] SMP

Message from syslogd@ at Wed Dec 29 12:13:19 2010 ...
xen03 kernel: last sysfs file: /sys/block/dm-16/dev

Message from syslogd@ at Wed Dec 29 12:13:19 2010 ...
xen03 kernel: Stack:

Message from syslogd@ at Wed Dec 29 12:13:19 2010 ...
xen03 kernel: Call Trace:

Message from syslogd@ at Wed Dec 29 12:13:19 2010 ...
xen03 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48

Unable to read from event server.

Then I can run the command below to get the log after the server is back online:

# grep -A 50 'cut here' /var/log/messages
Dec 29 11:47:32 xen03 kernel: ------------[ cut here ]------------
Dec 29 11:47:32 xen03 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
Dec 29 11:47:32 xen03 kernel: invalid opcode: 0000 [#2] SMP Dec 29 11:47:32 xen03 kernel: last sysfs file: /sys/block/dm-16/dev Dec 29 11:47:32 xen03 kernel: CPU 3 Dec 29 11:47:32 xen03 kernel: Modules linked in: ext4 jbd2 crc16 xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tg3 libphy ide_cd_mod cdrom button serio_raw tpm_tis tpm tpm_bios pcspkr iTCO_wdt i2c_i801 shpchp i2c_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Dec 29 11:47:32 xen03 kernel: Pid: 30576, comm: mpath_wait Tainted: G D 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860 Dec 29 11:47:32 xen03 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 Dec 29 11:47:32 xen03 kernel: RSP: e02b:ffff880030a7fdc8 EFLAGS: 00010282 Dec 29 11:47:32 xen03 kernel: RAX: 00000000ffffffea RBX: 000000000002445a RCX: 0000000000000122 Dec 29 11:47:32 xen03 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef Dec 29 11:47:32 xen03 kernel: RBP: ffff880030a7fde8 R08: 00000000000002d0 R09: ffff880000000000 Dec 29 11:47:32 xen03 kernel: R10: 00000000deadbeef R11: 0000000000000246 R12: 0000000000000003 Dec 29 11:47:32 xen03 kernel: R13: 000000000002445a R14: ffff88002456e480 R15: 0000003db6c9a4d5 Dec 29 11:47:32 xen03 kernel: FS: 00007f45a01586e0(0000) GS:ffff8800280a6000(0000) knlGS:0000000000000000 Dec 29 11:47:32 xen03 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 29 11:47:32 xen03 kernel: CR2: 0000003db6c9a4d5 CR3: 000000001cc2d000 CR4: 0000000000002660 Dec 29 11:47:32 xen03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 29 11:47:32 xen03 kernel: 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 29 11:47:32 xen03 kernel: Process mpath_wait (pid: 30576, threadinfo ffff880030a7e000, task ffff88002456e480) Dec 29 11:47:32 xen03 kernel: Stack: Dec 29 11:47:32 xen03 kernel: 0000000000000000 00000000001f667f 000000013e6f3e18 000000000002445a Dec 29 11:47:32 xen03 kernel: <0> ffff880030a7fe08 ffffffff8100e07c ffff88003d2fc040 ffff8800246c9db0 Dec 29 11:47:32 xen03 kernel: <0> ffff880030a7fe18 ffffffff8100e0af ffff880030a7fe58 ffffffff810a402f Dec 29 11:47:32 xen03 kernel: Call Trace: Dec 29 11:47:32 xen03 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69 Dec 29 11:47:32 xen03 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10 Dec 29 11:47:32 xen03 kernel: [<ffffffff810a402f>] __pte_alloc+0x70/0xce Dec 29 11:47:32 xen03 kernel: [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9 Dec 29 11:47:32 xen03 kernel: [<ffffffff81319dd5>] ? page_fault+0x25/0x30 Dec 29 11:47:32 xen03 kernel: [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2 Dec 29 11:47:32 xen03 kernel: [<ffffffff8116dd7d>] ? __put_user_4+0x1d/0x30 Dec 29 11:47:32 xen03 kernel: [<ffffffff81319dd5>] page_fault+0x25/0x30 Dec 29 11:47:32 xen03 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48 Dec 29 11:47:32 xen03 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 Dec 29 11:47:32 xen03 kernel: RSP <ffff880030a7fdc8> Dec 29 11:47:32 xen03 kernel: ---[ end trace 900e639a50e97057 ]--- Dec 29 11:50:38 xen03 syslogd 1.4.1: restart. Dec 29 11:50:38 xen03 kernel: klogd 1.4.1, log source = /proc/kmsg started. 
Dec 29 11:50:38 xen03 kernel: Linux version 2.6.32.27-0.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Sat Dec 25 09:19:44 SGT 2010 Dec 29 11:50:38 xen03 kernel: Command line: ro root=/dev/md1 panic=5 panic_timeout=5 Dec 29 11:50:38 xen03 kernel: KERNEL supported cpus: Dec 29 11:50:38 xen03 kernel: Intel GenuineIntel Dec 29 11:50:38 xen03 kernel: AMD AuthenticAMD Dec 29 11:50:38 xen03 kernel: Centaur CentaurHauls Dec 29 11:50:38 xen03 kernel: released 0 pages of unused memory Dec 29 11:50:38 xen03 kernel: BIOS-provided physical RAM map: Dec 29 11:50:38 xen03 kernel: Xen: 0000000000000000 - 00000000000a0000 (usable) Dec 29 11:50:38 xen03 kernel: Xen: 00000000000a0000 - 0000000000100000 (reserved) Dec 29 11:50:38 xen03 kernel: Xen: 0000000000100000 - 0000000040000000 (usable) Dec 29 11:50:38 xen03 kernel: Xen: 00000000dffc0000 - 00000000dffcfc00 (ACPI data) -- Dec 29 11:52:30 xen03 kernel: ------------[ cut here ]------------ Dec 29 11:52:30 xen03 kernel: kernel BUG at arch/x86/xen/mmu.c:1860! 
Dec 29 11:52:30 xen03 kernel: invalid opcode: 0000 [#1] SMP Dec 29 11:52:30 xen03 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map Dec 29 11:52:30 xen03 kernel: CPU 3 Dec 29 11:52:30 xen03 kernel: Modules linked in: xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tg3 libphy ide_cd_mod cdrom serio_raw button tpm_tis tpm tpm_bios i2c_i801 i2c_core iTCO_wdt pcspkr shpchp dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Dec 29 11:52:30 xen03 kernel: Pid: 8000, comm: udevd Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860 Dec 29 11:52:30 xen03 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 Dec 29 11:52:30 xen03 kernel: RSP: e02b:ffff88001f0c9dc8 EFLAGS: 00010282 Dec 29 11:52:30 xen03 kernel: RAX: 00000000ffffffea RBX: 0000000000037fc2 RCX: 00000000000001bf Dec 29 11:52:30 xen03 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef Dec 29 11:52:30 xen03 kernel: RBP: ffff88001f0c9de8 R08: 0000000000000e10 R09: ffff880000000000 Dec 29 11:52:30 xen03 kernel: R10: 00000000deadbeef R11: 00007fb810051ce0 R12: 0000000000000003 Dec 29 11:52:30 xen03 kernel: R13: 0000000000037fc2 R14: ffff88003257e100 R15: 00007fb80fc14158 Dec 29 11:52:30 xen03 kernel: FS: 00007fb81024e710(0000) GS:ffff8800280a6000(0000) knlGS:0000000000000000 Dec 29 11:52:30 xen03 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 29 11:52:30 xen03 kernel: CR2: 00007fb80fc14158 CR3: 0000000032ee7000 CR4: 0000000000002660 Dec 29 11:52:30 xen03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 29 11:52:30 
xen03 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 29 11:52:30 xen03 kernel: Process udevd (pid: 8000, threadinfo ffff88001f0c8000, task ffff88003257e100) Dec 29 11:52:30 xen03 kernel: Stack: Dec 29 11:52:30 xen03 kernel: 0000000000000000 000000000020a314 000000013e6f3e18 0000000000037fc2 Dec 29 11:52:30 xen03 kernel: <0> ffff88001f0c9e08 ffffffff8100e07c ffff880032c2a040 ffff880032aaa3f0 Dec 29 11:52:30 xen03 kernel: <0> ffff88001f0c9e18 ffffffff8100e0af ffff88001f0c9e58 ffffffff810a402f Dec 29 11:52:30 xen03 kernel: Call Trace: Dec 29 11:52:30 xen03 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69 Dec 29 11:52:30 xen03 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10 Dec 29 11:52:30 xen03 kernel: [<ffffffff810a402f>] __pte_alloc+0x70/0xce Dec 29 11:52:30 xen03 kernel: [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9 Dec 29 11:52:30 xen03 kernel: [<ffffffff810d2ecc>] ? d_kill+0x3a/0x42 Dec 29 11:52:30 xen03 kernel: [<ffffffff810c4cd1>] ? __fput+0x1cb/0x1da Dec 29 11:52:30 xen03 kernel: [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2 Dec 29 11:52:30 xen03 kernel: [<ffffffff81319dd5>] page_fault+0x25/0x30 Dec 29 11:52:30 xen03 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48 Dec 29 11:52:30 xen03 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 Dec 29 11:52:30 xen03 kernel: RSP <ffff88001f0c9dc8> Dec 29 11:52:30 xen03 kernel: ---[ end trace 2b8ce24b81556aa1 ]--- Dec 29 11:52:40 xen03 kernel: dmeventd[4328]: segfault at 90 ip 0000003db740bcd0 sp 00000000408080c8 error 4 in libpthread-2.5.so [3db7400000+16000] Dec 29 11:55:35 xen03 syslogd 1.4.1: restart. Dec 29 11:55:35 xen03 kernel: klogd 1.4.1, log source = /proc/kmsg started. 
Dec 29 11:55:35 xen03 kernel: Linux version 2.6.32.27-0.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Sat Dec 25 09:19:44 SGT 2010 Dec 29 11:55:35 xen03 kernel: Command line: ro root=/dev/md1 panic=5 panic_timeout=5 Dec 29 11:55:35 xen03 kernel: KERNEL supported cpus: Dec 29 11:55:35 xen03 kernel: Intel GenuineIntel Dec 29 11:55:35 xen03 kernel: AMD AuthenticAMD Dec 29 11:55:35 xen03 kernel: Centaur CentaurHauls Dec 29 11:55:35 xen03 kernel: released 0 pages of unused memory Dec 29 11:55:35 xen03 kernel: BIOS-provided physical RAM map: Dec 29 11:55:35 xen03 kernel: Xen: 0000000000000000 - 00000000000a0000 (usable) Dec 29 11:55:35 xen03 kernel: Xen: 00000000000a0000 - 0000000000100000 (reserved) Dec 29 11:55:35 xen03 kernel: Xen: 0000000000100000 - 0000000040000000 (usable) -- Dec 29 12:13:19 xen03 kernel: ------------[ cut here ]------------ Dec 29 12:13:19 xen03 kernel: kernel BUG at arch/x86/xen/mmu.c:1860! Dec 29 12:13:19 xen03 kernel: invalid opcode: 0000 [#1] SMP Dec 29 12:13:19 xen03 kernel: last sysfs file: /sys/block/dm-16/dev Dec 29 12:13:19 xen03 kernel: CPU 1 Dec 29 12:13:19 xen03 kernel: Modules linked in: ext4 jbd2 crc16 xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi loop dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom serio_raw tg3 button libphy tpm_tis tpm tpm_bios iTCO_wdt i2c_i801 pcspkr i2c_core shpchp dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Dec 29 12:13:19 xen03 kernel: Pid: 15285, comm: udevd Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860 Dec 29 12:13:19 xen03 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] 
pin_pagetable_pfn+0x53/0x59 Dec 29 12:13:19 xen03 kernel: RSP: e02b:ffff880025db5dc8 EFLAGS: 00010282 Dec 29 12:13:19 xen03 kernel: RAX: 00000000ffffffea RBX: 0000000000035093 RCX: 00000000000001a8 Dec 29 12:13:19 xen03 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef Dec 29 12:13:19 xen03 kernel: RBP: ffff880025db5de8 R08: 0000000000000498 R09: ffff880000000000 Dec 29 12:13:19 xen03 kernel: R10: 00000000deadbeef R11: 00007f7824d70600 R12: 0000000000000003 Dec 29 12:13:19 xen03 kernel: R13: 0000000000035093 R14: ffff880039ff0540 R15: 00007f7824936268 Dec 29 12:13:19 xen03 kernel: FS: 00007f78256fe710(0000) GS:ffff88002806c000(0000) knlGS:0000000000000000 Dec 29 12:13:19 xen03 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 29 12:13:19 xen03 kernel: CR2: 00007f7824936268 CR3: 0000000039fef000 CR4: 0000000000002660 Dec 29 12:13:19 xen03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 29 12:13:19 xen03 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 29 12:13:19 xen03 kernel: Process udevd (pid: 15285, threadinfo ffff880025db4000, task ffff880039ff0540) Dec 29 12:13:19 xen03 kernel: Stack: Dec 29 12:13:19 xen03 kernel: 0000000000000000 0000000000207243 000000013e64d518 0000000000035093 Dec 29 12:13:19 xen03 kernel: <0> ffff880025db5e08 ffffffff8100e07c ffff88003a50fac0 ffff8800352a6920 Dec 29 12:13:19 xen03 kernel: <0> ffff880025db5e18 ffffffff8100e0af ffff880025db5e58 ffffffff810a402f Dec 29 12:13:19 xen03 kernel: Call Trace: Dec 29 12:13:19 xen03 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69 Dec 29 12:13:19 xen03 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10 Dec 29 12:13:19 xen03 kernel: [<ffffffff810a402f>] __pte_alloc+0x70/0xce Dec 29 12:13:19 xen03 kernel: [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9 Dec 29 12:13:19 xen03 kernel: [<ffffffff813199ff>] ? 
_spin_unlock_irqrestore+0x11/0x13
Dec 29 12:13:19 xen03 kernel: [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
Dec 29 12:13:19 xen03 kernel: [<ffffffff81319dd5>] page_fault+0x25/0x30
Dec 29 12:13:19 xen03 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
Dec 29 12:13:19 xen03 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Dec 29 12:13:19 xen03 kernel: RSP <ffff880025db5dc8>
Dec 29 12:13:19 xen03 kernel: ---[ end trace aa0f33d0cdc0d845 ]---
Dec 29 12:16:25 xen03 syslogd 1.4.1: restart.
Dec 29 12:16:25 xen03 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 29 12:16:25 xen03 kernel: Linux version 2.6.32.27-0.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Sat Dec 25 09:19:44 SGT 2010
Dec 29 12:16:25 xen03 kernel: Command line: ro root=/dev/md1 panic=5 panic_timeout=5
Dec 29 12:16:25 xen03 kernel: KERNEL supported cpus:
Dec 29 12:16:25 xen03 kernel: Intel GenuineIntel
Dec 29 12:16:25 xen03 kernel: AMD AuthenticAMD
Dec 29 12:16:25 xen03 kernel: Centaur CentaurHauls
Dec 29 12:16:25 xen03 kernel: released 0 pages of unused memory
Dec 29 12:16:25 xen03 kernel: BIOS-provided physical RAM map:
Dec 29 12:16:25 xen03 kernel: Xen: 0000000000000000 - 00000000000a0000 (usable)
Dec 29 12:16:25 xen03 kernel: Xen: 00000000000a0000 - 0000000000100000 (reserved)
Dec 29 12:16:25 xen03 kernel: Xen: 0000000000100000 - 0000000040000000 (usable)
Dec 29 12:16:25 xen03 kernel: Xen: 00000000dffc0000 - 00000000dffcfc00 (ACPI data)
Dec 29 12:16:25 xen03 kernel: Xen: 00000000dffcfc00 - 00000000dffff000 (reserved)

My next crash test will add a pause between the steps, to see whether I can still reproduce the crash. After that I will add a sync at each step and test again.

Thanks.
Kindest regards,
Giam Teck Choon
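To see at a glance which processes were hit, the grep output above can be condensed to one line per oops. This is only a minimal sketch, assuming the syslog format shown in this thread ("Pid: N, comm: NAME"); the function name summarise_oops is made up for illustration.

```shell
# Print one line per oops: timestamp, host, pid and command name.
# Assumption: log lines look like the ones quoted in this thread, e.g.
#   Dec 29 11:52:30 xen03 kernel: Pid: 8000, comm: udevd Not tainted ...
summarise_oops() {
    awk '
        / kernel: Pid: [0-9]+, comm: / {
            for (i = 1; i <= NF; i++) {
                if ($i == "Pid:")  pid  = $(i + 1)
                if ($i == "comm:") comm = $(i + 1)
            }
            sub(/,$/, "", pid)            # drop the comma after the pid
            print $1, $2, $3, $4, "pid=" pid, "comm=" comm
        }
    ' "${1:-/var/log/messages}"
}
```

Run as "summarise_oops /var/log/messages" after sourcing the function. In the logs posted so far the affected commands are udevd, sh and mpath_wait.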
Teck Choon Giam
2010-Dec-29 04:58 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Below is my latest test crash script:

----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------
#!/bin/sh
#
# This script is to create lvm snapshot, mount it, umount it and remove in a
# specified number of loops to test whether it will crash the host server.
# All LVM snapshots are assumed to be mountable, as when you are running PV domUs.
#
# Created by Giam Teck Choon
#

# The LV group name; for this case we are using XenGroup
LVGroupName=XenGroup

# return 1 if is mounted otherwise return 0
check_mount() {
    local checkdir=${1}
    if [ -n "$checkdir" ] ; then
        local check=`grep "$checkdir" /proc/mounts`
        if [ -n "$check" ] ; then
            return 1
        fi
    fi
    return 0
}

do_lvm_create_remove() {
    # number of loops default is 1
    local loopcountlimit=${1:-1}
    # snapshot size default is 1G
    local snapshotsize=${2:-1G}
    # implement a sleep between create, mount, umount and remove (default is 0 which is no pause)
    local pauseinterval=${3:-0}
    # execute commands after each pause/sleep such as sync or anything that you want to test
    local commands=${4}
    # We filter out snapshot and swap
    local count=0
    if [ -d "/dev/${LVGroupName}" ] ; then
        while [ "$count" -lt "$loopcountlimit" ]
        do
            count=`expr $count + 1`
            echo "${count} ... ... "
            for i in `ls /dev/${LVGroupName} | grep -Ev 'snapshot$' | grep -Ev 'swap$'`; do
                if [ -h "/dev/${LVGroupName}/${i}" ] ; then
                    echo -n "lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i} ... ... "
                    lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i}
                    echo "done."
                    sleep ${pauseinterval}
                    if [ -n "$commands" ] ; then
                        echo -n "${commands} ... ... "
                        # eval so that compound commands such as "echo hi && sync" work
                        eval "$commands"
                        echo "done."
                    fi
                    mkdir -p /mnt/testlvm/${i}
                    if [ -h "/dev/${LVGroupName}/${i}-snapshot" ] ; then
                        check_mount /mnt/testlvm/${i}
                        local ismount=$?
                        if [ "$ismount" -eq 0 ] ; then
                            echo -n "mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i} ... ... "
                            mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i}
                            echo "done."
                            sleep ${pauseinterval}
                            if [ -n "$commands" ] ; then
                                echo -n "${commands} ... ... "
                                eval "$commands"
                                echo "done."
                            fi
                        fi
                        check_mount /mnt/testlvm/${i}
                        local ismount2=$?
                        if [ "$ismount2" -eq 1 ] ; then
                            echo -n "umount /mnt/testlvm/${i} ... ... "
                            umount /mnt/testlvm/${i}
                            echo "done."
                            sleep ${pauseinterval}
                            if [ -n "$commands" ] ; then
                                echo -n "${commands} ... ... "
                                eval "$commands"
                                echo "done."
                            fi
                        fi
                    fi
                    rm -rf /mnt/testlvm/${i}
                    echo -n "lvremove -f /dev/${LVGroupName}/${i}-snapshot ... ... "
                    lvremove -f /dev/${LVGroupName}/${i}-snapshot
                    echo "done."
                    sleep ${pauseinterval}
                    if [ -n "$commands" ] ; then
                        echo -n "${commands} ... ... "
                        eval "$commands"
                        echo "done."
                    fi
                fi
            done
            rm -fr /mnt/testlvm
        done
    else
        echo "/dev/${LVGroupName} directory not found!"
        exit 1
    fi
}

case $1 in
    loop)
        shift
        do_lvm_create_remove "$@"
        ;;
    *)
        cat <<HELP
Usage: $0 loop loopcountlimit snapshotsize pauseinterval commands

Where:
loopcountlimit is default to 1
snapshotsize is default to 1G
pauseinterval is default to 0
commands is default to none

Example to run with 100 loops without pause/sleep:
$0 loop 100

Example to run with 100 loops with pause/sleep of 5 seconds:
$0 loop 100 1G 5

Example to run with 100 loops with snapshot size of 2G instead of 1G:
$0 loop 100 2G

Example to run with 50 loops, 1G snapshot size, 5 seconds pause and with sync command with each pause/sleep:
$0 loop 50 1G 5 sync

Example to run with 50 loops, 1G snapshot size, no pause and with sync command with each pause/sleep:
$0 loop 50 1G 0 sync

Example to run your own commands:
$0 loop 100 1G 5 "echo hi && sync"
HELP
        ;;
esac
----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------

Thanks.

Kindest regards,
Giam Teck Choon
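One thing to keep in mind when adapting the script: check_mount greps /proc/mounts for a substring, so /mnt/testlvm/foo would also match while only /mnt/testlvm/foo-old is mounted. A stricter check could compare the mount-point field exactly. The sketch below is a hypothetical helper (is_mounted is not part of the script above), and unlike check_mount it follows the usual shell convention of exit status 0 meaning "mounted".

```shell
# Hypothetical helper, not part of the original script.
# Succeeds (exit status 0) only if DIR is exactly the second field of some
# line in the mounts table. The optional second argument exists only so the
# function can be tried against a copy of /proc/mounts.
# Caveat: /proc/mounts escapes spaces in paths as \040; this sketch does not
# handle such paths.
is_mounted() {
    [ -n "$1" ] || return 1
    awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

Usage would look like "if is_mounted /mnt/testlvm/${i}; then umount /mnt/testlvm/${i}; fi".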
On Sun, 2010-12-26 at 08:16 +0000, Teck Choon Giam wrote:
>
> Triggered BUG() in line 1860:
>
> static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
> {
>         struct mmuext_op op;
>         op.cmd = cmd;
>         op.arg1.mfn = pfn_to_mfn(pfn);
>         if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
>                 BUG(); <<THIS ONE?
> }

A failure to pin/unpin is usually associated with a log message from the
hypervisor. Please can you attempt to capture the full host log, e.g.
using serial console. See http://wiki.xen.org/xenwiki/XenParavirtOps
under "Are there more debugging options I could enable to troubleshoot
booting problems?" for some details.

Ian.
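While waiting for the hypervisor log, the register dumps already give a hint. Every oops posted so far shows RAX: 00000000ffffffea, and the Code: bytes just before the trap (85 c0 74 04 <0f> 0b, i.e. test %eax,%eax; je; ud2) suggest that RAX holds the hypercall return value that pin_pagetable_pfn() checks before BUG(). Read as a signed 32-bit integer, that value is -22, which is -EINVAL on Linux. A small sketch for doing the conversion (reg_to_int32 is a made-up helper name):

```shell
# Interpret the low 32 bits of a hex register value as a signed int.
# Argument: the hex digits as printed in the oops, without a 0x prefix.
reg_to_int32() {
    v=$(( 0x$1 & 0xffffffff ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))     # wrap into the negative range
    fi
    echo "$v"
}

# Example: reg_to_int32 00000000ffffffea   prints -22
```

So the hypervisor rejected the pin request as invalid, which fits the expectation that Xen logged a reason for it on its own console.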
Christophe Saout
2011-Jan-04 15:10 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Hello thread,

hijacking this thread since I am running into the same issue on a new machine.

> > > While doing LVM snapshot for migration and get the following:
> > >
> > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
>
> It would be very good to track this down and get it fixed..
> hopefully you're able to help a bit and try some things to debug it.
>
> Konrad maybe has some ideas to try..

I am seeing this with an lvcreate here, so I guess it's somehow related
to device-mapper stuff in general.

It doesn't look like this has been resolved yet. Somewhere I saw a
request for the hypervisor message related to the pinning failure.

Here it is:

(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
(XEN) mm.c:2733:d0 Error while pinning mfn 41114f

I have a bit of experience in debugging things, so if I can help someone
with more information...

Cheers,
Christophe
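For readers trying to interpret the "saw ... != exp ..." values: they look like Xen page type_info words. The sketch below decodes only the type field and rests on an assumption that does not come from this thread, namely that the type lives in the top four bits with the numbering of xen/include/asm-x86/mm.h from roughly that Xen generation (1 to 4 for L1 to L4 page tables, 7 for a writable page). The "exp 1000000000000000" value is at least consistent with that layout, since the failing operation is an L1 pin. pgt_type_name is a made-up helper name.

```shell
# Decode the page type from a Xen type_info word given as 16 hex digits.
# ASSUMPTION: type = top four bits, numbering as recalled from Xen's
# asm-x86/mm.h of that era; please check against your own Xen tree.
pgt_type_name() {
    case $(( (0x$1 >> 60) & 0xf )) in
        0) echo none ;;
        1) echo l1_page_table ;;
        2) echo l2_page_table ;;
        3) echo l3_page_table ;;
        4) echo l4_page_table ;;
        5) echo seg_desc_page ;;
        7) echo writable_page ;;
        *) echo unknown ;;
    esac
}

# Example: pgt_type_name 7400000000000001   (the "saw" value)
#          pgt_type_name 1000000000000000   (the "exp" value)
```

Under that assumption the message would read as: the kernel asked to pin the frame as an L1 page table, but Xen still counted it as a writable page with one reference, i.e. a writable mapping of that page appears to exist somewhere.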
Christophe Saout
2011-Jan-04 15:19 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Hi again,

> > > > While doing LVM snapshot for migration and get the following:
> > > >
> > > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
> >
> > It would be very good to track this down and get it fixed..
> > hopefully you're able to help a bit and try some things to debug it.
> >
> > Konrad maybe has some ideas to try..
>
> I am seeing this with an lvcreate here, so I guess it's somehow related
> to device-mapper stuff in general.
>
> It doesn't look like this has been resolved yet. Somewhere I saw a
> request for the hypervisor message related to the pinning failure.
>
> Here it is:
>
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
>
> I have a bit of experience in debugging things, so if I can help someone
> with more information...

Additional information: This happened with a number of commands now.
However, I am running a multipath setup and every time the crash seemed
to be caused in the process context of the multipath daemon. I think
the daemon listens to events from the device-mapper subsystem to watch
for changes and the problem somehow arises from there, since on another
machine with the same XEN/Dom0 version without such a daemon I never had
any troubles with LVM.

[<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
[<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
[<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
[<ffffffff810decde>] __pte_alloc+0x7e/0xf0
[<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
[<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
[<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
[<ffffffff81452b96>] do_page_fault+0x116/0x3e0
[<ffffffff8144ff65>] page_fault+0x25/0x30

Cheers,
Christophe
Hi,

the same issue on old IBM x335 servers with QLogic ISP2312 HBAs and
multipath setup. Newer hardware (x3550M2 + QLA2462 HBA) seems to be fine.
I've tried several versions of pvops kernel and xen hypervisor from 4.x
line with the same results.

Roman

On Tue, Jan 04, 2011 at 04:19:02PM +0100, Christophe Saout wrote:
> Hi again,
>
> > > > > While doing LVM snapshot for migration and get the following:
> > > > >
> > > > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
> > >
> > > It would be very good to track this down and get it fixed..
> > > hopefully you're able to help a bit and try some things to debug it.
> > >
> > > Konrad maybe has some ideas to try..
> >
> > I am seeing this with an lvcreate here, so I guess it's somehow related
> > to device-mapper stuff in general.
> >
> > It doesn't look like this has been resolved yet. Somewhere I saw a
> > request for the hypervisor message related to the pinning failure.
> >
> > Here it is:
> >
> > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> >
> > I have a bit of experience in debugging things, so if I can help someone
> > with more information...
>
> Additional information: This happened with a number of commands now.
> However, I am running a multipath setup and every time the crash seemed
> to be caused in the process context of the multipath daemon. I think
> the daemon listens to events from the device-mapper subsystem to watch
> for changes and the problem somehow arises from there, since on another
> machine with the same XEN/Dom0 version without such a daemon I never had
> any troubles with LVM.
>
> [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> [<ffffffff8144ff65>] page_fault+0x25/0x30
>
> Cheers,
> Christophe

-- 
----------------------------------------------------------------------
  ,''`.  [benco] | mailto: benco@acid.sk | silc: /msg benco
 : :' :  -------------------------------------------------------------
 `. `'   GPG publickey: http://www.acid.sk/pubkey.asc
   `-    KF = 0DF6 0592 74D2 F17A DACF A5C3 1720 CB7C F54C F429
Christophe Saout
2011-Jan-04 18:40 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Hi once more,

> > It doesn't look like this has been resolved yet. Somewhere I saw a
> > request for the hypervisor message related to the pinning failure.
> >
> > Here it is:
> >
> > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> >
> > I have a bit of experience in debugging things, so if I can help someone
> > with more information...
>
> [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> [<ffffffff8144ff65>] page_fault+0x25/0x30

> Additional information: This happened with a number of commands now.
> However, I am running a multipath setup and every time the crash
> seemed to be caused in the process context of the multipath daemon.
> I think the daemon listens to events from the device-mapper subsystem
> to watch for changes and the problem somehow arises from there, since
> on another machine with the same XEN/Dom0 version without such a
> daemon I never had any troubles with LVM.

On further investigation it seems that most of the time the issue is not
caused by the daemon, but by the "multipath" tool, which is used a lot
by udev to identify properties of block devices.

When I start stracing udevd (following forks), I'm not able to reproduce
the crash anymore. So I was hoping to find out what the process was
doing before the crash occurs, but since my attempts to trace the
process mask the bug, I can't. :(

(Without strace, the bug is very common: about every third "lvcreate"
command triggers it. Every lvcreate command triggers about 20 multipath
invocations.)

Christophe
Teck Choon Giam
2011-Jan-04 19:24 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Jan 4, 2011 at 9:48 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Sun, 2010-12-26 at 08:16 +0000, Teck Choon Giam wrote:
> >
> > Triggered BUG() in line 1860:
> >
> > static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
> > {
> >         struct mmuext_op op;
> >         op.cmd = cmd;
> >         op.arg1.mfn = pfn_to_mfn(pfn);
> >         if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
> >                 BUG(); <<THIS ONE?
> > }
>
> A failure to pin/unpin is usually associated with a log message from the
> hypervisor. Please can you attempt to capture the full host log, e.g.
> using serial console.
>
> See http://wiki.xen.org/xenwiki/XenParavirtOps under "Are there more
> debugging options I could enable to troubleshoot booting problems?" for
> some details.
>

I will, once I have the serial console cable and a system to capture the
log, during my next visit to the DC.

Thanks.

Kindest regards,
Giam Teck Choon
Teck Choon Giam
2011-Jan-04 19:32 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Jan 5, 2011 at 2:40 AM, Christophe Saout <christophe@saout.de> wrote:
> Hi once more,
>
> > > It doesn't look like this has been resolved yet. Somewhere I saw a
> > > request for the hypervisor message related to the pinning failure.
> > >
> > > Here it is:
> > >
> > > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> > >
> > > I have a bit of experience in debugging things, so if I can help someone
> > > with more information...
> >
> > [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> > [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> > [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> > [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> > [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> > [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> > [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> > [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> > [<ffffffff8144ff65>] page_fault+0x25/0x30
>
> > Additional information: This happened with a number of commands now.
> > However, I am running a multipath setup and every time the crash
> > seemed to be caused in the process context of the multipath daemon.
> > I think the daemon listens to events from the device-mapper subsystem
> > to watch for changes and the problem somehow arises from there, since
> > on another machine with the same XEN/Dom0 version without such a
> > daemon I never had any troubles with LVM.
>
> On further investigation is seems that most of the time the issue is not
> caused by the daemon, but by the "multipath" tool, which is used a lot
> by udev to identify properties of block devices.
>
> When I start stracing udevd (following forks), I'm not able to reproduce
> the crash anymore. So I was hoping to find out what the process was
> doing before the crash occurs, but since my attempts to trace the
> process masks the bug, I can't. :(
>
> (without strace, the bug is very common, about every third "lvcreate"
> command. Every lvcreate command triggers about 20 multipath
> invocations)
>

I have been able to avoid the bug for 8 days (so far) by sleeping 5
seconds, then running sync, then sleeping 5 seconds, then running sync
again, repeating this for 60 seconds while doing LVM snapshots for 10
domUs. I mean:

1. lvm snapshot domU (lvcreate)
2. mount lvm snapshot domU
3. rsync to backup domU
4. umount lvm snapshot domU
5. remove lvm snapshot domU (lvremove)
6. sync (start a 60-second countdown, running sync at every 5-second
   interval)
7. sleep 5
8. sync
9. sleep 5
10. sync
11. sleep 5
12. sync
.... until the countdown hits 0 seconds

Then the cycle repeats for the next domU.

Doing the above, I have been able to keep the crash I posted in this
thread from appearing for 8 days (8 daily LVM snapshot backups of all
domUs).

Thanks.

Kindest regards,
Giam Teck Choon
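The sync countdown in steps 6 to 12 above can be sketched roughly as follows. This is not the script used in the thread, only one possible reading of the description: the name sync_countdown, its parameters, and the choice to end with a final sync when the countdown reaches 0 are assumptions made for illustration. It relies on os.sync(), which is available on Unix with Python 3.3 or later.

```python
import os
import time


def sync_countdown(total=60, interval=5, sync=os.sync, sleep=time.sleep):
    """Sync once, then alternate sleep/sync until `total` seconds are used up.

    `sync` and `sleep` are injectable so the loop can be exercised without
    actually waiting or flushing disks. Returns the number of sync calls.
    """
    calls = 0
    sync()                    # step 6: first sync, the countdown starts here
    calls += 1
    remaining = total
    while remaining > 0:
        sleep(interval)       # steps 7, 9, 11, ...
        sync()                # steps 8, 10, 12, ...
        calls += 1
        remaining -= interval
    return calls
```

With the defaults this performs 13 syncs over 60 seconds. It would be called once after each lvremove, before moving on to the next domU.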
Hi,

I use the same workaround to start domUs, since the issue mentioned below
also occurred during domU start via the xendomains script.

Another scenario causing this issue (at least for me) is cron scripts. I
was unable to find out which one is responsible (crash every 2-3 days),
but the issue went away with cron scripts disabled.

Roman

On Wed, Jan 05, 2011 at 03:32:58AM +0800, Teck Choon Giam wrote:
> On Wed, Jan 5, 2011 at 2:40 AM, Christophe Saout <christophe@saout.de> wrote:
>
> > Hi once more,
> >
> > > > It doesn't look like this has been resolved yet. Somewhere I saw a
> > > > request for the hypervisor message related to the pinning failure.
> > > >
> > > > Here it is:
> > > >
> > > > (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> > > > (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
> > > >
> > > > I have a bit of experience in debugging things, so if I can help someone
> > > > with more information...
> > >
> > > [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> > > [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> > > [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> > > [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> > > [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> > > [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> > > [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> > > [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> > > [<ffffffff8144ff65>] page_fault+0x25/0x30
> >
> > > Additional information: This happened with a number of commands now.
> > > However, I am running a multipath setup and every time the crash
> > > seemed to be caused in the process context of the multipath daemon.
> > > I think the daemon listens to events from the device-mapper subsystem
> > > to watch for changes and the problem somehow arises from there, since
> > > on another machine with the same XEN/Dom0 version without such a
> > > daemon I never had any troubles with LVM.
> >
> > On further investigation is seems that most of the time the issue is not
> > caused by the daemon, but by the "multipath" tool, which is used a lot
> > by udev to identify properties of block devices.
> >
> > When I start stracing udevd (following forks), I'm not able to reproduce
> > the crash anymore. So I was hoping to find out what the process was
> > doing before the crash occurs, but since my attempts to trace the
> > process masks the bug, I can't. :(
> >
> > (without strace, the bug is very common, about every third "lvcreate"
> > command. Every lvcreate command triggers about 20 multipath
> > invocations)
> >
>
> I am able to prevent that bug for 8 days (till now) by implementing sleep 5
> seconds then syc then sleep 5 seconds then sync repeating this for 60
> seconds while doing lvm snapshot for 10 domUs. I mean:
>
> 1. lvm snapshot domU (lvcreate)
> 2. mount lvm snapsho domUt
> 3. rsync to backup domU
> 4. umount lvm snapshot domU
> 5. remove lvm snapshot domU (lvremove)
> 6. sync (start countdown of 60 seconds and every 5 seconds interval doing
> sync)
> 7. sleep 5
> 8. sync
> 9. sleep 5
> 10. sync
> 11. sleep 5
> 12. sync
> .... until it hits 0 second countdown
> Then next domU repeat the cycle.
>
> Doing the above I am able to prevent such crash or bug to pop up for 8 days
> (8 such daily LVM snapshot backup for all domUs) which I posted in this
> thread.
>
> Thanks.
>
> Kindest regards,
> Giam Teck Choon
Christophe Saout
2011-Jan-04 23:10 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Hi,

> > > > While doing LVM snapshot for migration and get the following:
> > > >
> > > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
> > [...]
> [<ffffffff810052e2>] pin_pagetable_pfn+0x52/0x60
> [<ffffffff81006f5c>] xen_alloc_ptpage+0x9c/0xa0
> [<ffffffff81006f8e>] xen_alloc_pte+0xe/0x10
> [<ffffffff810decde>] __pte_alloc+0x7e/0xf0
> [<ffffffff810e15c5>] handle_mm_fault+0x855/0x930
> [<ffffffff8102dd9e>] ? pvclock_clocksource_read+0x4e/0x100
> [<ffffffff810e734c>] ? do_mmap_pgoff+0x33c/0x380
> [<ffffffff81452b96>] do_page_fault+0x116/0x3e0
> [<ffffffff8144ff65>] page_fault+0x25/0x30
> [...]
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> (XEN) mm.c:2733:d0 Error while pinning mfn 41114f

Looking into the code, the Dom0 code is attempting to pin what it thinks
is a "PGT_l1_page_table", however the hypervisor returns -EINVAL because
it actually is a "PGT_writable_page".

After a few hours I managed to catch the crash while the offending
process was being straced. However, the results were totally
inconclusive, because the last lines before the crash are:

16576 open("/lib/multipath/libcheckdirectio.so", O_RDONLY) = 4
16576 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\v\0\0\0\0\0\0"..., 832) = 832
16576 fstat(4, {st_mode=S_IFREG|0644, st_size=9344, ...}) = 0
16576 mmap(NULL, 2104672, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7fa6b36f6000
16576 mprotect(0x7fa6b36f8000, 2093056, PROT_NONE) = 0
16576 mmap(0x7fa6b38f7000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1000) = 0x7fa6b38f7000
16576 close(4) = 0

A non-crashing execution would have continued with:

16667 open("/etc/ld.so.cache", O_RDONLY) = 4
16667 fstat(4, {st_mode=S_IFREG|0644, st_size=21739, ...}) = 0
16667 mmap(NULL, 21739, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f237de56000
16667 close(4) = 0
16667 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
16667 open("/lib/libaio.so.1", O_RDONLY) = 4
[...]

Which means that it crashed during the dynamic loading of a plugin
shared library and not while interacting with the device mapper. (Also,
the device being investigated was /dev/sde and not some dm device.)

This leads me to believe that some device-mapper shared library has a
particular memory layout that tends to trigger this crash and it has
nothing to do with any device-mapper code at all. Also, the crash seems
to be timing-sensitive, so it might also be a race condition of some
sort. (On a side note: this is a 24-core machine (!) and the kernel
happens to have full preemption enabled.)

I am trying to understand the code a bit. Can someone explain to me what
xen_alloc_ptpage is doing?

> /* This needs to make sure the new pte page is pinned iff its being
>    attached to a pinned pagetable. */
> [...]
>         if (PagePinned(virt_to_page(mm->pgd))) {
> [...]
>                 pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);

I must admit I don't know very much about memory handling in Linux (so
please excuse me if I am reading total nonsense into this; still, I'm
intrigued and would like to understand it a bit better), but isn't
`mm->pgd' supposed to point to the L1 page table and `pfn', being a pte
page, a 3rd/4th level page?

Is this a code path that is exercised a lot?

Thanks,
Christophe
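The reading of the hypervisor message given above (expected PGT_l1_page_table, saw PGT_writable_page, hypercall returning -EINVAL) can be reproduced from the raw numbers with a small sketch. It assumes that the page type sits in the top four bits of the 64-bit type word printed by Xen, which is consistent with the two values in the log but is not confirmed anywhere in this thread. Only the two type names mentioned in the thread are listed, and the helper names are made up for illustration. The RAX value is the one from the kernel oops posted in this thread.

```python
import errno

# Assumption: the page type is the top nibble of the 64-bit type word.
TYPE_SHIFT = 60
PAGE_TYPES = {
    0x1: "PGT_l1_page_table",   # matches "exp 1000000000000000"
    0x7: "PGT_writable_page",   # matches "saw 7400000000000001"
}


def page_type(type_info):
    """Name the page type encoded in the top four bits, if known."""
    return PAGE_TYPES.get(type_info >> TYPE_SHIFT, "unknown")


def as_signed32(value):
    """Interpret the low 32 bits of a register value as a signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


saw = 0x7400000000000001      # from the (XEN) "Bad type" line
exp = 0x1000000000000000
rax = 0x00000000FFFFFFEA      # RAX in the oops

print("saw:", page_type(saw))
print("exp:", page_type(exp))
ret = as_signed32(rax)
print("hypercall returned", ret, errno.errorcode[-ret])
```

Under these assumptions the output names PGT_writable_page for "saw", PGT_l1_page_table for "exp", and a return value of -22 (EINVAL), in line with the description above.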
Pasi Kärkkäinen
2011-Jan-05 10:51 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Jan 04, 2011 at 04:10:17PM +0100, Christophe Saout wrote:
> Hello thread,
>
> hijacking this thread since I am running into the same issue on a new
> machine.
>
> > > > While doing LVM snapshot for migration and get the following:
> > > >
> > > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
> >
> > It would be very good to track this down and get it fixed..
> > hopefully you're able to help a bit and try some things to debug it.
> >
> > Konrad maybe has some ideas to try..
>
> I am seeing this with an lvcreate here, so I guess it's somehow related
> to device-mapper stuff in general.
>

Sorry if this was already stated earlier..
what are the exact steps to reproduce? I could try reproducing it at
some point..

-- Pasi

> It doesn't look like this has been resolved yet. Somewhere I saw a
> request for the hypervisor message related to the pinning failure.
>
> Here it is:
>
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 41114f (pfn d514f)
> (XEN) mm.c:2733:d0 Error while pinning mfn 41114f
>
> I have a bit of experience in debugging things, so if I can help someone
> with more information...
>
> Cheers,
> Christophe
>
>
Teck Choon Giam
2011-Jan-05 14:56 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Jan 5, 2011 at 6:51 PM, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> On Tue, Jan 04, 2011 at 04:10:17PM +0100, Christophe Saout wrote:
> > Hello thread,
> >
> > hijacking this thread since I am running into the same issue on a new
> > machine.
> >
> > > > > While doing LVM snapshot for migration and get the following:
> > > > >
> > > > > Dec 26 15:58:29 xen01 kernel: ------------[ cut here ]------------
> > > > > Dec 26 15:58:29 xen01 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
> > > > > Dec 26 15:58:29 xen01 kernel: invalid opcode: 0000 [#1] SMP
> > > > > Dec 26 15:58:29 xen01 kernel: last sysfs file: /sys/block/dm-26/dev
> > > > > Dec 26 15:58:29 xen01 kernel: CPU 0
> > > > > Dec 26 15:58:29 xen01 kernel: Modules linked in: ipt_MASQUERADE
> > >
> > > It would be very good to track this down and get it fixed..
> > > hopefully you're able to help a bit and try some things to debug it.
> > >
> > > Konrad maybe has some ideas to try..
> >
> > I am seeing this with an lvcreate here, so I guess it's somehow related
> > to device-mapper stuff in general.
> >
>
> Sorry if this was already stated earlier..
> what are the exact steps to reproduce? I could try reproducing it at some
> point..
>
> -- Pasi
>

Did you try the script I posted? It assumes you have existing LVs for
domUs in your VG, which can easily be created if they are not there. The
idea is to create a snapshot, mount it, umount it, remove the snapshot,
and repeat this cycle in a loop; that will trigger this BUG!!!
Here is the latest crash with serial console output, as it doesn't take
long with sh test.sh loop 100 to produce this:

EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1f7e29 (pfn 25cb4)
(XEN) mm.c:2733:d0 Error while pinning mfn 1f7e29
------------[ cut here ]------------
kernel BUG at arch/x86/xen/mmu.c:1860!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:05:05.0/local_cpus
CPU 0
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
Pid: 6526, comm: dmsetup Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860
RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
RSP: e02b:ffff88001d8dfdc8 EFLAGS: 00010282
RAX: 00000000ffffffea RBX: 0000000000025cb4 RCX: 000000000000012e
RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
RBP: ffff88001d8dfde8 R08: 00000000000005a0 R09: ffff880000000000
R10: 00000000deadbeef R11: 0000003db6814e00 R12: 0000000000000003
R13: 0000000000025cb4 R14: ffff88002ffe8440 R15: 0000003db7616250
FS: 00007fb54068b710(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003db7616250 CR3: 000000002ff88000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dmsetup (pid: 6526, threadinfo ffff88001d8de000, task ffff88002ffe8440)
Stack:
 0000000000000000 00000000001f7e29 000000013f009e18 0000000000025cb4
<0> ffff88001d8dfe08 ffffffff8100e07c ffff88001dc3d040 ffff88001dc6cdd8
<0> ffff88001d8dfe18 ffffffff8100e0af ffff88001d8dfe58 ffffffff810a402f
Call Trace:
 [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
 [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
 [<ffffffff810a402f>] __pte_alloc+0x70/0xce
 [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
 [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
 [<ffffffff81319dd5>] page_fault+0x25/0x30
Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 8
RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
 RSP <ffff88001d8dfdc8>
---[ end trace b0a2643219f652eb ]---
BUG: soft lockup - CPU#0 stuck for 61s! [dmsetup:6526]
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
CPU 0:
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
Pid: 6526, comm: dmsetup Tainted: G D 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860
RIP: e030:[<ffffffff813199d3>] [<ffffffff813199d3>] _spin_lock+0x19/0x20
RSP: e02b:ffff88001d8dfa68 EFLAGS: 00000297
RAX: 0000000000000023 RBX: 0000000025f91000 RCX: 0000000000000004
RDX: 0000000000000022 RSI: 0000000000000004 RDI: ffff88001dc3d0c0
RBP: ffff88001d8dfa68 R08: 0000000000000000 R09: ffffffff816dd100
R10: ffff88003e7424c8 R11: 0000000000000020 R12: ffff88001dc3d040
R13: 0000000000000004 R14: ffff88001dc3d0a0 R15: ffffffff816dd100
FS: 00007fb54068b710(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003db7616250 CR3: 0000000001001000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffff811660dd>] ? free_cpumask_var+0x9/0xb
 [<ffffffff8100dde1>] xen_exit_mmap+0x199/0x1d7
 [<ffffffff810a8137>] exit_mmap+0x5f/0x14b
 [<ffffffff81048648>] mmput+0x46/0xb2
 [<ffffffff8104c552>] exit_mm+0xfd/0x108
 [<ffffffff8100f799>] ? xen_irq_enable_direct_end+0x0/0x7
 [<ffffffff8104d7ee>] do_exit+0x1f3/0x67b
 [<ffffffff8131a908>] oops_end+0xba/0xc2
 [<ffffffff810163a1>] die+0x55/0x5e
 [<ffffffff8131a192>] do_trap+0x110/0x11f
 [<ffffffff810142c8>] do_invalid_op+0x97/0xa0
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff810138bb>] invalid_op+0x1b/0x20
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff8100cb57>] ? pin_pagetable_pfn+0x4f/0x59
 [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
 [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
 [<ffffffff810a402f>] __pte_alloc+0x70/0xce
 [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
 [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
 [<ffffffff81319dd5>] page_fault+0x25/0x30
Kernel panic - not syncing: softlockup: hung tasks
Pid: 6526, comm: dmsetup Tainted: G D 2.6.32.27-0.xen.pvops.choon.centos5 #1
Call Trace:
 <IRQ> [<ffffffff8104aa97>] panic+0xa0/0x15f
 [<ffffffff81319dd5>] ? page_fault+0x25/0x30
 [<ffffffff8101640f>] ? show_trace_log_lvl+0x4c/0x58
 [<ffffffff8101642b>] ? show_trace+0x10/0x12
 [<ffffffff81011755>] ? show_regs+0x44/0x48
 [<ffffffff8107f202>] softlockup_tick+0x173/0x182
 [<ffffffff810539bf>] run_local_timers+0x18/0x1a
 [<ffffffff81053bde>] update_process_times+0x30/0x54
 [<ffffffff81068821>] tick_sched_timer+0x70/0x99
 [<ffffffff8105f52e>] __run_hrtimer+0x53/0xb3
 [<ffffffff8105f772>] hrtimer_interrupt+0xae/0x192
 [<ffffffff8100f3a3>] xen_timer_interrupt+0x37/0x181
 [<ffffffff81082898>] ? check_for_new_grace_period+0x97/0xa5
 [<ffffffff811c870f>] ? unmask_evtchn+0x34/0xd6
 [<ffffffff8108318c>] ? __rcu_process_callbacks+0xf2/0x2ae
 [<ffffffff8107f708>] handle_IRQ_event+0x2d/0xb7
 [<ffffffff81081079>] handle_percpu_irq+0x3c/0x69
 [<ffffffff811c8640>] __xen_evtchn_do_upcall+0xe1/0x168
 [<ffffffff811c92d1>] xen_evtchn_do_upcall+0x2e/0x41
 [<ffffffff81013c7e>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI> [<ffffffff813199d3>] ? _spin_lock+0x19/0x20
 [<ffffffff811660dd>] ? free_cpumask_var+0x9/0xb
 [<ffffffff8100dde1>] ? xen_exit_mmap+0x199/0x1d7
 [<ffffffff810a8137>] ? exit_mmap+0x5f/0x14b
 [<ffffffff81048648>] ? mmput+0x46/0xb2
 [<ffffffff8104c552>] ? exit_mm+0xfd/0x108
 [<ffffffff8100f799>] ? xen_irq_enable_direct_end+0x0/0x7
 [<ffffffff8104d7ee>] ? do_exit+0x1f3/0x67b
 [<ffffffff8131a908>] ? oops_end+0xba/0xc2
 [<ffffffff810163a1>] ? die+0x55/0x5e
 [<ffffffff8131a192>] ? do_trap+0x110/0x11f
 [<ffffffff810142c8>] ? do_invalid_op+0x97/0xa0
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff810138bb>] ? invalid_op+0x1b/0x20
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff8100cb57>] ? pin_pagetable_pfn+0x4f/0x59
 [<ffffffff8100e07c>] ? xen_alloc_ptpage+0x64/0x69
 [<ffffffff8100e0af>] ? xen_alloc_pte+0xe/0x10
 [<ffffffff810a402f>] ? __pte_alloc+0x70/0xce
 [<ffffffff810a41cd>] ? handle_mm_fault+0x140/0x8b9
 [<ffffffff8131be4d>] ? do_page_fault+0x252/0x2e2
 [<ffffffff81319dd5>] ? page_fault+0x25/0x30
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Thanks.

Kindest regards,
Giam Teck Choon
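The reproduction idea described in this message (create a snapshot, mount it, umount it, remove it, and loop) might look roughly like the sketch below. This is not the test.sh referred to above, which is not shown in the thread. The volume group "vg0", the logical volume "domu1", the "-snap" suffix, the 1G snapshot size and the /mnt/snap mount point are all hypothetical. The sketch only prints the commands instead of running them, since running the cycle on an affected dom0 may crash the host.

```python
import shlex


def snapshot_cycle_commands(vg, lv, snap_size="1G", mountpoint="/mnt/snap"):
    """Return the commands for one create/mount/umount/remove cycle."""
    snap = lv + "-snap"                    # hypothetical snapshot name
    origin = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, origin],
        ["mount", snap_dev, mountpoint],
        ["umount", mountpoint],
        ["lvremove", "--force", snap_dev],
    ]


def print_cycles(vg, lv, loops):
    """Dry run: print the commands for `loops` cycles without executing."""
    for _ in range(loops):
        for cmd in snapshot_cycle_commands(vg, lv):
            print(shlex.join(cmd))


print_cycles("vg0", "domu1", loops=2)
```

To actually run the cycle, each command list could be passed to subprocess.run(cmd, check=True) as root, ideally on a test machine with a serial console attached.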
Teck Choon Giam
2011-Jan-05 15:30 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Jan 5, 2011 at 3:24 AM, Teck Choon Giam <giamteckchoon@gmail.com>wrote:> > > On Tue, Jan 4, 2011 at 9:48 PM, Ian Campbell <Ian.Campbell@citrix.com>wrote: > >> On Sun, 2010-12-26 at 08:16 +0000, Teck Choon Giam wrote: >> > >> > Triggered BUG() in line 1860: >> > >> > static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn) >> > { >> > struct mmuext_op op; >> > op.cmd = cmd; >> > op.arg1.mfn = pfn_to_mfn(pfn); >> > if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF)) >> > BUG(); <<THIS ONE? >> > } >> >> A failure to pin/unpin is usually associated with a log message from the >> hypervisor. Please can you attempt to capture the full host log, e.g. >> using serial console. >> >> See http://wiki.xen.org/xenwiki/XenParavirtOps under "Are there more >> debugging options I could enable to troubleshoot booting problems?" for >> some details. >> > > I will once I got the serial console cable and have a system to catch the > log during my next visit to the DC. > >Hi Ian, Here is another console output besides the other one which I posted: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended INIT: Id "s1" respawning too fast: disabled for 5 minutes EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, 
running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended hrtimer: interrupt took 3096797 ns EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount count reached, running e2fsck is recommended EXT3-fs warning: maximal mount 
count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1ee744 (pfn 1c399)
(XEN) mm.c:2733:d0 Error while pinning mfn 1ee744
------------[ cut here ]------------
kernel BUG at arch/x86/xen/mmu.c:1860!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/block/dm-17/dev
CPU 1
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
Pid: 19758, comm: dmsetup Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860
RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
RSP: e02b:ffff88003a615dc8 EFLAGS: 00010282
RAX: 00000000ffffffea RBX: 000000000001c399 RCX: 00000000000000e1
RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
RBP: ffff88003a615de8 R08: 0000000000000cc8 R09: ffff880000000000
R10: 00000000deadbeef R11: 0000000000000246 R12: 0000000000000003
R13: 000000000001c399 R14: ffff88001c1d86c0 R15: 0000003db7400258
FS: 00007f6cbd5b6710(0000) GS:ffff88002806c000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003db7400258 CR3: 000000003a5ed000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dmsetup (pid: 19758, threadinfo ffff88003a614000, task ffff88001c1d86c0)
Stack:
 0000000000000000 00000000001ee744 000000013e64d518 000000000001c399
<0> ffff88003a615e08 ffffffff8100e07c ffff880027a2c580 ffff88003a56fdd0
<0> ffff88003a615e18 ffffffff8100e0af ffff88003a615e58 ffffffff810a402f
Call Trace:
 [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
 [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
 [<ffffffff810a402f>] __pte_alloc+0x70/0xce
 [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
 [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
 [<ffffffff81319dd5>] page_fault+0x25/0x30
Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 8
RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
 RSP <ffff88003a615dc8>
---[ end trace 63676fea977b3461 ]---
BUG: soft lockup - CPU#1 stuck for 61s! [dmsetup:19758]
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
CPU 1:
Modules linked in: ext4 jbd2 crc16 gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_]
Pid: 19758, comm: dmsetup Tainted: G D 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860
RIP: e030:[<ffffffff813199d3>] [<ffffffff813199d3>] _spin_lock+0x19/0x20
RSP: e02b:ffff88003a615a68 EFLAGS: 00000297
RAX: 0000000000000025 RBX: 000000003d2fd000 RCX: 0000000000000004
RDX: 0000000000000024 RSI: 0000000000000004 RDI: ffff880027a2c600
RBP: ffff88003a615a68 R08: 0000000000000000 R09: ffffffff816dd100
R10: 3030303030303030 R11: 0000000000000120 R12: ffff880027a2c580
R13: 0000000000000004 R14: ffff880027a2c5e0 R15: ffffffff816dd100
FS: 00007f720a0f36e0(0000) GS:ffff88002806c000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f7209c9c898 CR3: 0000000001001000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffff811660dd>] ? free_cpumask_var+0x9/0xb
 [<ffffffff8100dde1>] xen_exit_mmap+0x199/0x1d7
 [<ffffffff810a8137>] exit_mmap+0x5f/0x14b
 [<ffffffff81048648>] mmput+0x46/0xb2
 [<ffffffff8104c552>] exit_mm+0xfd/0x108
 [<ffffffff8100f799>] ? xen_irq_enable_direct_end+0x0/0x7
 [<ffffffff8104d7ee>] do_exit+0x1f3/0x67b
 [<ffffffff8131a908>] oops_end+0xba/0xc2
 [<ffffffff810163a1>] die+0x55/0x5e
 [<ffffffff8131a192>] do_trap+0x110/0x11f
 [<ffffffff810142c8>] do_invalid_op+0x97/0xa0
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff810138bb>] invalid_op+0x1b/0x20
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff8100cb57>] ? pin_pagetable_pfn+0x4f/0x59
 [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
 [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
 [<ffffffff810a402f>] __pte_alloc+0x70/0xce
 [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
 [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
 [<ffffffff81319dd5>] page_fault+0x25/0x30
Kernel panic - not syncing: softlockup: hung tasks
Pid: 19758, comm: dmsetup Tainted: G D 2.6.32.27-0.xen.pvops.choon.centos5 #1
Call Trace:
 <IRQ> [<ffffffff8104aa97>] panic+0xa0/0x15f
 [<ffffffff81319dd5>] ? page_fault+0x25/0x30
 [<ffffffff8101640f>] ? show_trace_log_lvl+0x4c/0x58
 [<ffffffff8101642b>] ? show_trace+0x10/0x12
 [<ffffffff81011755>] ? show_regs+0x44/0x48
 [<ffffffff8107f202>] softlockup_tick+0x173/0x182
 [<ffffffff810539bf>] run_local_timers+0x18/0x1a
 [<ffffffff81053bde>] update_process_times+0x30/0x54
 [<ffffffff81068821>] tick_sched_timer+0x70/0x99
 [<ffffffff8105f52e>] __run_hrtimer+0x53/0xb3
 [<ffffffff8105f772>] hrtimer_interrupt+0xae/0x192
 [<ffffffff8100f3a3>] xen_timer_interrupt+0x37/0x181
 [<ffffffff81082898>] ? check_for_new_grace_period+0x97/0xa5
 [<ffffffff811c870f>] ? unmask_evtchn+0x34/0xd6
 [<ffffffff8108318c>] ? __rcu_process_callbacks+0xf2/0x2ae
 [<ffffffff8107f708>] handle_IRQ_event+0x2d/0xb7
 [<ffffffff81081079>] handle_percpu_irq+0x3c/0x69
 [<ffffffff811c8640>] __xen_evtchn_do_upcall+0xe1/0x168
 [<ffffffff811c92d1>] xen_evtchn_do_upcall+0x2e/0x41
 [<ffffffff81013c7e>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI> [<ffffffff813199d3>] ? _spin_lock+0x19/0x20
 [<ffffffff811660dd>] ? free_cpumask_var+0x9/0xb
 [<ffffffff8100dde1>] ? xen_exit_mmap+0x199/0x1d7
 [<ffffffff810a8137>] ? exit_mmap+0x5f/0x14b
 [<ffffffff81048648>] ? mmput+0x46/0xb2
 [<ffffffff8104c552>] ? exit_mm+0xfd/0x108
 [<ffffffff8100f799>] ? xen_irq_enable_direct_end+0x0/0x7
 [<ffffffff8104d7ee>] ? do_exit+0x1f3/0x67b
 [<ffffffff8131a908>] ? oops_end+0xba/0xc2
 [<ffffffff810163a1>] ? die+0x55/0x5e
 [<ffffffff8131a192>] ? do_trap+0x110/0x11f
 [<ffffffff810142c8>] ? do_invalid_op+0x97/0xa0
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff810138bb>] ? invalid_op+0x1b/0x20
 [<ffffffff8100cb5b>] ? pin_pagetable_pfn+0x53/0x59
 [<ffffffff8100cb57>] ? pin_pagetable_pfn+0x4f/0x59
 [<ffffffff8100e07c>] ? xen_alloc_ptpage+0x64/0x69
 [<ffffffff8100e0af>] ? xen_alloc_pte+0xe/0x10
 [<ffffffff810a402f>] ? __pte_alloc+0x70/0xce
 [<ffffffff810a41cd>] ? handle_mm_fault+0x140/0x8b9
 [<ffffffff8131be4d>] ? do_page_fault+0x252/0x2e2
 [<ffffffff81319dd5>] ? page_fault+0x25/0x30
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Thanks.

Kindest regards,
Giam Teck Choon

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
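A note on the register dump above: the first oops shows RAX: 00000000ffffffea at the BUG(). Assuming RAX still holds the return value of HYPERVISOR_mmuext_op() at that point (an assumption; the log itself does not say so), its low 32 bits decode to -22, which is -EINVAL on Linux. That would be consistent with the "(XEN) ... Error while pinning mfn" message. A minimal sketch of the decoding follows; decode_ret32 is a made-up helper name, not part of any Xen or kernel tool.

```shell
#!/bin/sh
# decode_ret32 is a hypothetical helper for illustration only.
# It interprets the low 32 bits of a hex register value as a signed integer.
decode_ret32() {
    # Strip an optional 0x prefix, then keep only the low 32 bits.
    hex=${1#0x}
    val=$(( 0x$hex & 0xffffffff ))
    # Values with bit 31 set are negative in two's complement.
    if [ "$val" -ge 2147483648 ]; then
        val=$(( val - 4294967296 ))
    fi
    echo "$val"
}

# RAX value copied from the oops above.
decode_ret32 00000000ffffffea
```

With the RAX value from the log this prints -22.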
Same issue:

Backup DomU with lvsnapshot ...

Linux xencave 2.6.32-5-xen-amd64 #1 SMP Fri Dec 10 17:41:50 UTC 2010 x86_64 GNU/Linux

Jan 13 04:01:51 xencave kernel: [223872.758977] ------------[ cut here ]------------
Jan 13 04:01:51 xencave kernel: [223872.759006] kernel BUG at /build/buildd-linux-2.6_2.6.32-29-amd64-xcs37n/linux-2.6-2.6.32/debian/build/source_amd64_xen/arch/x86/xen/mmu.c:1649!
Jan 13 04:01:51 xencave kernel: [223872.759047] invalid opcode: 0000 [#2] SMP
Jan 13 04:01:51 xencave kernel: [223872.759076] last sysfs file: /sys/devices/virtual/block/dm-30/dm/suspended
Jan 13 04:01:51 xencave kernel: [223872.759099] CPU 1
Jan 13 04:01:51 xencave kernel: [223872.759121] Modules linked in: dm_snapshot nfs lockd fscache nfs_acl auth_rpcgss sunrpc nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs bridge stp coretemp it87 hwmon_vid fuse loop i915 drm_kms_helper processor acpi_processor drm i2c_algo_bit button video snd_pcsp rng_core evdev i2c_i801 serio_raw snd_pcm snd_timer output i2c_core snd soundcore snd_page_alloc usbhid hid ext3 jbd mbcache dm_mod raid1 md_mod sg sr_mod cdrom sd_mod crc_t10dif ata_generic ata_piix libata scsi_mod ehci_hcd uhci_hcd r8169 mii usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Jan 13 04:01:51 xencave kernel: [223872.759599] Pid: 9244, comm: dmsetup_env Tainted: G D 2.6.32-5-xen-amd64 #1 945GM-S2
Jan 13 04:01:51 xencave kernel: [223872.759636] RIP: e030:[<ffffffff8100c694>] [<ffffffff8100c694>] pin_pagetable_pfn+0x2d/0x36
Jan 13 04:01:51 xencave kernel: [223872.759683] RSP: e02b:ffff88001d695e08 EFLAGS: 00010282
Jan 13 04:01:51 xencave kernel: [223872.759707] RAX: 00000000ffffffea RBX: 000000000001e9df RCX: 0000000000000001
Jan 13 04:01:51 xencave kernel: [223872.759742] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88001d695e08
Jan 13 04:01:51 xencave kernel: [223872.759777] RBP: ffff88001f2b4700 R08: 0000000000000ef8 R09: ffffea00006b28c8
Jan 13 04:01:51 xencave kernel: [223872.759812] R10: 0000000000007ff0 R11: ffffea00005858c8 R12: ffff88001a3ee010
Jan 13 04:01:51 xencave kernel: [223872.759846] R13: ffff8800020a8000 R14: ffff880002089530 R15: ffff88001a3ee010
Jan 13 04:01:51 xencave kernel: [223872.759887] FS: 00007f81d5709700(0000) GS:ffff88000308c000(0000) knlGS:0000000000000000
Jan 13 04:01:51 xencave kernel: [223872.759925] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 13 04:01:51 xencave kernel: [223872.759948] CR2: 000000000043c56a CR3: 000000001a38c000 CR4: 0000000000002660
Jan 13 04:01:51 xencave kernel: [223872.759985] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 13 04:01:51 xencave kernel: [223872.760022] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 13 04:01:51 xencave kernel: [223872.760057] Process dmsetup_env (pid: 9244, threadinfo ffff88001d694000, task ffff880002089530)
Jan 13 04:01:51 xencave kernel: [223872.760094] Stack:
Jan 13 04:01:51 xencave kernel: [223872.760112] 0000000000000000 000000000009e90d ffffea00006b28c8 000000000001e9df
Jan 13 04:01:51 xencave kernel: [223872.760149] <0> ffff88001f2b4700 ffffffff810cd4a1 ffff88001a3dc000 000000000043c56a
Jan 13 04:01:51 xencave kernel: [223872.760203] <0> 000000001a3dc000 ffffffff810cb34c ffff8800020a8000 000000000043c56a
Jan 13 04:01:51 xencave kernel: [223872.760270] Call Trace:
Jan 13 04:01:51 xencave kernel: [223872.760295] [<ffffffff810cd4a1>] ? __pte_alloc+0x6b/0xc6
Jan 13 04:01:51 xencave kernel: [223872.760320] [<ffffffff810cb34c>] ? pmd_alloc+0x28/0x5b
Jan 13 04:01:51 xencave kernel: [223872.760344] [<ffffffff810cd5ca>] ? handle_mm_fault+0xce/0x80f
Jan 13 04:01:51 xencave kernel: [223872.760372] [<ffffffff8130cc35>] ? page_fault+0x25/0x30
Jan 13 04:01:51 xencave kernel: [223872.760397] [<ffffffff8130ed96>] ? do_page_fault+0x2e0/0x2fc
Jan 13 04:01:51 xencave kernel: [223872.760422] [<ffffffff8130cc35>] ? page_fault+0x25/0x30
Jan 13 04:01:51 xencave kernel: [223872.760443] Code: ec 28 89 3c 24 48 89 f7 e8 a2 fd ff ff 48 89 e7 48 89 44 24 08 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 b0 cc ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 55 49 89 ca 48 89 d5 40 88 f1 48 89
Jan 13 04:01:51 xencave kernel: [223872.760761] RIP [<ffffffff8100c694>] pin_pagetable_pfn+0x2d/0x36
Jan 13 04:01:51 xencave kernel: [223872.760790] RSP <ffff88001d695e08>
Jan 13 04:01:51 xencave kernel: [223872.761008] ---[ end trace b143b5bdb343412e ]---

help!

--
View this message in context: http://xen.1045712.n5.nabble.com/kernel-BUG-at-arch-x86-xen-mmu-c-1860-tp3318567p3339844.html
Sent from the Xen - Dev mailing list archive at Nabble.com.
Konrad Rzeszutek Wilk
2011-Jan-14 14:47 UTC
Re: [Xen-devel] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
On Thu, Jan 13, 2011 at 06:28:22AM -0800, tjaouen wrote:
>
> Same issue:
>
> Backup DomU with lvsnapshot ...

Ok. I just got some free time so let me take a look at this.

>
> Linux xencave 2.6.32-5-xen-amd64 #1 SMP Fri Dec 10 17:41:50 UTC 2010 x86_64
> GNU/Linux
>
> Jan 13 04:01:51 xencave kernel: [223872.758977] ------------[ cut here
> ]------------
> Jan 13 04:01:51 xencave kernel: [223872.759006] kernel BUG at
> /build/buildd-linux-2.6_2.6.32-29-amd64-xcs37n/linux-2.6-2.6.32/debian/build/source_amd64_xen/arch/x86/xen/mmu.c:1649!
> Jan 13 04:01:51 xencave kernel: [223872.759047] invalid opcode: 0000 [#2]
> SMP
> Jan 13 04:01:51 xencave kernel: [223872.759076] last sysfs file:
> /sys/devices/virtual/block/dm-30/dm/suspended
> Jan 13 04:01:51 xencave kernel: [223872.759099] CPU 1
> Jan 13 04:01:51 xencave kernel: [223872.759121] Modules linked in:
> dm_snapshot nfs lockd fscache nfs_acl auth_rpcgss sunrpc nf_conntrack_ipv4
> nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables
> x_tables xen_evtchn xenfs bridge stp coretemp it87 hwmon_vid fuse loop i915
> drm_kms_helper processor acpi_processor drm i2c_algo_bit button video
> snd_pcsp rng_core evdev i2c_i801 serio_raw snd_pcm snd_timer output i2c_core
> snd soundcore snd_page_alloc usbhid hid ext3 jbd mbcache dm_mod raid1 md_mod
> sg sr_mod cdrom sd_mod crc_t10dif ata_generic ata_piix libata scsi_mod
> ehci_hcd uhci_hcd r8169 mii usbcore nls_base thermal fan thermal_sys [last
> unloaded: scsi_wait_scan]
> Jan 13 04:01:51 xencave kernel: [223872.759599] Pid: 9244, comm: dmsetup_env
> Tainted: G D 2.6.32-5-xen-amd64 #1 945GM-S2
> Jan 13 04:01:51 xencave kernel: [223872.759636] RIP:
> e030:[<ffffffff8100c694>] [<ffffffff8100c694>] pin_pagetable_pfn+0x2d/0x36
> Jan 13 04:01:51 xencave kernel: [223872.759683] RSP: e02b:ffff88001d695e08
> EFLAGS: 00010282
> Jan 13 04:01:51 xencave kernel: [223872.759707] RAX: 00000000ffffffea RBX:
> 000000000001e9df RCX: 0000000000000001
> Jan 13 04:01:51 xencave kernel: [223872.759742] RDX: 0000000000000000 RSI:
> 0000000000000001 RDI: ffff88001d695e08
> Jan 13 04:01:51 xencave kernel: [223872.759777] RBP: ffff88001f2b4700 R08:
> 0000000000000ef8 R09: ffffea00006b28c8
> Jan 13 04:01:51 xencave kernel: [223872.759812] R10: 0000000000007ff0 R11:
> ffffea00005858c8 R12: ffff88001a3ee010
> Jan 13 04:01:51 xencave kernel: [223872.759846] R13: ffff8800020a8000 R14:
> ffff880002089530 R15: ffff88001a3ee010
> Jan 13 04:01:51 xencave kernel: [223872.759887] FS: 00007f81d5709700(0000)
> GS:ffff88000308c000(0000) knlGS:0000000000000000
> Jan 13 04:01:51 xencave kernel: [223872.759925] CS: e033 DS: 0000 ES: 0000
> CR0: 000000008005003b
> Jan 13 04:01:51 xencave kernel: [223872.759948] CR2: 000000000043c56a CR3:
> 000000001a38c000 CR4: 0000000000002660
> Jan 13 04:01:51 xencave kernel: [223872.759985] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Jan 13 04:01:51 xencave kernel: [223872.760022] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Jan 13 04:01:51 xencave kernel: [223872.760057] Process dmsetup_env (pid:
> 9244, threadinfo ffff88001d694000, task ffff880002089530)
> Jan 13 04:01:51 xencave kernel: [223872.760094] Stack:
> Jan 13 04:01:51 xencave kernel: [223872.760112] 0000000000000000
> 000000000009e90d ffffea00006b28c8 000000000001e9df
> Jan 13 04:01:51 xencave kernel: [223872.760149] <0> ffff88001f2b4700
> ffffffff810cd4a1 ffff88001a3dc000 000000000043c56a
> Jan 13 04:01:51 xencave kernel: [223872.760203] <0> 000000001a3dc000
> ffffffff810cb34c ffff8800020a8000 000000000043c56a
> Jan 13 04:01:51 xencave kernel: [223872.760270] Call Trace:
> Jan 13 04:01:51 xencave kernel: [223872.760295] [<ffffffff810cd4a1>] ?
> __pte_alloc+0x6b/0xc6
> Jan 13 04:01:51 xencave kernel: [223872.760320] [<ffffffff810cb34c>] ?
> pmd_alloc+0x28/0x5b
> Jan 13 04:01:51 xencave kernel: [223872.760344] [<ffffffff810cd5ca>] ?
> handle_mm_fault+0xce/0x80f
> Jan 13 04:01:51 xencave kernel: [223872.760372] [<ffffffff8130cc35>] ?
> page_fault+0x25/0x30
> Jan 13 04:01:51 xencave kernel: [223872.760397] [<ffffffff8130ed96>] ?
> do_page_fault+0x2e0/0x2fc
> Jan 13 04:01:51 xencave kernel: [223872.760422] [<ffffffff8130cc35>] ?
> page_fault+0x25/0x30
> Jan 13 04:01:51 xencave kernel: [223872.760443] Code: ec 28 89 3c 24 48 89
> f7 e8 a2 fd ff ff 48 89 e7 48 89 44 24 08 be 01 00 00 00 31 d2 41 ba f0 7f
> 00 00 e8 b0 cc ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 55 49 89 ca 48
> 89 d5 40 88 f1 48 89
> Jan 13 04:01:51 xencave kernel: [223872.760761] RIP [<ffffffff8100c694>]
> pin_pagetable_pfn+0x2d/0x36
> Jan 13 04:01:51 xencave kernel: [223872.760790] RSP <ffff88001d695e08>
> Jan 13 04:01:51 xencave kernel: [223872.761008] ---[ end trace
> b143b5bdb343412e ]---
>
>
> help!
Konrad Rzeszutek Wilk
2011-Jan-14 15:20 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Dec 29, 2010 at 12:58:15PM +0800, Teck Choon Giam wrote:
> Below is my latest test crash script:

This test script is still valid? Or do you have a more updated one?

>
> ----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------
> #!/bin/sh
> #
> # This script is to create lvm snapshot, mount it, umount it and remove in a
> # specified number of loops to test whether it will crash the host server.
> # All LVM snapshots assumed can be mounted like if you are running a PV domU.
> #
> # Created by Giam Teck Choon
> #
>
> # The LV name and for this case we are using XenGroup
> LVGroupName=XenGroup
>
> # return 1 if is mounted otherwise return 0
> check_mount() {
>     local checkdir=${1}
>     if [ -n "$checkdir" ] ; then
>         local check=`grep "$checkdir" /proc/mounts`
>         if [ -n "$check" ] ; then
>             return 1
>         fi
>     fi
>     return 0
> }
>
> do_lvm_create_remove() {
>     # number of loops default is 1
>     local loopcountlimit=${1:-1}
>     # snapshot size default is 1G
>     local snapshotsize=${2:-1G}
>     # implement a sleep between create, mount, umount and remove (default is 0 which is no pause)
>     local pauseinterval=${3:-0}
>     # execute commands after each pause/sleep such as sync or anything that you want to test
>     local commands=${4}
>     # We filter out snapshot and swap
>     local count=0
>     if [ -d "/dev/${LVGroupName}" ] ; then
>         while [ "$count" -lt "$loopcountlimit" ]
>         do
>             count=`expr $count + 1`
>             echo "${count} ... ... "
>             for i in `ls /dev/${LVGroupName} | grep -Ev 'snapshot$' | grep -Ev 'swap$'`; do
>                 if [ -h "/dev/${LVGroupName}/${i}" ] ; then
>                     echo -n "lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i} ... ... "
>                     lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i}
>                     echo "done."
>                     sleep ${pauseinterval}
>                     if [ -n "$commands" ] ; then
>                         echo -n "${commands} ... ... "
>                         $commands
>                         echo "done."
>                     fi
>                     mkdir -p /mnt/testlvm/${i}
>                     if [ -h "/dev/${LVGroupName}/${i}-snapshot" ] ; then
>                         check_mount /mnt/testlvm/${i}
>                         local ismount=$?
>                         if [ "$ismount" -eq 0 ] ; then
>                             echo -n "mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i} ... ... "
>                             mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i}
>                             echo "done."
>                             sleep ${pauseinterval}
>                             if [ -n "$commands" ] ; then
>                                 echo -n "${commands} ... ... "
>                                 $commands
>                                 echo "done."
>                             fi
>                         fi
>                         check_mount /mnt/testlvm/${i}
>                         local ismount2=$?
>                         if [ "$ismount2" -eq 1 ] ; then
>                             echo -n "umount /mnt/testlvm/${i} ... ... "
>                             umount /mnt/testlvm/${i}
>                             echo "done."
>                             sleep ${pauseinterval}
>                             if [ -n "$commands" ] ; then
>                                 echo -n "${commands} ... ... "
>                                 $commands
>                                 echo "done."
>                             fi
>                         fi
>                     fi
>                     rm -rf /mnt/testlvm/${i}
>                     echo -n "lvremove -f /dev/${LVGroupName}/${i}-snapshot ... ... "
>                     lvremove -f /dev/${LVGroupName}/${i}-snapshot
>                     echo "done."
>                     sleep ${pauseinterval}
>                     if [ -n "$commands" ] ; then
>                         echo -n "${commands} ... ... "
>                         $commands
>                         echo "done."
>                     fi
>                 fi
>             done
>             rm -fr /mnt/testlvm
>         done
>     else
>         echo "/dev/${LVGroupName} directory not found!"
>         exit 1
>     fi
> }
>
> case $1 in
>     loop) shift
>         do_lvm_create_remove "$@"
>         ;;
>     *) cat <<HELP
> Usage: $0 loop loopcountlimit snapshotsize pauseinterval commands
> Where:
> loopcountlimit is default to 1
> snapshotsize is default to 1G
> pauseinterval is default to 0
> commands is default to none
>
> Example to run with 100 loops without pause/sleep:
> $0 loop 100
>
> Example to run with 100 loops with pause/sleep of 5 seconds:
> $0 loop 100 1G 5
>
> Example to run with 100 loops with snapshot size of 2G instead of 1G:
> $0 loop 100 2G
>
> Example to run with 50 loops, 1G snapshot size, 5 seconds pause and with sync:
> command with each pause/sleep
> $0 loop 50 1G 5 sync
>
> Example to run with 50 loops, 1G snapshot size, no pause and with sync:
> command with each pause/sleep
> $0 loop 50 1G 0 sync
>
> Example to run your own commands:
> $0 loop 100 1G 5 "echo hi && sync"
>
> HELP
>         ;;
> esac
> ----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------
>
> Thanks.
>
> Kindest regards,
> Giam Teck Choon
Konrad Rzeszutek Wilk
2011-Jan-14 15:22 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
> (without strace, the bug is very common, about every third "lvcreate"
> command. Every lvcreate command triggers about 20 multipath

Something must be busted with your udev rules for this to go in effect.
You should have multipathd process those and not run multipath for
every device mapper call.
Konrad Rzeszutek Wilk
2011-Jan-14 15:24 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
> The idea is to create snapshot, mount it, umount it, remove snapshot and
> repeat this cycle in loop will catch this BUG!!!

What is the hypervisor version? Is it xen-unstable.hg?
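The cycle described in the quoted text (create a snapshot, mount it, umount it, remove it, repeat) can be sketched roughly as below. This is only a dry-run illustration, not the reporter's script: the names snapshot_cycle, run and DRY_RUN are made up here, while the lvcreate/lvremove invocations, the /mnt/testlvm mount point and the XenGroup/testcrash1 names follow the test script posted in this thread. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch of the snapshot cycle. With DRY_RUN=1 (the default here)
# the commands are only printed, so nothing touches LVM.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_cycle VG LV [LOOPS] [SIZE]
# Hypothetical helper: one create/mount/umount/remove round per loop.
snapshot_cycle() {
    vg=$1
    lv=$2
    loops=${3:-1}
    size=${4:-1G}
    mnt=/mnt/testlvm/$lv
    n=0
    while [ "$n" -lt "$loops" ]; do
        n=$((n + 1))
        run lvcreate -s -n "${lv}-snapshot" -L "$size" "/dev/$vg/$lv" || return 1
        run mkdir -p "$mnt"
        run mount "/dev/$vg/${lv}-snapshot" "$mnt"
        run umount "$mnt"
        run lvremove -f "/dev/$vg/${lv}-snapshot" || return 1
    done
}

# Two rounds against an example VG/LV; prints the commands only.
snapshot_cycle XenGroup testcrash1 2
```

Setting DRY_RUN=0 would execute the commands for real, which on an affected dom0 may crash the host, so that should only be done on a test machine.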
Christophe Saout
2011-Jan-14 15:33 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Hi Konrad,

> > (without strace, the bug is very common, about every third "lvcreate"
> > command. Every lvcreate command triggers about 20 multipath
>
> Something must be busted with your udev rules for this to go in effect.
> You should have multipathd process those and not run multipath for
> every device mapper call.

This is a standard Debian squeeze. I think udev is just calling those to figure out if a device should be handled by multipath or not (part of the device identification). I'd rather not fiddle with it, as it "works" even if Debian is shipping suboptimal rules.

Anyhow, changing this would simply reduce the risk of running into the kernel issue, not remove it entirely, so I prefer the actual fix. :)

Christophe
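For anyone wanting to check which udev rules mention multipath on their own system, a small sketch follows. The two directories are assumptions about a typical layout and may differ between distributions, and find_multipath_rules is a made-up helper name; it only greps rule files and changes nothing.

```shell
#!/bin/sh
# find_multipath_rules is a hypothetical helper for illustration.
# It lists *.rules files under the given directories that mention "multipath".
find_multipath_rules() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        grep -l multipath "$dir"/*.rules 2>/dev/null
    done
    return 0
}

# Assumed locations; adjust for your distribution.
find_multipath_rules /etc/udev/rules.d /lib/udev/rules.d
```

The output is just a list of file names; whether those rules actually run multipath for every device-mapper event still has to be read from the rules themselves.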
Teck Choon Giam
2011-Jan-14 19:25 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, Jan 14, 2011 at 11:20 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Wed, Dec 29, 2010 at 12:58:15PM +0800, Teck Choon Giam wrote:
> > Below is my latest test crash script:
>
> This test script is still valid? Or do you have a more updated one?
>

Below is the more updated one. Please note that this will mostly not support VG in CLVM since clustered LVM doesn't support snapshot in my CentOS 5 testing... ...

----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------
#!/bin/sh
#
# This script is to create lvm snapshot, mount it, umount it and remove in a
# specified number of loops to test whether it will crash the host server.
# All LVM snapshots assumed can be mounted like if you are running a PV domU.
#
# Created by Giam Teck Choon
#

# The LV name and for this case we are using the first in vgdisplay output.
# Change the variable if you want other VG Name or change head -n 1 to
# tail -n 1 if you prefer to use last VG instead of first if you happen to have
# more than one VG
LVGroupName=`vgdisplay | grep 'VG Name' | awk '{print $3}' | head -n 1`

# Bail out if no VG name was detected or its /dev directory is missing
if [ ! -n "$LVGroupName" ] || [ ! -d "/dev/${LVGroupName}" ] ; then
    echo "Unable to detect VG Name!"
    exit 1
fi

# return 1 if is mounted otherwise return 0
check_mount() {
    local checkdir=${1}
    if [ -n "$checkdir" ] ; then
        local check=`grep "$checkdir" /proc/mounts`
        if [ -n "$check" ] ; then
            return 1
        fi
    fi
    return 0
}

# We will create 5 testcrash LV in $LVGroupName each with 5GB size
# and format it as ext3
do_lvm_create_testcrash() {
    local lvname=${1:-testcrash}
    local lvsize=${2:-5G}
    local count=1
    local limit=5
    while [ "$count" -le "$limit" ]
    do
        if [ ! -h "/dev/${LVGroupName}/${lvname}${count}" ] ; then
            echo "lvcreate -v -n ${lvname}${count} -L ${lvsize} ${LVGroupName} ... ... "
            lvcreate -v -n ${lvname}${count} -L ${lvsize} ${LVGroupName}
            echo "lvcreate -v -n ${lvname}${count} -L ${lvsize} ${LVGroupName} completed!"
            if [ -h "/dev/${LVGroupName}/${lvname}${count}" ] ; then
                echo "mke2fs -F -j /dev/${LVGroupName}/${lvname}${count} ... ... "
                mke2fs -F -j /dev/${LVGroupName}/${lvname}${count}
                echo "mke2fs -F -j /dev/${LVGroupName}/${lvname}${count} completed!"
            else
                echo "/dev/${LVGroupName}/${lvname}${count} not found!"
            fi
        fi
        count=`expr $count + 1`
    done
}

do_lvm_create_remove() {
    # number of loops default is 1
    local loopcountlimit=${1:-1}
    # snapshot size default is 1G
    local snapshotsize=${2:-1G}
    # implement a sleep between create, mount, umount and remove (default is 0 which is no pause)
    local pauseinterval=${3:-0}
    # execute commands after each pause/sleep such as sync or anything that you want to test
    local commands=${4}
    # We filter out snapshot and swap
    local count=0
    if [ -d "/dev/${LVGroupName}" ] ; then
        while [ "$count" -lt "$loopcountlimit" ]
        do
            count=`expr $count + 1`
            echo "${count} ... ... "
            for i in `ls /dev/${LVGroupName} | grep -Ev 'snapshot$' | grep -Ev 'swap$'`; do
                if [ -h "/dev/${LVGroupName}/${i}" ] ; then
                    echo -n "lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i} ... ... "
                    lvcreate -s -v -n ${i}-snapshot -L ${snapshotsize} /dev/${LVGroupName}/${i}
                    echo "done."
                    sleep ${pauseinterval}
                    if [ -n "$commands" ] ; then
                        echo -n "${commands} ... ... "
                        $commands
                        echo "done."
                    fi
                    mkdir -p /mnt/testlvm/${i}
                    if [ -h "/dev/${LVGroupName}/${i}-snapshot" ] ; then
                        check_mount /mnt/testlvm/${i}
                        local ismount=$?
                        if [ "$ismount" -eq 0 ] ; then
                            echo -n "mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i} ... ... "
                            mount /dev/${LVGroupName}/${i}-snapshot /mnt/testlvm/${i}
                            echo "done."
                            sleep ${pauseinterval}
                            if [ -n "$commands" ] ; then
                                echo -n "${commands} ... ... "
                                $commands
                                echo "done."
                            fi
                        fi
                        check_mount /mnt/testlvm/${i}
                        local ismount2=$?
                        if [ "$ismount2" -eq 1 ] ; then
                            echo -n "umount /mnt/testlvm/${i} ... ... "
                            umount /mnt/testlvm/${i}
                            echo "done."
                            sleep ${pauseinterval}
                            if [ -n "$commands" ] ; then
                                echo -n "${commands} ... ... "
                                $commands
                                echo "done."
                            fi
                        fi
                    fi
                    rm -rf /mnt/testlvm/${i}
                    echo -n "lvremove -f /dev/${LVGroupName}/${i}-snapshot ... ... "
                    lvremove -f /dev/${LVGroupName}/${i}-snapshot
                    echo "done."
                    sleep ${pauseinterval}
                    if [ -n "$commands" ] ; then
                        echo -n "${commands} ... ... "
                        $commands
                        echo "done."
                    fi
                fi
            done
            rm -fr /mnt/testlvm
        done
    else
        echo "/dev/${LVGroupName} directory not found!"
        exit 1
    fi
}

case $1 in
    setup) shift
        do_lvm_create_testcrash "$@"
        ;;
    loop) shift
        do_lvm_create_remove "$@"
        ;;
    *) cat <<HELP
Usage: $0 loop loopcountlimit snapshotsize pauseinterval commands
Where:
loopcountlimit is default to 1
snapshotsize is default to 1G
pauseinterval is default to 0
commands is default to none

Example to run with 100 loops without pause/sleep:
$0 loop 100

Example to run with 100 loops with pause/sleep of 5 seconds:
$0 loop 100 1G 5

Example to run with 100 loops with snapshot size of 2G instead of 1G:
$0 loop 100 2G

Example to run with 50 loops, 1G snapshot size, 5 seconds pause and with sync:
command with each pause/sleep
$0 loop 50 1G 5 sync

Example to run your own commands:
$0 loop 100 1G 5 "echo hi && sync"

If this is the first time you are running and do not have any LV in your VG, run:
$0 setup

This will create 5 testcrash LV in your VG with 5GB size each (default) and
format to ext3.

HELP
        ;;
esac
----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<----------

Thanks.

Kindest regards,
Giam Teck Choon
Teck Choon Giam
2011-Jan-14 19:31 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, Jan 14, 2011 at 11:24 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > The idea is to create snapshot, mount it, umount it, remove snapshot and
> > repeat this cycle in loop will catch this BUG!!!
>
> What is the hypervisor version? Is it xen-unstable.hg?
>

xen-4.0-testing changeset 214xx even with 21436
PVOPS latest stable/2.6.32.x

Thanks.

Kindest regards,
Giam Teck Choon
Konrad Rzeszutek Wilk
2011-Jan-14 19:44 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Sat, Jan 15, 2011 at 03:25:55AM +0800, Teck Choon Giam wrote:
> On Fri, Jan 14, 2011 at 11:20 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>
> > On Wed, Dec 29, 2010 at 12:58:15PM +0800, Teck Choon Giam wrote:
> > > Below is my latest test crash script:
> >
> > This test script is still valid? Or do you have a more updated one?
>
> Below is the more updated one. Please note that this will mostly not

Can you send it as attachment? The copy-n-paste broke it.
Teck Choon Giam
2011-Jan-14 20:09 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Sat, Jan 15, 2011 at 3:44 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Sat, Jan 15, 2011 at 03:25:55AM +0800, Teck Choon Giam wrote:
> > On Fri, Jan 14, 2011 at 11:20 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >
> > > On Wed, Dec 29, 2010 at 12:58:15PM +0800, Teck Choon Giam wrote:
> > > > Below is my latest test crash script:
> > >
> > > This test script is still valid? Or do you have a more updated one?
> >
> > Below is the more updated one. Please note that this will mostly not
>
> Can you send it as attachment? The copy-n-paste broke it.

Sure ;)

Thanks.

Kindest regards,
Giam Teck Choon
Teck Choon Giam
2011-Jan-14 20:32 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
My latest test crash gave a slightly different result:

Jan 15 04:27:16 xen06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map

Normally the last sysfs file is related to LVM/DM, but now it is related to the shared_cpu_map cache?

Jan 15 04:27:16 xen06 kernel: ------------[ cut here ]------------
Jan 15 04:27:16 xen06 kernel: kernel BUG at arch/x86/xen/mmu.c:1860!
Jan 15 04:27:16 xen06 kernel: invalid opcode: 0000 [#1] SMP
Jan 15 04:27:16 xen06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map << SEE THIS?
Jan 15 04:27:16 xen06 kernel: CPU 1
Jan 15 04:27:16 xen06 kernel: Modules linked in: dm_snapshot ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg tg3 libphy ide_cd_mod cdrom serio_raw button tpm_tis tpm tpm_bios i2c_i801 pcspkr i2c_core shpchp iTCO_wdt dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Jan 15 04:27:16 xen06 kernel: Pid: 26392, comm: mpath_wait Not tainted 2.6.32.27-0.xen.pvops.choon.centos5 #1 PowerEdge 860
Jan 15 04:27:16 xen06 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Jan 15 04:27:16 xen06 kernel: RSP: e02b:ffff88001aae5dc8 EFLAGS: 00010282
Jan 15 04:27:16 xen06 kernel: RAX: 00000000ffffffea RBX: 000000000003cf04 RCX: 00000000000001e7
Jan 15 04:27:16 xen06 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
Jan 15 04:27:16 xen06 kernel: RBP: ffff88001aae5de8 R08: 0000000000000820 R09: ffff880000000000
Jan 15 04:27:16 xen06 kernel: R10: 00000000deadbeef R11: 0000000000000246 R12: 0000000000000003
Jan 15 04:27:16 xen06 kernel: R13: 000000000003cf04 R14: ffff88001aa6a7c0 R15: 00000033e1a9a4d5
Jan 15 04:27:16 xen06 kernel: FS: 00007f9b8dbdf6e0(0000) GS:ffff88002806c000(0000) knlGS:0000000000000000
Jan 15 04:27:16 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 15 04:27:16 xen06 kernel: CR2: 00000033e1a9a4d5 CR3: 000000001ab61000 CR4: 0000000000002660
Jan 15 04:27:16 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 15 04:27:16 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 15 04:27:16 xen06 kernel: Process mpath_wait (pid: 26392, threadinfo ffff88001aae4000, task ffff88001aa6a7c0)
Jan 15 04:27:16 xen06 kernel: Stack:
Jan 15 04:27:16 xen06 kernel: 0000000000000000 000000000020f3d2 000000013e64d518 000000000003cf04
Jan 15 04:27:16 xen06 kernel: <0> ffff88001aae5e08 ffffffff8100e07c ffff880025db0040 ffff880025fda868
Jan 15 04:27:16 xen06 kernel: <0> ffff88001aae5e18 ffffffff8100e0af ffff88001aae5e58 ffffffff810a402f
Jan 15 04:27:16 xen06 kernel: Call Trace:
Jan 15 04:27:16 xen06 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
Jan 15 04:27:16 xen06 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
Jan 15 04:27:16 xen06 kernel: [<ffffffff810a402f>] __pte_alloc+0x70/0xce
Jan 15 04:27:16 xen06 kernel: [<ffffffff810a41cd>] handle_mm_fault+0x140/0x8b9
Jan 15 04:27:16 xen06 kernel: [<ffffffff81319dd5>] ? page_fault+0x25/0x30
Jan 15 04:27:16 xen06 kernel: [<ffffffff8131be4d>] do_page_fault+0x252/0x2e2
Jan 15 04:27:16 xen06 kernel: [<ffffffff8116dd7d>] ? __put_user_4+0x1d/0x30
Jan 15 04:27:16 xen06 kernel: [<ffffffff81319dd5>] page_fault+0x25/0x30
Jan 15 04:27:16 xen06 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
Jan 15 04:27:16 xen06 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Jan 15 04:27:16 xen06 kernel: RSP <ffff88001aae5dc8>
Jan 15 04:27:16 xen06 kernel: ---[ end trace 865f1d440d090f4f ]---

Sorry, this server doesn't have a serial console though :(

Thanks.

Kindest regards,
Giam Teck Choon
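One detail in the dump above can be read without a serial console. The Code: bytes just before the trapping instruction are `85 c0 74 04 <0f> 0b`, which decode to test %eax,%eax / je / ud2, so the BUG appears to fire when the preceding call returns non-zero, and RAX would then hold that return value. A minimal sketch for reading it as a signed 32-bit number follows; `hex32_to_signed` is a made-up helper name, and treating RAX as an error code is an interpretation of the dump, not something stated in the thread.

```shell
#!/bin/sh
# hex32_to_signed: hypothetical helper, not part of any existing tool.
# Takes a hex register value from an oops dump (without the 0x prefix)
# and prints its low 32 bits as a signed integer.
hex32_to_signed() {
    value=$(( 0x$1 & 0xffffffff ))                 # keep only the low 32 bits
    echo $(( (value ^ 0x80000000) - 0x80000000 ))  # sign-extend from bit 31
}

# RAX from the oops above
hex32_to_signed 00000000ffffffea   # prints -22
```

On Linux, errno 22 is EINVAL, so under this reading the call made from pin_pagetable_pfn returned -EINVAL. RAX has the same value in every dump posted in this thread.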
Teck Choon Giam
2011-Jan-24 01:42 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Sorry, is anyone able to solve this bug? My workaround of doing sleep 5 and sync only lasts about 20 days of uptime at most before the same bug appears again :(

Thanks.

Kindest regards,
Giam Teck Choon
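For readers who want to see what the sleep/sync workaround looks like in isolation, here is a minimal sketch of one snapshot cycle with a sync and a pause after every step. It is not the test script from this thread: the `run`, `settle` and `snapshot_cycle` names, the `DRY_RUN` and `PAUSE` variables, and the VG/LV names in the usage note are all assumptions made for illustration. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```shell
#!/bin/sh
# Illustrative sketch only; all names below are hypothetical.
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands instead of executing them
PAUSE=${PAUSE:-5}       # seconds to sleep after each step

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The workaround described above: flush dirty data, then wait a little.
settle() {
    run sync
    run sleep "$PAUSE"
}

# One create/mount/umount/remove cycle for a single LV.
snapshot_cycle() {
    vg=$1
    lv=$2
    size=${3:-1G}
    mnt=/mnt/testlvm/$lv

    run lvcreate -s -L "$size" -n "${lv}-snapshot" "/dev/$vg/$lv"
    settle
    run mkdir -p "$mnt"
    run mount "/dev/$vg/${lv}-snapshot" "$mnt"
    settle
    run umount "$mnt"
    settle
    run lvremove -f "/dev/$vg/${lv}-snapshot"
    settle
}
```

Calling `snapshot_cycle XenGroup testcrash1` in dry-run mode lists the four LVM/mount steps, each followed by `sync` and `sleep 5`. As the message above says, this only delays the crash; it does not fix it.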
Konrad Rzeszutek Wilk
2011-Jan-24 14:36 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Mon, Jan 24, 2011 at 09:42:02AM +0800, Teck Choon Giam wrote:
> Sorry, does anyone able to solve this bug? My prevention doing sleep 5 and
> sync method only last me longest 20+- days of uptime then the same bug will
> appear :(

You have to give more details. Which kernel and what does the back-trace look like?

> Thanks.
>
> Kindest regards,
> Giam Teck Choon
Teck Choon Giam
2011-Jan-24 15:56 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Mon, Jan 24, 2011 at 10:36 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Jan 24, 2011 at 09:42:02AM +0800, Teck Choon Giam wrote:
> > Sorry, does anyone able to solve this bug? My prevention doing sleep 5 and
> > sync method only last me longest 20+- days of uptime then the same bug will
> > appear :(
>
> You have to give more details. Which kernel and what does the back-trace
> look like?

Thanks for your prompt reply.

Kernel is from http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=summary latest stable/2.6.32.x

Xen version is from http://xenbits.xensource.com/staging/xen-4.0-testing.hg latest changeset 21439, in fact any changeset 214xx I believe.

I have posted console output for one of my xen servers, though maybe my serial console configuration is not right... ...

http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00177.html

How do I back-trace? Do you mean I should execute my test crash script with strace and post the output during the crash?

Thanks.

Kindest regards,
Giam Teck Choon
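The back-trace being asked for here is most likely the "Call Trace:" section the kernel prints together with the BUG message, rather than strace output. A minimal sketch for pulling those blocks out of a syslog file is shown below. It assumes one log entry per line and that each oops is bracketed by the "cut here" and "end trace" markers, as in the dumps posted in this thread; `extract_oops` is a made-up helper name.

```shell
#!/bin/sh
# extract_oops: hypothetical helper. Prints every block of lines that starts
# at a "[ cut here ]" marker and ends at the next "end trace" marker.
extract_oops() {
    awk '/\[ cut here \]/ { printing = 1 }   # oops begins
         printing         { print }          # emit lines while inside an oops
         /end trace/      { printing = 0 }   # oops ends (marker line included)
        ' "$1"
}
```

On a CentOS 5 dom0 the kernel messages normally end up in /var/log/messages (an assumption about the syslog configuration), so `extract_oops /var/log/messages` would print just the crash reports.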
Konrad Rzeszutek Wilk
2011-Jan-25 14:48 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Mon, Jan 24, 2011 at 11:56:21PM +0800, Teck Choon Giam wrote:
> On Mon, Jan 24, 2011 at 10:36 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>
> > On Mon, Jan 24, 2011 at 09:42:02AM +0800, Teck Choon Giam wrote:
> > > Sorry, does anyone able to solve this bug? My prevention doing sleep 5 and
> > > sync method only last me longest 20+- days of uptime then the same bug will
> > > appear :(
> >
> > You have to give more details. Which kernel and what does the back-trace
> > look like?
>
> Thanks for your prompt reply.
>
> Kernel is from
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=summary latest
> stable/2.6.32.x
>
> Xen version is from
> http://xenbits.xensource.com/staging/xen-4.0-testing.hg latest
> changeset 21439 in fact any changeset 214xx I believe.
>
> I have posted console output for one of my xen servers and maybe my serial
> console configuration not right... ...
>
> http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00177.html
>
> How do I back-trace? You mean I should execute my test crash script with
> strace and post the output during crash?

No that is OK. I now remember this one - I am poking at the code to get an idea of what might be happening. No data yet.
Konrad Rzeszutek Wilk
2011-Jan-26 14:31 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Jan 25, 2011 at 09:48:09AM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 24, 2011 at 11:56:21PM +0800, Teck Choon Giam wrote:
> > On Mon, Jan 24, 2011 at 10:36 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >
> > > On Mon, Jan 24, 2011 at 09:42:02AM +0800, Teck Choon Giam wrote:
> > > > Sorry, does anyone able to solve this bug? My prevention doing sleep 5 and
> > > > sync method only last me longest 20+- days of uptime then the same bug will
> > > > appear :(
> > >
> > > You have to give more details. Which kernel and what does the back-trace
> > > look like?
> >
> > Thanks for your prompt reply.
> >
> > Kernel is from
> > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=summary latest
> > stable/2.6.32.x

What does git log show? What is the latest commit you have there?

> > Xen version is from
> > http://xenbits.xensource.com/staging/xen-4.0-testing.hg latest
> > changeset 21439 in fact any changeset 214xx I believe.
> >
> > I have posted console output for one of my xen servers and maybe my serial
> > console configuration not right... ...
> >
> > http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00177.html
> >
> > How do I back-trace? You mean I should execute my test crash script with
> > strace and post the output during crash?
>
> No that is OK. I now remember this one - I am poking at the code to get an idea
> of what might be happening. No data yet.
Teck Choon Giam
2011-Jan-27 17:17 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Jan 26, 2011 at 10:31 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Jan 25, 2011 at 09:48:09AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 24, 2011 at 11:56:21PM +0800, Teck Choon Giam wrote:
> > > On Mon, Jan 24, 2011 at 10:36 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > >
> > > > On Mon, Jan 24, 2011 at 09:42:02AM +0800, Teck Choon Giam wrote:
> > > > > Sorry, does anyone able to solve this bug? My prevention doing sleep 5 and
> > > > > sync method only last me longest 20+- days of uptime then the same bug will
> > > > > appear :(
> > > >
> > > > You have to give more details. Which kernel and what does the back-trace
> > > > look like?
> > >
> > > Thanks for your prompt reply.
> > >
> > > Kernel is from
> > > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=summary latest
> > > stable/2.6.32.x
>
> What does git log show? What is the latest commit you have there?

commit 75cc13f5aa29b4f3227d269ca165dfa8937c94fe
Merge: 2607c07 a386bf7
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Thu Dec 9 17:16:16 2010 -0800

    Merge commit 'v2.6.32.27' into xen/next-2.6.32

    * commit 'v2.6.32.27': (128 commits)
      Linux 2.6.32.27
      x86: uv: xpc NULL deref when mesq becomes empty
      X86: uv: xpc_make_first_contact hang due to not accepting ACTIVE state
      x86: uv: XPC receive message reuse triggers invalid BUG_ON()
      UV - XPC: pass nasid instead of nid to gru_create_message_queue
      net sched: fix some kernel memory leaks
      act_nat: use stack variable
      nmi: fix clock comparator revalidation
      net: Limit socket I/O iovec total length to INT_MAX.
      net: Truncate recvfrom and sendto length to INT_MAX.
      rds: Integer overflow in RDS cmsg handling
      econet: fix CVE-2010-3850
      econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
      x86-32: Fix dummy trampoline-related inline stubs
      x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
      x86-32: Separate 1:1 pagetables from swapper_pg_dir
      crypto: padlock - Fix AES-CBC handling on odd-block-sized input
      x25: Prevent crashing when parsing bad X.25 facilities
      V4L/DVB: ivtvfb: prevent reading uninitialized stack memory
      can-bcm: fix minor heap overflow
      ...

    Conflicts:
        drivers/xen/events.c

Anything that I can test or you need me to test... feel free to ask so that I can speed up in tracking/hunting this bug down :)

Thanks.

Kindest regards,
Giam Teck Choon
Konrad Rzeszutek Wilk
2011-Jan-27 20:32 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
> > What does git log show? What is the latest commit you have there?
>
> commit 75cc13f5aa29b4f3227d269ca165dfa8937c94fe
> Merge: 2607c07 a386bf7
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date: Thu Dec 9 17:16:16 2010 -0800
>
> Merge commit 'v2.6.32.27' into xen/next-2.6.32
>
> * commit 'v2.6.32.27': (128 commits)
> Linux 2.6.32.27

.. snip..

> Anything that I can test or you need me to test... feel free to ask so that
> I can speed up in tracking/hunting this bug down :)

You got the latest one so you should have the fixes. But something is not working.. just to make sure I am not missing anything can you send me your .config file?
Teck Choon Giam
2011-Jan-27 22:20 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, Jan 28, 2011 at 4:32 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > What does git log show? What is the latest commit you have there?
> >
> > commit 75cc13f5aa29b4f3227d269ca165dfa8937c94fe
> > Merge: 2607c07 a386bf7
> > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> > Date: Thu Dec 9 17:16:16 2010 -0800
> >
> > Merge commit 'v2.6.32.27' into xen/next-2.6.32
> >
> > * commit 'v2.6.32.27': (128 commits)
> > Linux 2.6.32.27
>
> .. snip..
>
> > Anything that I can test or you need me to test... feel free to ask so that
> > I can speed up in tracking/hunting this bug down :)
>
> You got the latest one so you should have the fixes. But something is not
> working.. just to make sure I am not missing anything can you send me your
> .config file?

Ok... please see attached.

Thanks.

Kindest regards,
Giam Teck Choon
Teck Choon Giam
2011-Feb-26 12:03 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, Jan 28, 2011 at 6:20 AM, Teck Choon Giam <giamteckchoon@gmail.com> wrote:
> On Fri, Jan 28, 2011 at 4:32 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>
>> > > What does git log show? What is the latest commit you have there?
>> >
>> > commit 75cc13f5aa29b4f3227d269ca165dfa8937c94fe
>> > Merge: 2607c07 a386bf7
>> > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> > Date: Thu Dec 9 17:16:16 2010 -0800
>> >
>> > Merge commit 'v2.6.32.27' into xen/next-2.6.32
>> >
>> > * commit 'v2.6.32.27': (128 commits)
>> > Linux 2.6.32.27
>>
>> .. snip..
>>
>> > Anything that I can test or you need me to test... feel free to ask so that
>> > I can speed up in tracking/hunting this bug down :)
>>
>> You got the latest one so you should have the fixes. But something is not
>> working.. just to make sure I am not missing anything can you send me your
>> .config file?
>
> Ok... please see attached.
>
> Thanks.
>
> Kindest regards,
> Giam Teck Choon

Hi Konrad,

This is just an update from my testing.

I tried with the latest xen/next-2.6.32.x and the same bug is still there:

Feb 23 22:56:56 xen06 kernel: ------------[ cut here ]------------
Feb 23 22:56:56 xen06 kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
Feb 23 22:56:56 xen06 kernel: invalid opcode: 0000 [#1] SMP
Feb 23 22:56:56 xen06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Feb 23 22:56:56 xen06 kernel: CPU 0
Feb 23 22:56:56 xen06 kernel: Modules linked in: dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom serio_raw tg3 libphy button tpm_tis tpm tpm_bios iTCO_wdt i2c_i801 i2c_core pcspkr shpchp dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Feb 23 22:56:56 xen06 kernel: Pid: 15222, comm: mpath_wait Not tainted 2.6.32.28-2.xen.pvops.choon.centos5 #1 PowerEdge 860
Feb 23 22:56:56 xen06 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Feb 23 22:56:56 xen06 kernel: RSP: e02b:ffff8800265b9dc8 EFLAGS: 00010282
Feb 23 22:56:56 xen06 kernel: RAX: 00000000ffffffea RBX: 00000000000306d7 RCX: 0000000000000183
Feb 23 22:56:56 xen06 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
Feb 23 22:56:56 xen06 kernel: RBP: ffff8800265b9de8 R08: 00000000000006b8 R09: ffff880000000000
Feb 23 22:56:56 xen06 kernel: R10: 00000000deadbeef R11: 0000000000000246 R12: 0000000000000003
Feb 23 22:56:56 xen06 kernel: R13: 00000000000306d7 R14: ffff88003b940200 R15: 00007f7655df6258
Feb 23 22:56:56 xen06 kernel: FS: 00007f76560016e0(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
Feb 23 22:56:56 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 23 22:56:56 xen06 kernel: CR2: 00007f7655df6258 CR3: 000000002559b000 CR4: 0000000000002660
Feb 23 22:56:56 xen06 lvm[3984]: Monitoring snapshot XenGroup-testcrash1--snapshot
Feb 23 22:56:56 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 23 22:56:56 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 23 22:56:56 xen06 kernel: Process mpath_wait (pid: 15222, threadinfo ffff8800265b8000, task ffff88003b940200)
Feb 23 22:56:56 xen06 kernel: Stack:
Feb 23 22:56:56 xen06 kernel: 0000000000000000 0000000000202bff 000000013f009e18 00000000000306d7
Feb 23 22:56:56 xen06 kernel: <0> ffff8800265b9e08 ffffffff8100e07c ffff8800267bdac0 ffff88002679f570
Feb 23 22:56:56 xen06 kernel: <0> ffff8800265b9e18 ffffffff8100e0af ffff8800265b9e58 ffffffff810a408b
Feb 23 22:56:56 xen06 kernel: Call Trace:
Feb 23 22:56:56 xen06 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
Feb 23 22:56:56 xen06 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
Feb 23 22:56:56 xen06 kernel: [<ffffffff810a408b>] __pte_alloc+0x70/0xce
Feb 23 22:56:56 xen06 kernel: [<ffffffff810a4229>] handle_mm_fault+0x140/0x8b9
Feb 23 22:56:56 xen06 kernel: [<ffffffff8131be2d>] do_page_fault+0x252/0x2e2
Feb 23 22:56:56 xen06 kernel: [<ffffffff81319db5>] page_fault+0x25/0x30
Feb 23 22:56:56 xen06 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
Feb 23 22:56:56 xen06 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Feb 23 22:56:56 xen06 kernel: RSP <ffff8800265b9dc8>
Feb 23 22:56:56 xen06 kernel: ---[ end trace fd2f141edfc37649 ]---
Feb 23 22:56:56 xen06 kernel: kjournald starting. Commit interval 5 seconds
Feb 23 22:56:56 xen06 kernel: EXT3 FS on dm-11, internal journal
Feb 23 22:56:56 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 23 23:00:01 xen06 syslogd 1.4.1: restart.
Feb 23 23:00:01 xen06 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb 23 23:00:01 xen06 kernel: Linux version 2.6.32.28-2.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Wed Feb 23 22:29:39 SGT 2011
Feb 23 23:00:01 xen06 kernel: Command line: ro root=/dev/md1 panic=5 panic_timeout=5
Feb 23 23:00:01 xen06 kernel: KERNEL supported cpus:
Feb 23 23:00:01 xen06 kernel: Intel GenuineIntel
Feb 23 23:00:01 xen06 kernel: AMD AuthenticAMD
Feb 23 23:00:01 xen06 kernel: Centaur CentaurHauls
Feb 23 23:00:01 xen06 kernel: released 0 pages of unused memory
Feb 23 23:00:01 xen06 kernel: BIOS-provided physical RAM map:
Feb 23 23:00:01 xen06 kernel: Xen: 0000000000000000 - 00000000000a0000 (usable)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000000a0000 - 0000000000100000 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 0000000000100000 - 0000000040000000 (usable)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000dffc0000 - 00000000dffcfc00 (ACPI data)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000dffcfc00 - 00000000dffff000 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000f0000000 - 00000000f4000000 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000fec00000 - 00000000fed00400 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000fed13000 - 00000000feda0000 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000fee00000 - 00000000fee10000 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 00000000ffb00000 - 0000000100000000 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 00000001ffffe000 - 0000000200000000 (reserved)
Feb 23 23:00:01 xen06 kernel: Xen: 0000000200000000 - 00000003bffbe000 (usable)

Even when I patch with the latest 2.6.32.29:

Feb 24 01:03:17 xen06 kernel: ------------[ cut here ]------------
Feb 24 01:03:17 xen06 kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
Feb 24 01:03:17 xen06 kernel: invalid opcode: 0000 [#2] SMP
Feb 24 01:03:17 xen06 kernel: last sysfs file: /sys/block/dm-13/dev
Feb 24 01:03:17 xen06 kernel: CPU 2
Feb 24 01:03:17 xen06 kernel: Modules linked in: dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom tg3 serio_raw libphy button tpm_tis tpm tpm_bios pcspkr shpchp i2c_i801 i2c_core iTCO_wdt dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Feb 24 01:03:17 xen06 kernel: Pid: 509, comm: dmsetup Tainted: G D 2.6.32.29-0.xen.pvops.choon.centos5 #1 PowerEdge 860
Feb 24 01:03:17 xen06 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Feb 24 01:03:17 xen06 kernel: RSP: e02b:ffff88003ced5dc8 EFLAGS: 00010282
Feb 24 01:03:17 xen06 kernel: RAX: 00000000ffffffea RBX: 0000000000030395 RCX: 0000000000000181
Feb 24 01:03:17 xen06 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
Feb 24 01:03:17 xen06 kernel: RBP: ffff88003ced5de8 R08: 0000000000000ca8 R09: ffff880000000000
Feb 24 01:03:17 xen06 kernel: R10: 00000000deadbeef R11: 0000000000000246 R12: 0000000000000003
Feb 24 01:03:17 xen06 kernel: R13: 0000000000030395 R14: ffff88003025c4c0 R15: 00000033e1e00258
Feb 24 01:03:17 xen06 kernel: FS: 00007ff20eb366e0(0000) GS:ffff880028089000(0000) knlGS:0000000000000000
Feb 24 01:03:17 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 24 01:03:17 xen06 kernel: CR2: 00000033e1e00258 CR3: 000000003b57b000 CR4: 0000000000002660
Feb 24 01:03:17 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 24 01:03:17 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 24 01:03:17 xen06 kernel: Process dmsetup (pid: 509, threadinfo ffff88003ced4000, task ffff88003025c4c0)
Feb 24 01:03:17 xen06 kernel: Stack:
Feb 24 01:03:17 xen06 kernel: 0000000000000000 0000000000202741 000000013e6e5a18 0000000000030395
Feb 24 01:03:17 xen06 kernel: <0> ffff88003ced5e08 ffffffff8100e07c ffff88003000aac0 ffff880036bb1878
Feb 24 01:03:17 xen06 kernel: <0> ffff88003ced5e18 ffffffff8100e0af ffff88003ced5e58 ffffffff810a4309
Feb 24 01:03:17 xen06 kernel: Call Trace:
Feb 24 01:03:17 xen06 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
Feb 24 01:03:17 xen06 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
Feb 24 01:03:17 xen06 kernel: [<ffffffff810a4309>] __pte_alloc+0x70/0xce
Feb 24 01:03:17 xen06 kernel: [<ffffffff810a44a7>] handle_mm_fault+0x140/0x8b9
Feb 24 01:03:17 xen06 kernel: [<ffffffff8131c1fd>] do_page_fault+0x252/0x2e2
Feb 24 01:03:17 xen06 kernel: [<ffffffff8131a185>] page_fault+0x25/0x30
Feb 24 01:03:18 xen06 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
Feb 24 01:03:18 xen06 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
Feb 24 01:03:18 xen06 kernel: RSP <ffff88003ced5dc8>
Feb 24 01:03:18 xen06 kernel: ---[ end trace 5ea31e622470b519 ]---
Feb 24 01:06:21 xen06 syslogd 1.4.1: restart.
Feb 24 01:06:21 xen06 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb 24 01:06:21 xen06 kernel: Linux version 2.6.32.29-0.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Thu Feb 24 00:21:24 SGT 2011
Feb 24 01:06:21 xen06 kernel: Command line: ro root=/dev/md1 panic=5 panic_timeout=5
Feb 24 01:06:21 xen06 kernel: KERNEL supported cpus:
Feb 24 01:06:21 xen06 kernel: Intel GenuineIntel
Feb 24 01:06:21 xen06 kernel: AMD AuthenticAMD
Feb 24 01:06:21 xen06 kernel: Centaur CentaurHauls
Feb 24 01:06:21 xen06 kernel: released 0 pages of unused memory
Feb 24 01:06:21 xen06 kernel: BIOS-provided physical RAM map:
Feb 24 01:06:21 xen06 kernel: Xen: 0000000000000000 - 00000000000a0000 (usable)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000000a0000 - 0000000000100000 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 0000000000100000 - 0000000040000000 (usable)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000dffc0000 - 00000000dffcfc00 (ACPI data)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000dffcfc00 - 00000000dffff000 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000f0000000 - 00000000f4000000 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000fec00000 - 00000000fed00400 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000fed13000 - 00000000feda0000 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000fee00000 - 00000000fee10000 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 00000000ffb00000 - 0000000100000000 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 00000001ffffe000 - 0000000200000000 (reserved)
Feb 24 01:06:21 xen06 kernel: Xen: 0000000200000000 - 00000003bffbe000 (usable)
Feb 24 01:06:21 xen06 kernel: DMI 2.4 present.
Feb 24 01:06:21 xen06 kernel: last_pfn = 0x3bffbe max_arch_pfn = 0x400000000
Feb 24 01:06:21 xen06 kernel: x86 PAT enabled: cpu 0, old 0x50100070406, new 0x7010600070106
Feb 24 01:06:21 xen06 kernel: last_pfn = 0x40000 max_arch_pfn = 0x400000000

Thanks.

Kindest regards,
Giam Teck Choon
Konrad Rzeszutek Wilk
2011-Feb-28 16:20 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
> This is just an update from my testing.
>
> I tried with latest xen/next-2.6.32.x and same bug still there:

Grrrrrrrr.. Let me setup a machine to reproduce this. Is the distro you are using still CentOS 5?

> Feb 23 22:56:56 xen06 kernel: ------------[ cut here ]------------
> Feb 23 22:56:56 xen06 kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
> Feb 23 22:56:56 xen06 kernel: invalid opcode: 0000 [#1] SMP
> Feb 23 22:56:56 xen06 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> Feb 23 22:56:56 xen06 kernel: CPU 0
> Feb 23 22:56:56 xen06 kernel: Modules linked in: dlm configfs xt_physdev
> iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic
> uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi
> dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon
> battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg
> ide_cd_mod cdrom serio_raw tg3 libphy button tpm_tis tpm tpm_bios iTCO_wdt
> i2c_i801 i2c_core pcspkr shpchp dm_snapshot dm_zero dm_mirror dm_region_hash
> dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd
> ohci_hcd ehci_hcd [last unloaded: microcode]
> Feb 23 22:56:56 xen06 kernel: Pid: 15222, comm: mpath_wait Not tainted
> 2.6.32.28-2.xen.pvops.choon.centos5 #1 PowerEdge 860
> Feb 23 22:56:56 xen06 kernel: RIP: e030:[<ffffffff8100cb5b>]
> [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
> Feb 23 22:56:56 xen06 kernel: RSP: e02b:ffff8800265b9dc8 EFLAGS: 00010282
> Feb 23 22:56:56 xen06 kernel: RAX: 00000000ffffffea RBX: 00000000000306d7
> RCX: 0000000000000183
> Feb 23 22:56:56 xen06 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef
> RDI: 00000000deadbeef
> Feb 23 22:56:56 xen06 kernel: RBP: ffff8800265b9de8 R08: 00000000000006b8
> R09: ffff880000000000
> Feb 23 22:56:56 xen06 kernel: R10: 00000000deadbeef R11: 0000000000000246
> R12: 0000000000000003
> Feb 23 22:56:56 xen06 kernel: R13: 00000000000306d7 R14: ffff88003b940200
> R15: 00007f7655df6258
> Feb 23 22:56:56 xen06 kernel: FS: 00007f76560016e0(0000)
> GS:ffff88002804f000(0000) knlGS:0000000000000000
> Feb 23 22:56:56 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Feb 23 22:56:56 xen06 kernel: CR2: 00007f7655df6258 CR3: 000000002559b000
> CR4: 0000000000002660
> Feb 23 22:56:56 xen06 lvm[3984]: Monitoring snapshot
> XenGroup-testcrash1--snapshot
> Feb 23 22:56:56 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> Feb 23 22:56:56 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0
> DR7: 0000000000000400
> Feb 23 22:56:56 xen06 kernel: Process mpath_wait (pid: 15222, threadinfo
> ffff8800265b8000, task ffff88003b940200)
> Feb 23 22:56:56 xen06 kernel: Stack:
> Feb 23 22:56:56 xen06 kernel: 0000000000000000 0000000000202bff
> 000000013f009e18 00000000000306d7
> Feb 23 22:56:56 xen06 kernel: <0> ffff8800265b9e08 ffffffff8100e07c
> ffff8800267bdac0 ffff88002679f570
> Feb 23 22:56:56 xen06 kernel: <0> ffff8800265b9e18 ffffffff8100e0af
> ffff8800265b9e58 ffffffff810a408b
> Feb 23 22:56:56 xen06 kernel: Call Trace:
> Feb 23 22:56:56 xen06 kernel: [<ffffffff8100e07c>]
> xen_alloc_ptpage+0x64/0x69
> Feb 23 22:56:56 xen06 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
> Feb 23 22:56:56 xen06 kernel: [<ffffffff810a408b>] __pte_alloc+0x70/0xce
> Feb 23 22:56:56 xen06 kernel: [<ffffffff810a4229>]
> handle_mm_fault+0x140/0x8b9
> Feb 23 22:56:56 xen06 kernel: [<ffffffff8131be2d>]
> do_page_fault+0x252/0x2e2
> Feb 23 22:56:56 xen06 kernel: [<ffffffff81319db5>] page_fault+0x25/0x30
> Feb 23 22:56:56 xen06 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2
> 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff
> ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b
> 48
> Feb 23 22:56:56 xen06 kernel: RIP [<ffffffff8100cb5b>]
> pin_pagetable_pfn+0x53/0x59
> Feb 23 22:56:56 xen06 kernel: RSP <ffff8800265b9dc8>
> Feb 23 22:56:56 xen06 kernel: ---[ end trace fd2f141edfc37649 ]---
> Feb 23 22:56:56 xen06 kernel: kjournald starting. Commit interval 5 seconds
> Feb 23 22:56:56 xen06 kernel: EXT3 FS on dm-11, internal journal
> Feb 23 22:56:56 xen06 kernel: EXT3-fs: mounted filesystem with ordered data
> mode.
> Feb 23 23:00:01 xen06 syslogd 1.4.1: restart.
> Feb 23 23:00:01 xen06 kernel: klogd 1.4.1, log source = /proc/kmsg started.
> Feb 23 23:00:01 xen06 kernel: Linux version
> 2.6.32.28-2.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc
> version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Wed Feb 23 22:29:39 SGT
> 2011
> Feb 23 23:00:01 xen06 kernel: Command line: ro root=/dev/md1 panic=5
> panic_timeout=5
> Feb 23 23:00:01 xen06 kernel: KERNEL supported cpus:
> Feb 23 23:00:01 xen06 kernel: Intel GenuineIntel
> Feb 23 23:00:01 xen06 kernel: AMD AuthenticAMD
> Feb 23 23:00:01 xen06 kernel: Centaur CentaurHauls
> Feb 23 23:00:01 xen06 kernel: released 0 pages of unused memory
> Feb 23 23:00:01 xen06 kernel: BIOS-provided physical RAM map:
> Feb 23 23:00:01 xen06 kernel: Xen: 0000000000000000 - 00000000000a0000
> (usable)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000000a0000 - 0000000000100000
> (reserved)
> Feb 23 23:00:01 xen06 kernel: Xen: 0000000000100000 - 0000000040000000
> (usable)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000dffc0000 - 00000000dffcfc00
> (ACPI data)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000dffcfc00 - 00000000dffff000
> (reserved)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000f0000000 - 00000000f4000000
> (reserved)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000fec00000 - 00000000fed00400
> (reserved)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000fed13000 - 00000000feda0000
> (reserved)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000fee00000 - 00000000fee10000
> (reserved)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000000ffb00000 - 0000000100000000
> (reserved)
> Feb 23 23:00:01 xen06 kernel: Xen: 00000001ffffe000 - 0000000200000000
> (reserved)
> Feb 23 23:00:01 xen06 kernel:
Xen: 0000000200000000 - 00000003bffbe000 > (usable) > > Even I patch with latest 2.6.32.29: > > Feb 24 01:03:17 xen06 kernel: ------------[ cut here ]------------ > Feb 24 01:03:17 xen06 kernel: kernel BUG at arch/x86/xen/mmu.c:1872! > Feb 24 01:03:17 xen06 kernel: invalid opcode: 0000 [#2] SMP > Feb 24 01:03:17 xen06 kernel: last sysfs file: /sys/block/dm-13/dev > Feb 24 01:03:17 xen06 kernel: CPU 2 > Feb 24 01:03:17 xen06 kernel: Modules linked in: dlm configfs xt_physdev > iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic > uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi > dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon > battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg > ide_cd_mod cdrom tg3 serio_raw libphy button tpm_tis tpm tpm_bios pcspkr > shpchp i2c_i801 i2c_core iTCO_wdt dm_snapshot dm_zero dm_mirror > dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd > uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] > Feb 24 01:03:17 xen06 kernel: Pid: 509, comm: dmsetup Tainted: G D > 2.6.32.29-0.xen.pvops.choon.centos5 #1 PowerEdge 860 > Feb 24 01:03:17 xen06 kernel: RIP: e030:[<ffffffff8100cb5b>] > [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59 > Feb 24 01:03:17 xen06 kernel: RSP: e02b:ffff88003ced5dc8 EFLAGS: 00010282 > Feb 24 01:03:17 xen06 kernel: RAX: 00000000ffffffea RBX: 0000000000030395 > RCX: 0000000000000181 > Feb 24 01:03:17 xen06 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef > RDI: 00000000deadbeef > Feb 24 01:03:17 xen06 kernel: RBP: ffff88003ced5de8 R08: 0000000000000ca8 > R09: ffff880000000000 > Feb 24 01:03:17 xen06 kernel: R10: 00000000deadbeef R11: 0000000000000246 > R12: 0000000000000003 > Feb 24 01:03:17 xen06 kernel: R13: 0000000000030395 R14: ffff88003025c4c0 > R15: 00000033e1e00258 > Feb 24 01:03:17 xen06 kernel: FS: 00007ff20eb366e0(0000) > GS:ffff880028089000(0000) knlGS:0000000000000000 > Feb 24 01:03:17 xen06 
kernel: CS: e033 DS: 0000 ES: 0000 CR0: > 000000008005003b > Feb 24 01:03:17 xen06 kernel: CR2: 00000033e1e00258 CR3: 000000003b57b000 > CR4: 0000000000002660 > Feb 24 01:03:17 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 > DR2: 0000000000000000 > Feb 24 01:03:17 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 > DR7: 0000000000000400 > Feb 24 01:03:17 xen06 kernel: Process dmsetup (pid: 509, threadinfo > ffff88003ced4000, task ffff88003025c4c0) > Feb 24 01:03:17 xen06 kernel: Stack: > Feb 24 01:03:17 xen06 kernel: 0000000000000000 0000000000202741 > 000000013e6e5a18 0000000000030395 > Feb 24 01:03:17 xen06 kernel: <0> ffff88003ced5e08 ffffffff8100e07c > ffff88003000aac0 ffff880036bb1878 > Feb 24 01:03:17 xen06 kernel: <0> ffff88003ced5e18 ffffffff8100e0af > ffff88003ced5e58 ffffffff810a4309 > Feb 24 01:03:17 xen06 kernel: Call Trace: > Feb 24 01:03:17 xen06 kernel: [<ffffffff8100e07c>] > xen_alloc_ptpage+0x64/0x69 > Feb 24 01:03:17 xen06 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10 > Feb 24 01:03:17 xen06 kernel: [<ffffffff810a4309>] __pte_alloc+0x70/0xce > Feb 24 01:03:17 xen06 kernel: [<ffffffff810a44a7>] > handle_mm_fault+0x140/0x8b9 > Feb 24 01:03:17 xen06 kernel: [<ffffffff8131c1fd>] > do_page_fault+0x252/0x2e2 > Feb 24 01:03:17 xen06 kernel: [<ffffffff8131a185>] page_fault+0x25/0x30 > Feb 24 01:03:18 xen06 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 > 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff > ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b > 48 > Feb 24 01:03:18 xen06 kernel: RIP [<ffffffff8100cb5b>] > pin_pagetable_pfn+0x53/0x59 > Feb 24 01:03:18 xen06 kernel: RSP <ffff88003ced5dc8> > Feb 24 01:03:18 xen06 kernel: ---[ end trace 5ea31e622470b519 ]--- > Feb 24 01:06:21 xen06 syslogd 1.4.1: restart. > Feb 24 01:06:21 xen06 kernel: klogd 1.4.1, log source = /proc/kmsg started. 
> Feb 24 01:06:21 xen06 kernel: Linux version > 2.6.32.29-0.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc > version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Thu Feb 24 00:21:24 SGT > 2011 > Feb 24 01:06:21 xen06 kernel: Command line: ro root=/dev/md1 panic=5 > panic_timeout=5 > Feb 24 01:06:21 xen06 kernel: KERNEL supported cpus: > Feb 24 01:06:21 xen06 kernel: Intel GenuineIntel > Feb 24 01:06:21 xen06 kernel: AMD AuthenticAMD > Feb 24 01:06:21 xen06 kernel: Centaur CentaurHauls > Feb 24 01:06:21 xen06 kernel: released 0 pages of unused memory > Feb 24 01:06:21 xen06 kernel: BIOS-provided physical RAM map: > Feb 24 01:06:21 xen06 kernel: Xen: 0000000000000000 - 00000000000a0000 > (usable) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000000a0000 - 0000000000100000 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 0000000000100000 - 0000000040000000 > (usable) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000dffc0000 - 00000000dffcfc00 > (ACPI data) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000dffcfc00 - 00000000dffff000 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000f0000000 - 00000000f4000000 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000fec00000 - 00000000fed00400 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000fed13000 - 00000000feda0000 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000fee00000 - 00000000fee10000 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 00000000ffb00000 - 0000000100000000 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 00000001ffffe000 - 0000000200000000 > (reserved) > Feb 24 01:06:21 xen06 kernel: Xen: 0000000200000000 - 00000003bffbe000 > (usable) > Feb 24 01:06:21 xen06 kernel: DMI 2.4 present. > Feb 24 01:06:21 xen06 kernel: last_pfn = 0x3bffbe max_arch_pfn = 0x400000000 > Feb 24 01:06:21 xen06 kernel: x86 PAT enabled: cpu 0, old 0x50100070406, new > 0x7010600070106 > Feb 24 01:06:21 xen06 kernel: last_pfn = 0x40000 max_arch_pfn = 0x400000000 > > Thanks. 
> > Kindest regards,
> Giam Teck Choon

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Teck Choon Giam
2011-Mar-01 09:59 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Mar 1, 2011 at 12:20 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > This is just an update from my testing.
> >
> > I tried with latest xen/next-2.6.32.x and same bug still there:
>
> Grrrrrrrr..
>
> Let me setup a machine to reproduce this. Is the distro you are using
> still CentOS 5?

Yes.

Thanks.

Kindest regards,
Giam Teck Choon
Konrad Rzeszutek Wilk
2011-Mar-03 22:16 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Mar 01, 2011 at 05:59:54PM +0800, Teck Choon Giam wrote:
> On Tue, Mar 1, 2011 at 12:20 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > This is just an update from my testing.
> > >
> > > I tried with latest xen/next-2.6.32.x and same bug still there:
> >
> > Grrrrrrrr..
> >
> > Let me setup a machine to reproduce this. Is the distro you are using
> > still CentOS 5?
>
> Yes.

OK, I got a machine with CentOS 5.5 installed. I had some trouble getting the kernel to boot, but that is a different issue than the one you are seeing.

What arguments are you using for your dom0 and Xen hypervisor?
Teck Choon Giam
2011-Mar-04 05:30 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, Mar 4, 2011 at 6:16 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Mar 01, 2011 at 05:59:54PM +0800, Teck Choon Giam wrote:
> > On Tue, Mar 1, 2011 at 12:20 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > > This is just an update from my testing.
> > > >
> > > > I tried with latest xen/next-2.6.32.x and same bug still there:
> > >
> > > Grrrrrrrr..
> > >
> > > Let me setup a machine to reproduce this. Is the distro you are using
> > > still CentOS 5?
> >
> > Yes.
>
> OK, I got a machine with CentOS 5.5 installed. Had some trouble getting
> the kernel to boot - a different issue that the one you are seeing however.
>
> What arguments are you using for your dom0 and Xen hypervisor?

Example from one of my test servers:

title CentOS (2.6.32.29-0.xen.pvops.choon.centos5)
        root (hd0,0)
        kernel /xen.gz dom0_mem=1024M loglvl=all guest_loglvl=all cpuidle=0 cpufreq=none
        module /vmlinuz-2.6.32.29-0.xen.pvops.choon.centos5 ro root=/dev/md1 panic=5 panic_timeout=5
        module /initrd-2.6.32.29-0.xen.pvops.choon.centos5.img

Thanks.

Kindest regards,
Giam Teck Choon
Fajar A. Nugraha
2011-Mar-04 06:15 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, Mar 4, 2011 at 12:30 PM, Teck Choon Giam <giamteckchoon@gmail.com> wrote:
> On Fri, Mar 4, 2011 at 6:16 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Tue, Mar 01, 2011 at 05:59:54PM +0800, Teck Choon Giam wrote:
>> > On Tue, Mar 1, 2011 at 12:20 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > > > This is just an update from my testing.
>> > > >
>> > > > I tried with latest xen/next-2.6.32.x and same bug still there:
>> > >
>> > > Grrrrrrrr..
>> > >
>> > > Let me setup a machine to reproduce this. Is the distro you are using
>> > > still CentOS 5?
>> >
>> > Yes.
>>
>> OK, I got a machine with CentOS 5.5 installed. Had some trouble getting
>> the kernel to boot - a different issue that the one you are seeing
>> however.
>>
>> What arguments are you using for your dom0 and Xen hypervisor?
>
> Example in one of my test server:
>
> title CentOS (2.6.32.29-0.xen.pvops.choon.centos5)
>         root (hd0,0)
>         kernel /xen.gz dom0_mem=1024M loglvl=all guest_loglvl=all cpuidle=0 cpufreq=none
>         module /vmlinuz-2.6.32.29-0.xen.pvops.choon.centos5 ro root=/dev/md1 panic=5 panic_timeout=5
>         module /initrd-2.6.32.29-0.xen.pvops.choon.centos5.img

RHEL/CentOS 5 also needs CONFIG_SYSFS_DEPRECATED=y and CONFIG_SYSFS_DEPRECATED_V2=y for a 2.6.32 kernel to work correctly with userland tools.

--
Fajar
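A quick way to confirm the two options Fajar mentions is to scan the kernel .config text. The following is a minimal sketch, not part of the thread: the function name missing_options and the sample text are made up for illustration, and it only assumes the usual .config convention of "CONFIG_FOO=y" for enabled options and "# CONFIG_FOO is not set" for disabled ones.

```python
# Hypothetical helper: report which required options are not built in (=y).
REQUIRED = ("CONFIG_SYSFS_DEPRECATED", "CONFIG_SYSFS_DEPRECATED_V2")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Comment lines such as "# CONFIG_FOO is not set" mean disabled.
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


if __name__ == "__main__":
    # Illustrative input only; a real run would read the dom0 kernel's .config.
    sample = "CONFIG_SYSFS_DEPRECATED=y\n# CONFIG_SYSFS_DEPRECATED_V2 is not set\n"
    print(missing_options(sample))
```

An empty list means both options are enabled; anything else names the option to turn on before rebuilding.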
Teck Choon Giam
2011-Mar-04 06:33 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, Mar 4, 2011 at 2:15 PM, Fajar A. Nugraha <list@fajar.net> wrote:
> On Fri, Mar 4, 2011 at 12:30 PM, Teck Choon Giam <giamteckchoon@gmail.com> wrote:
>> On Fri, Mar 4, 2011 at 6:16 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> On Tue, Mar 01, 2011 at 05:59:54PM +0800, Teck Choon Giam wrote:
>>> > On Tue, Mar 1, 2011 at 12:20 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> > > > This is just an update from my testing.
>>> > > >
>>> > > > I tried with latest xen/next-2.6.32.x and same bug still there:
>>> > >
>>> > > Grrrrrrrr..
>>> > >
>>> > > Let me setup a machine to reproduce this. Is the distro you are using
>>> > > still CentOS 5?
>>> >
>>> > Yes.
>>>
>>> OK, I got a machine with CentOS 5.5 installed. Had some trouble getting
>>> the kernel to boot - a different issue that the one you are seeing
>>> however.
>>>
>>> What arguments are you using for your dom0 and Xen hypervisor?
>>
>> Example in one of my test server:
>>
>> title CentOS (2.6.32.29-0.xen.pvops.choon.centos5)
>>         root (hd0,0)
>>         kernel /xen.gz dom0_mem=1024M loglvl=all guest_loglvl=all cpuidle=0 cpufreq=none
>>         module /vmlinuz-2.6.32.29-0.xen.pvops.choon.centos5 ro root=/dev/md1 panic=5 panic_timeout=5
>>         module /initrd-2.6.32.29-0.xen.pvops.choon.centos5.img
>
> RHEL/Centos5 also needs CONFIG_SYSFS_DEPRECATED=y, CONFIG_SYSFS_DEPRECATED_V2=y
> for 2.6.32 kernel to work correctly with userland tools.

Thanks Fajar ;)

Yes. Both are set in my kernel config, and I believe Konrad has my kernel config. I didn't use nomodeset as my servers don't have those graphics cards... ...

Kindest regards,
Giam Teck Choon
Konrad Rzeszutek Wilk
2011-Mar-08 19:29 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
> Yes. All are set in my kernel config and I believed Konrad has my kernel
> config. I didn't use nomodeset as my servers don't have those graphic
> cards... ...

I am having a hard time reproducing this. I made six LVs:

[root@tst011 ~]# lvs
  LV                   VG         Attr   LSize  Origin      Snap%  Move Log Copy%  Convert
  LogVol00             VolGroup00 -wi-ao 17.56G
  LogVol01             VolGroup00 -wi-ao  5.81G
  data-1.ext3          XenGroup   -wi-ao 10.00G
  data-2.ext3          XenGroup   owi-ao 10.00G
  data-2.ext3-snapshot XenGroup   swi-a-  1.00G data-2.ext3   2.41
  data-3.ext3          XenGroup   -wi-a- 10.00G
  scratch-1.ext3       XenGroup   -wi-ao 10.00G
  scratch-2.ext3       XenGroup   -wi-ao 10.00G
  scratch-3.ext3       XenGroup   -wi-a- 10.00G
[root@tst011 ~]#

Each scratch-X/data-X was prepared with 'mkfs.ext3'.

I ran two guests, where each guest configuration looks like this:

kernel="/mnt/lab/latest/vmlinuz"
ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
extra="console=hvc0 debug"
memory=768
vcpus=4
on_crash="preserve"
#vif = [ 'mac=00:0f:4b:00:00:68, bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
disk = [ 'phy:/dev/XenGroup/scratch-1.ext3,xvda,w', 'phy:/dev/XenGroup/data-1.ext3,xvdb,w']

(the other is using -2, obviously).

Each guest is running 'mount /dev/xvda /mnt-1;(cd /mnt-1;fio iometer-file-access-server)' to produce I/Os on the xvdb/xvda disks.

As those guests are chugging along, I ran your script with:

root      6250  4438  0 10:12 pts/4    00:00:00 /bin/sh ./lvm-test.sh loop 100 1G 5

(I also tried 100 1G 0) and so far it is running.... How long should I wait until I hit this problem?

This is on CentOS 5.5 and also on Fedora Core 13. Dom0 and DomU are all x86_64. Dom0 is:

commit 892d2f052e979cf1916647c752b94cf62ec1c6dc
Merge: 35e2e28... 376faec...
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Fri Feb 11 13:31:03 2011 -0800

    Merge commit 'v2.6.32.28' into xen/next-2.6.32

(plus one patch I just posted - but that is to fix the serial console, so it is not relevant to this problem).

DomU is a 2.6.38 kernel, but I can swap over to the same as Dom0..

Attached is the script I am using.
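The attached lvm-test.sh is not reproduced in this archive, and the meaning of its arguments (loop 100 1G 5) is not stated, so the following is only a rough sketch of the kind of snapshot create/remove loop being discussed, not a reconstruction of that script. The function names and the "-snapshot" suffix are hypothetical; the lvcreate invocation mirrors the form quoted elsewhere in this thread, and lvremove -f is the standard LVM removal command. It defaults to a dry run and only returns the commands.

```python
import subprocess


def snapshot_cycle_commands(vg, lv, size="1G", suffix="-snapshot"):
    """Build the command pair for one snapshot create/remove cycle."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"{lv}{suffix}"
    return [
        ["lvcreate", "-s", "-L", size, "-n", snap, origin],
        ["lvremove", "-f", f"/dev/{vg}/{snap}"],
    ]


def run_cycles(vg, lv, count, size="1G", dry_run=True):
    """Repeat the cycle `count` times; execute only when dry_run is False."""
    executed = []
    for _ in range(count):
        for cmd in snapshot_cycle_commands(vg, lv, size):
            executed.append(cmd)
            if not dry_run:
                # Needs root and a real volume group; stops on first failure.
                subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    for cmd in run_cycles("XenGroup", "data-2.ext3", count=2):
        print(" ".join(cmd))
```

Keeping the command construction separate from execution makes it possible to review exactly what would run against a volume group before pointing it at a test machine.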
Konrad Rzeszutek Wilk
2011-Mar-08 20:10 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Mar 08, 2011 at 02:29:50PM -0500, Konrad Rzeszutek Wilk wrote:
> > Yes. All are set in my kernel config and I believed Konrad has my kernel
> > config. I didn't use nomodeset as my servers don't have those graphic
> > cards... ...
>
> I am having a hard-time reproducing this. I made six LV's:

Aha!

respawning too fast: disabled for 5 minutes
INIT: Id "co" respawning too fast: disabled for 5 minutes
INIT: Id "co" respawning too fast: disabled for 5 minutes
INIT: Id "co" respawning too fast: disabled for 5 minutes
(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
(XEN) mm.c:942:d0 Attempt : entry 255
(XEN) mm.c:2117:d0 Error while validating mfn 35620 (pfn c7417) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2733:d0 Error while pinning mfn 35620
(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
(XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
(XEN) mm.c:2117:d0 Error while validating mfn 3b286 (pfn c187d) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2733:d0 Error while pinning mfn 3b286
(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
(XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
(XEN) mm.c:2117:d0 Error while validating mfn 35620 (pfn c7417) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2500:d0 Error while installing new baseptr 35620
(XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
(XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
(XEN) mm.c:2117:d0 Error while validating mfn 3b286 (pfn c187d) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
(XEN) mm.c:2825:d0 Error while installing new mfn 3b286
(XEN) mm.c:2794:d0 Mfn 35620 not pinned
(XEN) mm.c:2794:d0 Mfn 3b286 not pinned

Takes a bit of time to reproduce it, but I can reproduce it on my CentOS 5.5 OS.
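The hypervisor messages above keep referring to the same few machine frames. A small sketch like the following can tally which mfn values appear in "(XEN) mm.c" lines of a captured console log. The function name and regular expressions are illustrative assumptions based only on the message shapes quoted in this thread; this is not a Xen tool.

```python
import re
from collections import Counter

# Matches lines such as "(XEN) mm.c:2733:d0 Error while pinning mfn 35620".
XEN_LINE = re.compile(r"\(XEN\) mm\.c:(?P<line>\d+):d(?P<dom>\d+) (?P<msg>.*)")
# "\bmfn" skips "pfn"; IGNORECASE also catches "Mfn 35620 not pinned".
MFN = re.compile(r"\bmfn ([0-9a-f]+)", re.IGNORECASE)


def summarize_mfns(log_text):
    """Count how often each mfn is mentioned in (XEN) mm.c messages."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = XEN_LINE.search(raw)
        if not match:
            continue  # e.g. the INIT "respawning too fast" lines
        for mfn in MFN.findall(match.group("msg")):
            counts[mfn.lower()] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "(XEN) mm.c:2733:d0 Error while pinning mfn 35620\n"
        "(XEN) mm.c:2794:d0 Mfn 35620 not pinned\n"
    )
    for mfn, n in summarize_mfns(sample).most_common():
        print(mfn, n)
```

Run against the full excerpt, this kind of tally makes it easier to see that the "Bad type" complaint always names the same frame while the pin failures alternate between two others.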
Teck Choon Giam
2011-Mar-08 20:20 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Mar 9, 2011 at 4:10 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Mar 08, 2011 at 02:29:50PM -0500, Konrad Rzeszutek Wilk wrote:
> > > Yes. All are set in my kernel config and I believed Konrad has my kernel
> > > config. I didn't use nomodeset as my servers don't have those graphic
> > > cards... ...
> >
> > I am having a hard-time reproducing this. I made six LV's:
>
> Aha!
>
> respawning too fast: disabled for 5 minutes
> INIT: Id "co" respawning too fast: disabled for 5 minutes
> INIT: Id "co" respawning too fast: disabled for 5 minutes
> INIT: Id "co" respawning too fast: disabled for 5 minutes
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt : entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 35620 (pfn c7417) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2733:d0 Error while pinning mfn 35620
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
> (XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 3b286 (pfn c187d) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2733:d0 Error while pinning mfn 3b286
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
> (XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 35620 (pfn c7417) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2500:d0 Error while installing new baseptr 35620
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
> (XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 3b286 (pfn c187d) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2825:d0 Error while installing new mfn 3b286
> (XEN) mm.c:2794:d0 Mfn 35620 not pinned
> (XEN) mm.c:2794:d0 Mfn 3b286 not pinned
>
> Takes a bit of time to reproduce it, but I can reproduce it on my CentOS
> 5.5 OS.

Yep... it takes some time to hit this bug, and running my test script without any guest running reproduces it faster. May I know what your normal development platform/distribution for Xen is?

I hope this isn't just a distribution-specific issue for the PVOPS stable/xen-2.6.32.x Dom0... most likely not, since in this thread alone I see someone else saying they hit this bug as well, and I believe they are not running CentOS 5.x... my guess is a Debian-based distribution.

Thanks.

Kindest regards,
Giam Teck Choon
Hi,

maybe I've got the same problem on Debian squeeze. I opened a bug report some time ago but didn't get any feedback until now. Perhaps there was not enough information...

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=614400

Guido

From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Teck Choon Giam
Sent: Tuesday, 8 March 2011 21:21
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com; Fajar A. Nugraha
Subject: Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!

On Wed, Mar 9, 2011 at 4:10 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Mar 08, 2011 at 02:29:50PM -0500, Konrad Rzeszutek Wilk wrote:
> > > Yes. All are set in my kernel config and I believed Konrad has my kernel
> > > config. I didn't use nomodeset as my servers don't have those graphic
> > > cards... ...
> >
> > I am having a hard-time reproducing this. I made six LV's:
>
> Aha!
>
> respawning too fast: disabled for 5 minutes
> INIT: Id "co" respawning too fast: disabled for 5 minutes
> INIT: Id "co" respawning too fast: disabled for 5 minutes
> INIT: Id "co" respawning too fast: disabled for 5 minutes
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt : entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 35620 (pfn c7417) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2733:d0 Error while pinning mfn 35620
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
> (XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 3b286 (pfn c187d) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2733:d0 Error while pinning mfn 3b286
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
> (XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 35620 (pfn c7417) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2500:d0 Error while installing new baseptr 35620
> (XEN) mm.c:2364:d0 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 354e0 (pfn c76d7)
> (XEN) mm.c:942:d0 Attempt to create linear p.t. with write perms
> (XEN) mm.c:1487:d0 Failure in alloc_l4_table: entry 255
> (XEN) mm.c:2117:d0 Error while validating mfn 3b286 (pfn c187d) for type 4000000000000000: caf=8000000000000003 taf=4000000000000001
> (XEN) mm.c:2825:d0 Error while installing new mfn 3b286
> (XEN) mm.c:2794:d0 Mfn 35620 not pinned
> (XEN) mm.c:2794:d0 Mfn 3b286 not pinned
>
> Takes a bit of time to reproduce it, but I can reproduce it on my CentOS 5.5 OS.

Yep... it takes sometime to hit this bug and if you run my test script without any guest running will be faster to reproduce it. May I know what is your normal development platform/distribution for xen?

I hope this isn't just distribution specific issue for PVOPS stable/xen-2.6.32.x Dom0... ... mostly not as from this thread alone I see someone saying they hit this bug as well and I believe they are not running CentOS 5.x... guess is debian based distribution.

Thanks.

Kindest regards,
Giam Teck Choon
tjaouen
2011-Mar-08 20:50 UTC
[Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
Teck Choon Giam wrote:
> I hope this isn't just distribution specific issue for PVOPS
> stable/xen-2.6.32.x Dom0... ... mostly not as from this thread alone I see
> someone saying they hit this bug as well and I believe they are not
> running CentOS 5.x... guess is debian based distribution.

Debian/Squeeze "Stable"
Debian GNU/Linux 6.0
Linux xencave 2.6.32-5-xen-amd64 #1 SMP Wed Jan 12 05:46:49 UTC 2011 x86_64 GNU/Linux

On March 1, I entered this command:

lvcreate -s -L 5G -n databank_snap /dev/vgraid1/databank

.... and got this BUG in /var/log/syslog:

Mar 1 21:23:56 xencave kernel: [1840078.405905] ------------[ cut here ]------------
Mar 1 21:23:56 xencave kernel: [1840078.405937] kernel BUG at /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_xen/arch/x86/xen/mmu.c:1649!
Mar 1 21:23:56 xencave kernel: [1840078.405981] invalid opcode: 0000 [#1] SMP
Mar 1 21:23:56 xencave kernel: [1840078.406010] last sysfs file: /sys/devices/virtual/bdi/253:35/uevent
Mar 1 21:23:56 xencave kernel: [1840078.406037] CPU 0
Mar 1 21:23:56 xencave kernel: [1840078.406065] Modules linked in: dm_snapshot nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs bridge stp coretemp it87 hwmon_vid fuse loop i915 drm_kms_helper drm snd_pcsp processor i2c_algo_bit button video acpi_processor i2c_i801 snd_pcm snd_timer rng_core serio_raw i2c_core evdev snd output soundcore snd_page_alloc usbhid hid ext3 jbd mbcache dm_mod raid1 md_mod sg sr_mod cdrom sd_mod crc_t10dif ata_generic uhci_hcd ata_piix libata r8169 mii thermal thermal_sys ehci_hcd scsi_mod usbcore nls_base [last unloaded: scsi_wait_scan]
Mar 1 21:23:56 xencave kernel: [1840078.406505] Pid: 14525, comm: udevd Not tainted 2.6.32-5-xen-amd64 #1 945GM-S2
Mar 1 21:23:56 xencave kernel: [1840078.406539] RIP: e030:[] [] pin_pagetable_pfn+0x2d/0x36
Mar 1 21:23:56 xencave kernel: [1840078.406586] RSP: e02b:ffff88000df33e08 EFLAGS: 00010282
Mar 1 21:23:56 xencave kernel: [1840078.406608] RAX: 00000000ffffffea RBX: 000000000000df31 RCX: 0000000000000001
Mar 1 21:23:56 xencave kernel: [1840078.406642] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88000df33e08
Mar 1 21:23:56 xencave kernel: [1840078.406677] RBP: ffff8800028bb480 R08: 0000000000000988 R09: ffffea000030d2b8
Mar 1 21:23:56 xencave kernel: [1840078.406711] R10: 0000000000007ff0 R11: ffff880000000838 R12: ffff88004f026a48
Mar 1 21:23:56 xencave kernel: [1840078.406745] R13: ffff880002be3f18 R14: ffff8800513f2a60 R15: ffff88004f026a48
Mar 1 21:23:56 xencave kernel: [1840078.406784] FS: 00007ff6a99077a0(0000) GS:ffff88000390b000(0000) knlGS:0000000000000000
Mar 1 21:23:56 xencave kernel: [1840078.406820] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 1 21:23:56 xencave kernel: [1840078.406842] CR2: 00007ff6a9215876 CR3: 000000004af1a000 CR4: 0000000000002660
Mar 1 21:23:56 xencave kernel: [1840078.406878] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 1 21:23:56 xencave kernel: [1840078.406913] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 1 21:23:56 xencave kernel: [1840078.406948] Process udevd (pid: 14525, threadinfo ffff88000df32000, task ffff8800513f2a60)
Mar 1 21:23:56 xencave kernel: [1840078.406985] Stack:
Mar 1 21:23:56 xencave kernel: [1840078.407003] 0000000000000000 00000000000af260 ffffea000030d2b8 000000000000df31
Mar 1 21:23:56 xencave kernel: [1840078.407042] <0> ffff8800028bb480 ffffffff810cd4e2 ffff88004af24ed0 00007ff6a9215876
Mar 1 21:23:56 xencave kernel: [1840078.407095] <0> 000000004af24000 ffffffff810cb394 ffff880002be3f18 00007ff6a9215876
Mar 1 21:23:56 xencave kernel: [1840078.407164] Call Trace:
Mar 1 21:23:56 xencave kernel: [1840078.407187] [] ? __pte_alloc+0x6b/0xc6
Mar 1 21:23:56 xencave kernel: [1840078.407212] [] ? pmd_alloc+0x28/0x5b
Mar 1 21:23:56 xencave kernel: [1840078.407235] [] ? handle_mm_fault+0xce/0x80f
Mar 1 21:23:56 xencave kernel: [1840078.407262] [] ? page_fault+0x25/0x30
Mar 1 21:23:56 xencave kernel: [1840078.407287] [] ? error_exit+0x2a/0x60
Mar 1 21:23:56 xencave kernel: [1840078.407311] [] ? retint_restore_args+0x5/0x6
Mar 1 21:23:56 xencave kernel: [1840078.407336] [] ? do_page_fault+0x2e0/0x2fc
Mar 1 21:23:56 xencave kernel: [1840078.407361] [] ? page_fault+0x25/0x30
Mar 1 21:23:56 xencave kernel: [1840078.407383] Code: ec 28 89 3c 24 48 89 f7 e8 a2 fd ff ff 48 89 e7 48 89 44 24 08 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 b0 cc ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 55 49 89 ca 48 89 d5 40 88 f1 48 89
Mar 1 21:23:56 xencave kernel: [1840078.407702] RIP [] pin_pagetable_pfn+0x2d/0x36
Mar 1 21:23:56 xencave kernel: [1840078.407731] RSP

--
View this message in context: http://xen.1045712.n5.nabble.com/kernel-BUG-at-arch-x86-xen-mmu-c-1860-tp3318567p3414620.html
Sent from the Xen - Dev mailing list archive at Nabble.com.
Andreas Olsowski
2011-Mar-09 00:06 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Well, this is too bad.

I encountered this bug when Xen 4.0 was released, around the time development on 2.6.31 was halted.

That is why I stuck with 2.6.31 when everyone else went with 2.6.32, because I determined 2.6.32 was not fit for duty and I'm guessing it still isn't today.

The bug occurs on 2.6.32 Xen kernels (maybe even newer ones) and is distribution unrelated; I was running Debian 5.0 then, I am running 6.0 testing now and have even tried compiling all the userland stuff myself.

This error can be encountered during a number of different actions:
1.) any action with lvm (start, stop, create, delete)
2.) while starting multipathd (restarting too, of course)

Sometimes the box only hangs there and no further device mapper interactions are possible. This is where I got my syslog entry from.

Back in 2010 I had to serial console the server and stuff like that to see the whole error.

My guess is everything one does with the device mapper can and will trigger this sooner or later.

Does anybody have any kind of insight on what the problem may be?

------------
Here is my syslog part when I ran "/etc/init.d/multipath-tools restart":

Mar 9 00:24:10 memoryana multipathd: mpatha: stop event checker thread (140606587918080)
Mar 9 00:24:10 memoryana multipathd: mpathb: stop event checker thread (140606587885312)
Mar 9 00:24:10 memoryana multipathd: mpathc: stop event checker thread (140606587852544)
Mar 9 00:24:10 memoryana kernel: ------------[ cut here ]------------
Mar 9 00:24:10 memoryana kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
Mar 9 00:24:10 memoryana kernel: invalid opcode: 0000 [#1] SMP
Mar 9 00:24:10 memoryana kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:04:00.1/host3/rport-3:0-2/target3:0:2/3:0:2:0/state
Mar 9 00:24:10 memoryana kernel: CPU 1
Mar 9 00:24:10 memoryana kernel: Modules linked in: dm_round_robin dm_multipath qla2xxx
Mar 9 00:24:10 memoryana kernel: Pid: 10662, comm: multipath-tools Not tainted 2.6.32.28-xen0 #4 PowerEdge R610
Mar 9 00:24:10 memoryana kernel: RIP: e030:[<ffffffff8100d471>] [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
Mar 9 00:24:10 memoryana kernel: RSP: e02b:ffff8800c3101df8 EFLAGS: 00010282
Mar 9 00:24:10 memoryana kernel: RAX: 00000000ffffffea RBX: ffff8800cc4c3400 RCX: 0000000000000003
Mar 9 00:24:10 memoryana kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800c3101df8
Mar 9 00:24:10 memoryana kernel: RBP: ffff8800c3135b60 R08: 00003ffffffff000 R09: ffff880000000000
Mar 9 00:24:10 memoryana kernel: R10: 0000000000007ff0 R11: 0000000000000246 R12: 00000000000cc302
Mar 9 00:24:10 memoryana kernel: R13: 0000000000000000 R14: ffff8800c374cc60 R15: ffff8800c374cc60
Mar 9 00:24:10 memoryana kernel: FS: 00007f60add15700(0000) GS:ffff880028055000(0000) knlGS:0000000000000000
Mar 9 00:24:10 memoryana kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 9 00:24:10 memoryana kernel: CR2: 00007f60ad841876 CR3: 00000000cef79000 CR4: 0000000000002660
Mar 9 00:24:10 memoryana kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 9 00:24:10 memoryana kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 9 00:24:10 memoryana kernel: Process multipath-tools (pid: 10662, threadinfo ffff8800c3100000, task ffff8800cc01cbc0)
Mar 9 00:24:10 memoryana kernel: Stack:
Mar 9 00:24:10 memoryana kernel: 0000000000000000 00000000008e8302 ffff8800cc4c3400 ffff8800c3135b60
Mar 9 00:24:10 memoryana kernel: <0> 00000000000cc302 ffffffff810b0382 00007f60ad841876 ffff8800c30b4c10
Mar 9 00:24:10 memoryana kernel: <0> 00000000000100e0 0000000000000000 ffff8800c374cc60 ffffffff810b3595
Mar 9 00:24:10 memoryana kernel: Call Trace:
Mar 9 00:24:10 memoryana kernel: [<ffffffff810b0382>] ? __pte_alloc+0xf2/0x120
Mar 9 00:24:10 memoryana kernel: [<ffffffff810b3595>] ? handle_mm_fault+0xa45/0xab0
Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? page_fault+0x25/0x30
Mar 9 00:24:10 memoryana kernel: [<ffffffff8153d21a>] ? error_exit+0x2a/0x60
Mar 9 00:24:10 memoryana kernel: [<ffffffff8101481d>] ? retint_restore_args+0x5/0x6
Mar 9 00:24:10 memoryana kernel: [<ffffffff81038631>] ? do_page_fault+0x121/0x3c0
Mar 9 00:24:10 memoryana kernel: [<ffffffff812a2e0d>] ? __put_user_4+0x1d/0x30
Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? page_fault+0x25/0x30
Mar 9 00:24:10 memoryana kernel: Code: 57 c7 75 00 00 48 89 f0 89 3c 24 74 27 48 89 44 24 08 48 89 e7 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 d3 be ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 48 89 f7 e8 6e f7 ff ff 48 83 f8 ff
Mar 9 00:24:10 memoryana kernel: RIP [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
Mar 9 00:24:10 memoryana kernel: RSP <ffff8800c3101df8>
Mar 9 00:24:10 memoryana kernel: ---[ end trace f4eae184c1a9f532 ]---
Mar 9 00:24:11 memoryana multipathd: --------shut down-------

--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
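A note on reading the register dump in the trace above: RAX holds 00000000ffffffea. Read as a signed 32-bit integer that is -22, which is -EINVAL on Linux. The code bytes just before the trap (85 c0 74 04 <0f> 0b, i.e. test %eax,%eax; je; ud2) suggest the BUG fires because a call returned non-zero in EAX. That this is the pin hypercall's return code is an inference from the dump, not something stated in the thread. A minimal Python sketch of the arithmetic follows; the helper name `decode_return_code` is made up for illustration.

```python
import errno
from typing import Tuple


def decode_return_code(rax_hex: str) -> Tuple[int, str]:
    """Interpret the low 32 bits of a register value as a signed errno-style code.

    Assumption: the register holds a kernel-style return value
    (0 on success, -errno on failure).
    """
    low32 = int(rax_hex, 16) & 0xFFFFFFFF
    # Two's complement: values with the top bit set are negative.
    signed = low32 - (1 << 32) if low32 & 0x80000000 else low32
    if signed < 0:
        name = errno.errorcode.get(-signed, "unknown")
    else:
        name = "no error"
    return signed, name


# RAX value copied from the trace above.
print(decode_return_code("00000000ffffffea"))
```

The same RAX value appears in the CentOS trace that started this thread, which is at least consistent with both reports hitting the same failed check.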
Konrad Rzeszutek Wilk
2011-Mar-09 00:41 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
> > 4000000000000000: caf=8000000000000003 taf=4000000000000001
> > (XEN) mm.c:2825:d0 Error while installing new mfn 3b286
> > (XEN) mm.c:2794:d0 Mfn 35620 not pinned
> > (XEN) mm.c:2794:d0 Mfn 3b286 not pinned
> >
> > Takes a bit of time to reproduce it, but I can reproduce it on my CentOS
> > 5.5 OS.
> >
>
> Yep... it takes sometime to hit this bug and if you run my test script
> without any guest running will be faster to reproduce it.
>
> May I know what is your normal development platform/distribution for xen?

It used to be Fedora Core 13, but then I switched to Ubuntu 10.10 and 11.04.

> I hope this isn't just distribution specific issue for PVOPS
> stable/xen-2.6.32.x Dom0... ... mostly not as from this thread alone I see
> someone saying they hit this bug as well and I believe they are not running
> CentOS 5.x... guess is debian based distribution.

I am trying to eliminate this being a userspace tool version issue. I am running a similar test on a Fedora Core 13 box.
Konrad Rzeszutek Wilk
2011-Mar-09 00:43 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
On Tue, Mar 08, 2011 at 12:50:07PM -0800, tjaouen wrote:
>
> Teck Choon Giam wrote:
> >
> >
> > I hope this isn't just distribution specific issue for PVOPS
> > stable/xen-2.6.32.x Dom0... ... mostly not as from this thread alone I see
> > someone saying they hit this bug as well and I believe they are not
> > running
> > CentOS 5.x... guess is debian based distribution.
> >
> >
>
> Debian/Squeeze "Stable"
>
> Debian GNU/Linux 6.0
> Linux xencave 2.6.32-5-xen-amd64 #1 SMP Wed Jan 12 05:46:49 UTC 2011 x86_64
> GNU/Linux
>
>
> March 1, I enter this command:
>
> lvcreate -s -L 5G -n databank_snap /dev/vgraid1/databank
>
> .... and BUG in /var/log/syslog:

Yup, looks like the same thing.
Andreas Olsowski
2011-Mar-09 06:58 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
I encountered this bug in 2010 when Xen 4.0 was released, around the time development on 2.6.31 was halted.

That is why I stuck with 2.6.31 when everyone else went with 2.6.32, because I determined 2.6.32 was not stable and I'm guessing it still isn't today.

The bug occurs on 2.6.32 Xen kernels (maybe even newer ones) and is distribution unrelated; I was running Debian 5.0 then, I am running 6.0 testing now and have even tried compiling all the userland stuff myself.

This error can be encountered during a number of different actions:
1.) any action with lvm (start, stop, create, delete)
2.) while starting multipathd (restarting too, of course)

Sometimes the box only hangs there and no further device interactions are possible. This is where I got my syslog entry from.
Some other times, processes like vgchange just ... hang, for no particular reason.

Back in 2010 I had to serial console the server and stuff like that to see the whole error.
If you tried to use anything that did something with device mapper, it would just ... hang. xm list for example.

My guess is everything one does with the device mapper can and will trigger this sooner or later.

Does anybody have any kind of insight on what the problem may be?

------------
Here is my syslog part when I ran "/etc/init.d/multipath-tools restart":

Mar 9 00:24:10 memoryana multipathd: mpatha: stop event checker thread (140606587918080)
Mar 9 00:24:10 memoryana multipathd: mpathb: stop event checker thread (140606587885312)
Mar 9 00:24:10 memoryana multipathd: mpathc: stop event checker thread (140606587852544)
Mar 9 00:24:10 memoryana kernel: ------------[ cut here ]------------
Mar 9 00:24:10 memoryana kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
Mar 9 00:24:10 memoryana kernel: invalid opcode: 0000 [#1] SMP
Mar 9 00:24:10 memoryana kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:04:00.1/host3/rport-3:0-2/target3:0:2/3:0:2:0/state
Mar 9 00:24:10 memoryana kernel: CPU 1
Mar 9 00:24:10 memoryana kernel: Modules linked in: dm_round_robin dm_multipath qla2xxx
Mar 9 00:24:10 memoryana kernel: Pid: 10662, comm: multipath-tools Not tainted 2.6.32.28-xen0 #4 PowerEdge R610
Mar 9 00:24:10 memoryana kernel: RIP: e030:[<ffffffff8100d471>] [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
Mar 9 00:24:10 memoryana kernel: RSP: e02b:ffff8800c3101df8 EFLAGS: 00010282
Mar 9 00:24:10 memoryana kernel: RAX: 00000000ffffffea RBX: ffff8800cc4c3400 RCX: 0000000000000003
Mar 9 00:24:10 memoryana kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800c3101df8
Mar 9 00:24:10 memoryana kernel: RBP: ffff8800c3135b60 R08: 00003ffffffff000 R09: ffff880000000000
Mar 9 00:24:10 memoryana kernel: R10: 0000000000007ff0 R11: 0000000000000246 R12: 00000000000cc302
Mar 9 00:24:10 memoryana kernel: R13: 0000000000000000 R14: ffff8800c374cc60 R15: ffff8800c374cc60
Mar 9 00:24:10 memoryana kernel: FS: 00007f60add15700(0000) GS:ffff880028055000(0000) knlGS:0000000000000000
Mar 9 00:24:10 memoryana kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 9 00:24:10 memoryana kernel: CR2: 00007f60ad841876 CR3: 00000000cef79000 CR4: 0000000000002660
Mar 9 00:24:10 memoryana kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 9 00:24:10 memoryana kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 9 00:24:10 memoryana kernel: Process multipath-tools (pid: 10662, threadinfo ffff8800c3100000, task ffff8800cc01cbc0)
Mar 9 00:24:10 memoryana kernel: Stack:
Mar 9 00:24:10 memoryana kernel: 0000000000000000 00000000008e8302 ffff8800cc4c3400 ffff8800c3135b60
Mar 9 00:24:10 memoryana kernel: <0> 00000000000cc302 ffffffff810b0382 00007f60ad841876 ffff8800c30b4c10
Mar 9 00:24:10 memoryana kernel: <0> 00000000000100e0 0000000000000000 ffff8800c374cc60 ffffffff810b3595
Mar 9 00:24:10 memoryana kernel: Call Trace:
Mar 9 00:24:10 memoryana kernel: [<ffffffff810b0382>] ? __pte_alloc+0xf2/0x120
Mar 9 00:24:10 memoryana kernel: [<ffffffff810b3595>] ? handle_mm_fault+0xa45/0xab0
Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? page_fault+0x25/0x30
Mar 9 00:24:10 memoryana kernel: [<ffffffff8153d21a>] ? error_exit+0x2a/0x60
Mar 9 00:24:10 memoryana kernel: [<ffffffff8101481d>] ? retint_restore_args+0x5/0x6
Mar 9 00:24:10 memoryana kernel: [<ffffffff81038631>] ? do_page_fault+0x121/0x3c0
Mar 9 00:24:10 memoryana kernel: [<ffffffff812a2e0d>] ? __put_user_4+0x1d/0x30
Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? page_fault+0x25/0x30
Mar 9 00:24:10 memoryana kernel: Code: 57 c7 75 00 00 48 89 f0 89 3c 24 74 27 48 89 44 24 08 48 89 e7 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 d3 be ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 48 89 f7 e8 6e f7 ff ff 48 83 f8 ff
Mar 9 00:24:10 memoryana kernel: RIP [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
Mar 9 00:24:10 memoryana kernel: RSP <ffff8800c3101df8>
Mar 9 00:24:10 memoryana kernel: ---[ end trace f4eae184c1a9f532 ]---
Mar 9 00:24:11 memoryana multipathd: --------shut down-------
Konrad Rzeszutek Wilk
2011-Mar-09 15:00 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
On Wed, Mar 09, 2011 at 07:58:39AM +0100, Andreas Olsowski wrote:
>
> I encountered this bug in 2010 when Xen 4.0 was released, around the
> time development on 2.6.31 was halted.

That is interesting data. Can you give more details on what 2.6.31 kernel and hypervisor you are using? Have you tried to rev the hypervisor up to Xen 4.1.0-rc7-pre for example?

>
> That is why I stuck with 2.6.31 when everyone else went with 2.6.32,
> because I determined 2.6.32 was not stable and I'm guessing it still
> isn't today.
>
> The bug occurs on 2.6.32 Xen kernels (maybe even newer ones) and

Good point. Let me check 2.6.38.

> is distribution unrelated; I was running Debian 5.0 then, I am
> running 6.0 testing now and have even tried compiling all the
> userland stuff myself.

<nods>

The overnight tests on Fedora Core 13 showed the same failure @ loop 97.

>
> This error can be encountered during a number of different actions:
> 1.) any action with lvm (start, stop, create, delete)
> 2.) while starting multipathd (restarting too, of course)
>
> Sometimes the box only hangs there and no further device
> interactions are possible. This is where I got my syslog entry from.
> Some other times, processes like vgchange just ... hang, for no
> particular reason.
>
> Back in 2010 I had to serial console the server and stuff like that
> to see the whole error.
> If you tried to use anything that did something with device mapper, it
> would just ... hang. xm list for example.
>
>
> My guess is everything one does with the device mapper can and will
> trigger this sooner or later.
>
> Does anybody have any kind of insight on what the problem may be?

The problem is that when a user application dies, the page-table is discarded (which is a process of unpinning the pages) and we let the pages be re-used. When another application is launched we construct a new page-table (and pin the page-table), and when it exits we do the same thing.

The issue is that during the construction (the application just forked) we encounter a page that used to belong to a now-discarded page-table, and Xen (rightly) tells us that we are trying to pin an already pinned page.

Pinning here is the process of letting Xen inspect the pagetable so that it can assert that there are no machine addresses that point to the hypervisor or another guest.

Back to the problem. Xen tells us that we are trying to pin an already pinned page _way_ after we had discarded (or thought we had) the old page-tables, so finding the culprit of _why_ we missed a page is difficult, as it happened in the past.

Jeremy has gone over the code that deals with construction/deconstruction with a fine-tooth comb and made sure there are no races, but there is obviously something amiss.

>
>
> ------------
> Here is my syslog part when I ran "/etc/init.d/multipath-tools restart":
>
> Mar 9 00:24:10 memoryana multipathd: mpatha: stop event checker
> thread (140606587918080)
> Mar 9 00:24:10 memoryana multipathd: mpathb: stop event checker
> thread (140606587885312)
> Mar 9 00:24:10 memoryana multipathd: mpathc: stop event checker
> thread (140606587852544)
> Mar 9 00:24:10 memoryana kernel: ------------[ cut here ]------------
> Mar 9 00:24:10 memoryana kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
> Mar 9 00:24:10 memoryana kernel: invalid opcode: 0000 [#1] SMP > Mar 9 00:24:10 memoryana kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:04:00.1/host3/rport-3:0-2/target3:0:2/3:0:2:0/state > Mar 9 00:24:10 memoryana kernel: CPU 1 > Mar 9 00:24:10 memoryana kernel: Modules linked in: dm_round_robin > dm_multipath qla2xxx > Mar 9 00:24:10 memoryana kernel: Pid: 10662, comm: multipath-tools > Not tainted 2.6.32.28-xen0 #4 PowerEdge R610 > Mar 9 00:24:10 memoryana kernel: RIP: e030:[<ffffffff8100d471>] > [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60 > Mar 9 00:24:10 memoryana kernel: RSP: e02b:ffff8800c3101df8 > EFLAGS: 00010282 > Mar 9 00:24:10 memoryana kernel: RAX: 00000000ffffffea RBX: > ffff8800cc4c3400 RCX: 0000000000000003 > Mar 9 00:24:10 memoryana kernel: RDX: 0000000000000000 RSI: > 0000000000000001 RDI: ffff8800c3101df8 > Mar 9 00:24:10 memoryana kernel: RBP: ffff8800c3135b60 R08: > 00003ffffffff000 R09: ffff880000000000 > Mar 9 00:24:10 memoryana kernel: R10: 0000000000007ff0 R11: > 0000000000000246 R12: 00000000000cc302 > Mar 9 00:24:10 memoryana kernel: R13: 0000000000000000 R14: > ffff8800c374cc60 R15: ffff8800c374cc60 > Mar 9 00:24:10 memoryana kernel: FS: 00007f60add15700(0000) > GS:ffff880028055000(0000) knlGS:0000000000000000 > Mar 9 00:24:10 memoryana kernel: CS: e033 DS: 0000 ES: 0000 CR0: > 000000008005003b > Mar 9 00:24:10 memoryana kernel: CR2: 00007f60ad841876 CR3: > 00000000cef79000 CR4: 0000000000002660 > Mar 9 00:24:10 memoryana kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > Mar 9 00:24:10 memoryana kernel: DR3: 0000000000000000 DR6: > 00000000ffff0ff0 DR7: 0000000000000400 > Mar 9 00:24:10 memoryana kernel: Process multipath-tools (pid: > 10662, threadinfo ffff8800c3100000, task ffff8800cc01cbc0) > Mar 9 00:24:10 memoryana kernel: Stack: > Mar 9 00:24:10 memoryana kernel: 0000000000000000 00000000008e8302 > ffff8800cc4c3400 ffff8800c3135b60 > Mar 9 00:24:10 memoryana kernel: <0> 
00000000000cc302 > ffffffff810b0382 00007f60ad841876 ffff8800c30b4c10 > Mar 9 00:24:10 memoryana kernel: <0> 00000000000100e0 > 0000000000000000 ffff8800c374cc60 ffffffff810b3595 > Mar 9 00:24:10 memoryana kernel: Call Trace: > Mar 9 00:24:10 memoryana kernel: [<ffffffff810b0382>] ? > __pte_alloc+0xf2/0x120 > Mar 9 00:24:10 memoryana kernel: [<ffffffff810b3595>] ? > handle_mm_fault+0xa45/0xab0 > Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? > page_fault+0x25/0x30 > Mar 9 00:24:10 memoryana kernel: [<ffffffff8153d21a>] ? > error_exit+0x2a/0x60 > Mar 9 00:24:10 memoryana kernel: [<ffffffff8101481d>] ? > retint_restore_args+0x5/0x6 > Mar 9 00:24:10 memoryana kernel: [<ffffffff81038631>] ? > do_page_fault+0x121/0x3c0 > Mar 9 00:24:10 memoryana kernel: [<ffffffff812a2e0d>] ? > __put_user_4+0x1d/0x30 > Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? > page_fault+0x25/0x30 > Mar 9 00:24:10 memoryana kernel: Code: 57 c7 75 00 00 48 89 f0 89 > 3c 24 74 27 48 89 44 24 08 48 89 e7 be 01 00 00 00 31 d2 41 ba f0 7f > 00 00 e8 d3 be ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 48 89 > f7 e8 6e f7 ff ff 48 83 f8 ff > Mar 9 00:24:10 memoryana kernel: RIP [<ffffffff8100d471>] > pin_pagetable_pfn+0x31/0x60 > Mar 9 00:24:10 memoryana kernel: RSP <ffff8800c3101df8> > Mar 9 00:24:10 memoryana kernel: ---[ end trace f4eae184c1a9f532 ]--- > Mar 9 00:24:11 memoryana multipathd: --------shut down------- > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
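Konrad's description of the pin/unpin lifecycle can be pictured with a small toy model. This is only an illustrative sketch in Python, not how Xen or the pvops kernel implement pinning. The class and function names are invented, the frame numbers are arbitrary, and returning -22 on a double pin is an assumption chosen to match the RAX value in the trace. It shows why the failure surfaces long after the actual mistake: the page that was missed during teardown only causes trouble when a later process happens to reuse it.

```python
EINVAL = 22  # assumption: the failed pin reports -EINVAL, as RAX in the trace suggests


class ToyHypervisor:
    """Hypothetical stand-in for Xen: remembers which page frames are pinned."""

    def __init__(self):
        self.pinned = set()

    def pin(self, pfn):
        if pfn in self.pinned:
            return -EINVAL  # already pinned: refuse, as Konrad describes
        self.pinned.add(pfn)
        return 0

    def unpin(self, pfn):
        self.pinned.discard(pfn)


def build_pagetable(hv, pfns):
    """Process start (fork): pin every page-table page."""
    for pfn in pfns:
        rc = hv.pin(pfn)
        if rc != 0:
            # Rough analogue of the BUG() hit in pin_pagetable_pfn.
            raise RuntimeError(f"BUG: pin of pfn {pfn:#x} failed with {rc}")


def teardown_pagetable(hv, pfns, missed=()):
    """Process exit: unpin the pages; 'missed' simulates the unknown bug."""
    for pfn in pfns:
        if pfn not in missed:
            hv.unpin(pfn)


def demo():
    hv = ToyHypervisor()
    # Process A runs and exits, but one of its pages is never unpinned.
    build_pagetable(hv, [0x100, 0x101, 0x102])
    teardown_pagetable(hv, [0x100, 0x101, 0x102], missed={0x101})
    # Process B is unaffected because it gets unrelated pages.
    build_pagetable(hv, [0x200, 0x201])
    teardown_pagetable(hv, [0x200, 0x201])
    # Process C is handed the recycled page and trips over the stale pin.
    try:
        build_pagetable(hv, [0x300, 0x101])
    except RuntimeError as err:
        return str(err)
    return "no failure"


print(demo())
```

In the model, process B running cleanly between the miss and the failure mirrors the "takes a while to reproduce" behaviour reported in the thread.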
Andreas Olsowski
2011-Mar-09 19:59 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
> That is interesting data. Can you give more details on what 2.6.31 kernel
> and hypervisor you are using? Have you tried to rev the hypervisor
> up to Xen 4.1.0-rc7-pre for example?

I can do you one more, here are links to the kernel tarball as built by me (2.6.31.14), its configuration and my xen4.0.0 directory.

Tomorrow I will test 4.1.0 and 2.6.38, although I have to say, if one kernel works and the other doesn't, to me it seems like a kernel problem, not one of the hypervisor. Of course, since I don't know the inner workings of Xen, that point of view might be ... flawed.

It would be nice if you could define a set of parameters that could prove beneficial in provoking the bug, as it would make it easier for me to test different scenarios. Or better yet, a script that definitely will provoke it within a given timeframe.

So far even restarting multipathd _could_ trigger it for me. It also occurred during bootup, and when it did it kept on happening until I did a cold restart of the server. Does that make sense? Does data remain in the memory modules when I reboot a system (init 6)? And if that is so, what's the "Scrubbing memory ....." for that I see when Xen is loading?

One thing I would like to verify is that the bug only occurs when running the kernel under Xen and not when it's running on its own. I can't quite remember if I tried that in 2010.

Today I ran a loop of 300 lvcreate, snapshot, lvdelete on a standalone 2.6.32-xen0 kernel and did not receive an error. I didn't really have the time to try and catch it that way running under Xen. I will do that tomorrow.

I will report my findings.

best regards,
--
Andreas Olsowski
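For anyone wanting to repeat the create/snapshot/delete loop described above, here is a rough Python sketch. It is not Teck Choon Giam's test.sh, which is not reproduced in this part of the thread. The volume group name `vgtest`, the volume name `bugtest` and the sizes are placeholders, and `lvremove` is used because that is LVM's command for deleting a volume. By default it only prints the commands; running them for real needs root and a scratch volume group you can afford to lose.

```python
import subprocess


def stress_commands(vg, lv="bugtest", size="1G", snap_size="512M", loops=300):
    """Yield the LVM commands for each create / snapshot / remove iteration.

    All names and sizes are placeholders; adjust them to a scratch volume group.
    """
    origin = f"/dev/{vg}/{lv}"
    snap = f"{lv}_snap"
    for _ in range(loops):
        yield ["lvcreate", "-L", size, "-n", lv, vg]                    # create volume
        yield ["lvcreate", "-s", "-L", snap_size, "-n", snap, origin]   # snapshot it
        yield ["lvremove", "-f", f"/dev/{vg}/{snap}"]                   # drop snapshot
        yield ["lvremove", "-f", origin]                                # drop volume


def run(vg, loops=300, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first error."""
    for cmd in stress_commands(vg, loops=loops):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run of two iterations against the placeholder volume group.
    run("vgtest", loops=2)
```

Running the same loop once on bare metal and once under Xen with the identical kernel gives the comparison asked for above.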
Andreas Olsowski
2011-Mar-10 07:20 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
Whoops, forgot to put in the links.

These, in combination, will not produce the error:

http://141.39.208.101/linux-2.6.31.13-xen0.config
http://141.39.208.101/linux-2.6.31.13-xen0.tar.bz2
http://141.39.208.101/xen-4.0.0.tar.bz2

--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
Andreas Olsowski
2011-Mar-10 13:45 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
All Xen 4.1.0 tests were done on server1 (netcatarina).
All but one test with Xen 4.0.1 were made on server2 (memoryana). Why I had to rerun one of the tests for server2 on server1 is explained below.

Here are my test results:

=====================================================
Kernel 2.6.32.28 without XEN:
about 50 successful runs of Teck Choon Giam's "test.sh" script (modified for handling 10 test volumes and sleeping 2 seconds)
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 restarted successfully

=====================================================
Kernel 2.6.38 without XEN:
about 20 successful runs of "test.sh"
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 restarted successfully

=====================================================
Kernel 2.6.32.28 with XEN 4.0.1:
At about loop 2 for volume 7 of "test.sh" it stopped doing ... well, anything. There was no output on the screen and neither a syslog nor a dmesg entry. I left it hanging for about 15 minutes until I decided to write this one off as a side effect of the same underlying problem. All lvm2 tools stopped working and I couldn't shut it down. Killing the hanging process ended it properly.

I did a cold reset of the server, as I wanted to see the discussed BUG again. But I failed here. It would seem like my server2 has some kind of addressing error:
pci 000:04:00.1: BAR 6: address space collision of device ....
0000:04:00.1 is one of my QLogic HBAs.
And since I use centralized FC storage ... who knows what side effects happened here. Interestingly enough, I had no problems with kernel 2.6.38 on this machine.

So I downgraded server1, which never showed this message, to Xen 4.0.1 and ran the test: after 2 loops at volume 5 I hit "kernel BUG at arch/x86/xen/mmu.c" again.

=====================================================
Kernel 2.6.38 with XEN 4.0.1:
100 runs of test.sh without error
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 stop/start ok

=====================================================
Kernel 2.6.32.28 with XEN 4.1.0-rc7:
Booted at first: crash after only 5 iterations of "test.sh"
http://pastebin.com/uNL7ehZ8

Later, after having booted 2.6.38 on this server to test it with Xen 4.1, I encountered a different error at boot time:
BUG: unable to handle kernel paging request at ffff8800cc3e5f48
I only have pictures of it:
http://141.39.208.101/err1.png
http://141.39.208.101/err2.png

I then did a cold boot of the server, as this has proven to make it boot in the past. When this did not help, I stopped the test.sh running on my other server, because the hang came when lvm2 was started and the servers use shared storage. Apparently this helped; the server booted fine after another cold reset.

After that I encountered an error again at loop 10 of "test.sh", not the "kernel BUG at arch/x86/xen/mmu.c" but, again, "BUG: unable to handle kernel paging request at ffff8800cc61ce010"
http://141.39.208.101/err3.png
http://141.39.208.101/err4.png

=====================================================
Kernel 2.6.38 with XEN 4.1.0-rc7:
100 runs of test.sh without error
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 stop/start ok

=====================================================
Summary
=====================================================

So that's two different errors I have encountered: one is the "kernel BUG at arch/x86/xen/mmu.c", the other is "BUG: unable to handle kernel paging request".

Both only apply to 2.6.32 when running under either Xen 4.0.1 or 4.1.
On its own the kernel works fine.

Kernel 2.6.38 ran fine on both hypervisors as well as on its own.

One other issue occurred that I didn't expect:
With the same .config (make oldconfig), 2.6.38 left my screen black after loading the kernel, on both hypervisors.
The servers worked just fine, I just didn't see any output on their VGA ports.

I hope this information helps you to hunt this bug down, as it effectively makes the "default" Xen unusable in server situations where the device mapper is involved.

It is puzzling to me why no one noticed it last year. Am I the only one running Xen on server hardware (Dell R610, 710 and 2950) with centralized storage (FibreChannel or iSCSI) and using it as an environment for production?

Is multipathing two links to a centralized storage and using LVM2 to split it up for virtual machines running on two or more servers really such a rare thing to find Xen running on?

Btw, who is currently working on the remus implementation?

If you should need any more testing from me, feel free to ask.

Best regards.

--
Andreas Olsowski
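The six kernel/hypervisor combinations reported in the message above can be written down as data to make the pattern explicit. This is just a small Python sketch of the reasoning. The pass/fail values are transcribed from the report, with the hang and the paging-request BUG both counted as failures. The names `RESULTS` and `common_factors` are made up for the example.

```python
# (kernel, hypervisor or None for bare metal, passed?) as reported in the message above
RESULTS = [
    ("2.6.32.28", None, True),
    ("2.6.38", None, True),
    ("2.6.32.28", "4.0.1", False),
    ("2.6.38", "4.0.1", True),
    ("2.6.32.28", "4.1.0-rc7", False),
    ("2.6.38", "4.1.0-rc7", True),
]


def failing_combinations(results):
    """Return the (kernel, hypervisor) pairs that did not pass."""
    return [(kernel, hv) for kernel, hv, ok in results if not ok]


def common_factors(results):
    """Which kernels appear in failures, and did every failure run under Xen?"""
    fails = failing_combinations(results)
    kernels = {kernel for kernel, _ in fails}
    all_under_xen = all(hv is not None for _, hv in fails)
    return kernels, all_under_xen


print(common_factors(RESULTS))
```

Every failure involves 2.6.32.28 and every failure runs under a hypervisor, while both Xen versions also have a passing run with 2.6.38, which is the observation the summary draws.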
Konrad Rzeszutek Wilk
2011-Mar-11 18:05 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
> Both only apply to 2.6.32 when running under either Xen 4.0.1 or 4.1.

OK, so Xen hypervisor is not at fault here.

> On its own the kernel works fine.
>
> Kernel 2.6.38 ran fine on both hypervisors as well as on its own.

Great!

>
> One other issue occurred that I didn't expect:
> With the same .config (make oldconfig), 2.6.38 left my screen black
> after loading the kernel, on both hypervisors.

Um, that is surprising. If you have a radeon VGA driver, what happens if you do 'radeon.modeset=0'?

> The servers worked just fine, I just didn't see any output on their
> VGA ports.
>
>
> I hope this information helps you to hunt this bug down, as it
> effectively makes the "default" Xen unusable in server situations
> where the device mapper is involved.
>
> It is puzzling to me why no one noticed it last year. Am I the
> only one running Xen on server hardware (Dell R610, 710 and 2950)
> with centralized storage (FibreChannel or iSCSI) and using it as
> an environment for production?
>
> Is multipathing two links to a centralized storage and using LVM2 to
> split it up for virtual machines running on two or more servers
> really such a rare thing to find Xen running on?

It is for kernel engineers. That equipment isn't cheap. It is very difficult for us to fix bugs we don't see on our equipment.

>
> Btw, who is currently working on the remus implementation?

Frank Pan <frankpzh@gmail.com>

>
>
>
> If you should need any more testing from me, feel free to ask.

Ok.

>
> Best regards.
>
>
> --
> Andreas Olsowski
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Hello,

I can confirm this bug. I'm using a very similar configuration: Debian Lenny/Squeeze on several servers connected to FC storage with multipathing enabled.

One more thing: it was quite long ago, but I did not see this bug with Xen 4.0 pre-release versions with the 2.6.31 kernel from Jeremy's tree. I'm regularly updating my systems to the up-to-date version of the Xen/PV-OPS kernel, and this bug has been there across the whole development line from the Xen 4.0/2.6.32 kernel until now :(

Roman

On Wed, Mar 09, 2011 at 01:06:39AM +0100, Andreas Olsowski wrote:
> Well, this is too bad.
>
> I encountered this bug when Xen 4.0 was released, around the time
> development on 2.6.31 was halted.
>
> That is why I stuck with 2.6.31 when everyone else went with 2.6.32,
> because I determined 2.6.32 was not fit for duty and I'm guessing it
> still isn't today.
>
> The bug occurs on 2.6.32 Xen kernels (maybe even newer ones) and is
> distribution unrelated; I was running Debian 5.0 then, I am running 6.0
> testing now and have even tried compiling all the userland stuff myself.
>
> This error can be encountered during a number of different actions:
> 1.) any action with lvm (start, stop, create, delete)
> 2.) while starting multipathd (restarting too, of course)
>
> Sometimes the box only hangs there and no further device mapper
> interactions are possible. This is where I got my syslog entry from.
>
> Back in 2010 I had to serial console the server and stuff like that to
> see the whole error.
>
>
> My guess is everything one does with the device mapper can and will
> trigger this sooner or later.
>
> Does anybody have any kind of insight on what the problem may be?
>
> ------------
> Here is my syslog part when i ran "/etc/init.d/multipath-tools restart":
>
> Mar 9 00:24:10 memoryana multipathd: mpatha: stop event checker thread (140606587918080)
> Mar 9 00:24:10 memoryana multipathd: mpathb: stop event checker thread (140606587885312)
> Mar 9 00:24:10 memoryana multipathd: mpathc: stop event checker thread (140606587852544)
> Mar 9 00:24:10 memoryana kernel: ------------[ cut here ]------------
> Mar 9 00:24:10 memoryana kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
> Mar 9 00:24:10 memoryana kernel: invalid opcode: 0000 [#1] SMP
> Mar 9 00:24:10 memoryana kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:04:00.1/host3/rport-3:0-2/target3:0:2/3:0:2:0/state
> Mar 9 00:24:10 memoryana kernel: CPU 1
> Mar 9 00:24:10 memoryana kernel: Modules linked in: dm_round_robin dm_multipath qla2xxx
> Mar 9 00:24:10 memoryana kernel: Pid: 10662, comm: multipath-tools Not tainted 2.6.32.28-xen0 #4 PowerEdge R610
> Mar 9 00:24:10 memoryana kernel: RIP: e030:[<ffffffff8100d471>] [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
> Mar 9 00:24:10 memoryana kernel: RSP: e02b:ffff8800c3101df8 EFLAGS: 00010282
> Mar 9 00:24:10 memoryana kernel: RAX: 00000000ffffffea RBX: ffff8800cc4c3400 RCX: 0000000000000003
> Mar 9 00:24:10 memoryana kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800c3101df8
> Mar 9 00:24:10 memoryana kernel: RBP: ffff8800c3135b60 R08: 00003ffffffff000 R09: ffff880000000000
> Mar 9 00:24:10 memoryana kernel: R10: 0000000000007ff0 R11: 0000000000000246 R12: 00000000000cc302
> Mar 9 00:24:10 memoryana kernel: R13: 0000000000000000 R14: ffff8800c374cc60 R15: ffff8800c374cc60
> Mar 9 00:24:10 memoryana kernel: FS: 00007f60add15700(0000) GS:ffff880028055000(0000) knlGS:0000000000000000
> Mar 9 00:24:10 memoryana kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> Mar 9 00:24:10 memoryana kernel: CR2: 00007f60ad841876 CR3: 00000000cef79000 CR4: 0000000000002660
> Mar 9 00:24:10 memoryana kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Mar 9 00:24:10 memoryana kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Mar 9 00:24:10 memoryana kernel: Process multipath-tools (pid: 10662, threadinfo ffff8800c3100000, task ffff8800cc01cbc0)
> Mar 9 00:24:10 memoryana kernel: Stack:
> Mar 9 00:24:10 memoryana kernel: 0000000000000000 00000000008e8302 ffff8800cc4c3400 ffff8800c3135b60
> Mar 9 00:24:10 memoryana kernel: <0> 00000000000cc302 ffffffff810b0382 00007f60ad841876 ffff8800c30b4c10
> Mar 9 00:24:10 memoryana kernel: <0> 00000000000100e0 0000000000000000 ffff8800c374cc60 ffffffff810b3595
> Mar 9 00:24:10 memoryana kernel: Call Trace:
> Mar 9 00:24:10 memoryana kernel: [<ffffffff810b0382>] ? __pte_alloc+0xf2/0x120
> Mar 9 00:24:10 memoryana kernel: [<ffffffff810b3595>] ? handle_mm_fault+0xa45/0xab0
> Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? page_fault+0x25/0x30
> Mar 9 00:24:10 memoryana kernel: [<ffffffff8153d21a>] ? error_exit+0x2a/0x60
> Mar 9 00:24:10 memoryana kernel: [<ffffffff8101481d>] ? retint_restore_args+0x5/0x6
> Mar 9 00:24:10 memoryana kernel: [<ffffffff81038631>] ? do_page_fault+0x121/0x3c0
> Mar 9 00:24:10 memoryana kernel: [<ffffffff812a2e0d>] ? __put_user_4+0x1d/0x30
> Mar 9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ? page_fault+0x25/0x30
> Mar 9 00:24:10 memoryana kernel: Code: 57 c7 75 00 00 48 89 f0 89 3c 24 74 27 48 89 44 24 08 48 89 e7 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 d3 be ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 48 89 f7 e8 6e f7 ff ff 48 83 f8 ff
> Mar 9 00:24:10 memoryana kernel: RIP [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
> Mar 9 00:24:10 memoryana kernel: RSP <ffff8800c3101df8>
> Mar 9 00:24:10 memoryana kernel: ---[ end trace f4eae184c1a9f532 ]---
> Mar 9 00:24:11 memoryana multipathd: --------shut down-------
>
> --
> Andreas Olsowski
> Leuphana Universität Lüneburg
> Rechen- und Medienzentrum
> Scharnhorststraße 1, C7.015
> 21335 Lüneburg
>
> Tel: ++49 4131 677 1309

--
----------------------------------------------------------------------
 ,''`.   [benco] | mailto: benco@acid.sk | silc: /msg benco
: :' :   -------------------------------------------------------------
`. `'    GPG publickey: http://www.acid.sk/pubkey.asc
  `-     KF = 0DF6 0592 74D2 F17A DACF A5C3 1720 CB7C F54C F429
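When comparing oops reports like the one quoted above across hosts, it can help to pull out just the BUG location, the affected process and the faulting symbol. The following is a minimal sketch, not an existing tool: the function name `summarise_oops` and the three patterns are made up for illustration, and it assumes unwrapped syslog lines in the format shown in this thread (mail-wrapped lines with `>` quote markers in the middle of an entry would need to be rejoined first).

```python
import re

# Patterns for fields seen in the oops lines quoted in this thread.
# These are illustrative assumptions about the log format, not an official grammar.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
COMM_RE = re.compile(r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+)")
# Matches the final "RIP [<addr>] symbol+off/len" line, not the "RIP: e030:..." register line.
RIP_RE = re.compile(r"RIP\s+\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<symbol>\w+)\+")


def summarise_oops(text):
    """Return a dict with BUG location, pid, process name and faulting symbol.

    Fields that are not present in the text are simply left out.
    """
    summary = {}
    m = BUG_RE.search(text)
    if m:
        summary["bug_at"] = f"{m['file']}:{m['line']}"
    m = COMM_RE.search(text)
    if m:
        summary["pid"] = int(m["pid"])
        summary["comm"] = m["comm"]
    m = RIP_RE.search(text)
    if m:
        summary["symbol"] = m["symbol"]
    return summary


# Three lines taken from the trace in this thread, unwrapped.
SAMPLE = """\
Mar 9 00:24:10 memoryana kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
Mar 9 00:24:10 memoryana kernel: Pid: 10662, comm: multipath-tools Not tainted 2.6.32.28-xen0 #4 PowerEdge R610
Mar 9 00:24:10 memoryana kernel: RIP [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
"""

if __name__ == "__main__":
    print(summarise_oops(SAMPLE))
```

Run over each host's /var/log/messages excerpt, this gives a compact record per crash that is easier to compare than the full register dump.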
Sander Eikelenboom
2011-Mar-11 19:59 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
Haven't seen this one on my system. I do use LVM, but I'm only using LVM as device mapper, without software RAID etc.
Also using Debian lenny/squeeze together with custom-compiled Xen and the xen-2.6.32.x pvops kernel.

So I would suspect the multipathing / RAID part.

--
Sander

Friday, March 11, 2011, 7:38:00 PM, you wrote:

> Hello,

> I can confirm this bug, I'm using a very similar configuration - Debian
> Lenny/Squeeze on several servers connected to FC storage with
> multipathing enabled.

> One more thing - it was quite long ago, but I did not see this bug with Xen
> 4.0 pre-release versions with the 2.6.31 kernel from Jeremy's tree. I'm
> regularly updating my system to the up-to-date version of the Xen/PV-OPS
> kernel, and this bug has been present across the whole development line
> from the Xen 4.0/2.6.32 kernel until now :(

> Roman

> On Wed, Mar 09, 2011 at 01:06:39AM +0100, Andreas Olsowski wrote:
>> Well, this is too bad.
>>
>> I encountered this bug when Xen 4.0 was released, around the time
>> development on 2.6.31 was halted.
>>
>> That is why I stuck with 2.6.31 when everyone else went with 2.6.32,
>> because I determined 2.6.32 was not fit for duty, and I'm guessing it
>> still isn't today.
>>
>> The bug occurs on 2.6.32 Xen kernels (maybe even newer ones) and is
>> unrelated to the distribution: I was running Debian 5.0 then, I am
>> running 6.0 testing now, and I have even tried compiling all the
>> userland stuff myself.
>>
>> This error can be encountered during a number of different actions:
>> 1.) any action with LVM (start, stop, create, delete)
>> 2.) while starting multipathd (restarting too, of course)
>>
>> Sometimes the box only hangs there and no further device mapper
>> interactions are possible. This is where I got my syslog entry from.
>>
>> Back in 2010 I had to attach a serial console to the server to
>> see the whole error.
>>
>> My guess is that everything one does with the device mapper can and will
>> trigger this sooner or later.
>>
>> Does anybody have any kind of insight on what the problem may be?

--
Best regards,
 Sander                            mailto:linux@eikelenboom.it
Teck Choon Giam
2011-Mar-11 20:29 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Sat, Mar 12, 2011 at 3:59 AM, Sander Eikelenboom <linux@eikelenboom.it> wrote:

> Haven't seen this one on my system. I do use LVM, but I'm only using LVM as
> device mapper, without software RAID etc.
> Also using Debian lenny/squeeze together with custom-compiled Xen and
> xen-2.6.32.x pvops kernel
>
> So I would suspect the multipathing / RAID part.
>

From one of my test servers, below is the grep output:

# grep -A 45 'cut here' /var/log/messages*|grep 'kernel: Process'
/var/log/messages-Mar 10 00:53:00 xen06 kernel: Process mpath_wait (pid: 9513, threadinfo ffff880010f38000, task ffff880010c4c100)
/var/log/messages-Mar 10 02:52:33 xen06 kernel: Process clvmd (pid: 3240, threadinfo ffff880035a14000, task ffff88003b49c340)
/var/log/messages-Mar 10 02:52:33 xen06 kernel: Process syslogd (pid: 24279, threadinfo ffff88003bb3e000, task ffff880016f3a1c0)
/var/log/messages-Mar 10 03:40:52 xen06 kernel: Process mpath_wait (pid: 16487, threadinfo ffff88003becc000, task ffff88003bdb8480)
/var/log/messages.2-Feb 23 22:56:56 xen06 kernel: Process mpath_wait (pid: 15222, threadinfo ffff8800265b8000, task ffff88003b940200)
/var/log/messages.2-Feb 24 01:03:17 xen06 kernel: Process dmsetup (pid: 509, threadinfo ffff88003ced4000, task ffff88003025c4c0)
/var/log/messages.2-Feb 26 23:20:22 xen06 kernel: Process mpath_wait (pid: 10154, threadinfo ffff880025604000, task ffff8800254804c0)
/var/log/messages.4-Feb 8 04:30:10 xen06 kernel: Process mpath_wait (pid: 11646, threadinfo ffff88001e722000, task ffff880025222440)
/var/log/messages.4-Feb 9 16:43:11 xen06 kernel: Process lvremove (pid: 24469, threadinfo ffff88003b930000, task ffff880025ab03c0)
/var/log/messages.4-Feb 9 22:02:09 xen06 kernel: Process grep (pid: 10327, threadinfo ffff880025056000, task ffff880024814240)
/var/log/messages.4-Feb 9 22:13:37 xen06 kernel: Process dmsetup (pid: 6287, threadinfo ffff88003cf64000, task ffff880025fe8740)
/var/log/messages.4-Feb 9 22:16:55 xen06 kernel: Process clvmd (pid: 3204, threadinfo ffff88003ccb8000, task ffff88003a436440)

The output related to 'kernel: BUG: unable to handle kernel paging request at' is:

# grep -B 20 -A 50 'kernel: BUG: unable to handle kernel paging request at' /var/log/messages*
/var/log/messages.2-Feb 24 00:59:13 xen06 lvm[2379]: No longer monitoring snapshot XenGroup-testcrash5--snapshot
/var/log/messages.2-Feb 24 00:59:14 xen06 lvm[2379]: Monitoring snapshot XenGroup-testcrash1--snapshot
/var/log/messages.2-Feb 24 00:59:14 xen06 kernel: kjournald starting. Commit interval 5 seconds
/var/log/messages.2-Feb 24 00:59:14 xen06 kernel: EXT3 FS on dm-0, internal journal
/var/log/messages.2-Feb 24 00:59:14 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode.
/var/log/messages.2-Feb 24 00:59:14 xen06 lvm[2379]: No longer monitoring snapshot XenGroup-testcrash1--snapshot
/var/log/messages.2-Feb 24 00:59:15 xen06 lvm[2379]: Monitoring snapshot XenGroup-testcrash2--snapshot
/var/log/messages.2-Feb 24 00:59:16 xen06 kernel: kjournald starting. Commit interval 5 seconds
/var/log/messages.2-Feb 24 00:59:16 xen06 kernel: EXT3 FS on dm-0, internal journal
/var/log/messages.2-Feb 24 00:59:16 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode.
/var/log/messages.2-Feb 24 00:59:16 xen06 lvm[2379]: No longer monitoring snapshot XenGroup-testcrash2--snapshot
/var/log/messages.2-Feb 24 00:59:17 xen06 lvm[2379]: Monitoring snapshot XenGroup-testcrash3--snapshot
/var/log/messages.2-Feb 24 00:59:17 xen06 kernel: kjournald starting. Commit interval 5 seconds
/var/log/messages.2-Feb 24 00:59:17 xen06 kernel: EXT3 FS on dm-0, internal journal
/var/log/messages.2-Feb 24 00:59:17 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode.
/var/log/messages.2-Feb 24 00:59:18 xen06 lvm[2379]: No longer monitoring snapshot XenGroup-testcrash3--snapshot /var/log/messages.2-Feb 24 00:59:19 xen06 lvm[2379]: Monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: EXT3 FS on dm-0, internal journal /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.2:Feb 24 00:59:19 xen06 kernel: BUG: unable to handle kernel paging request at ffff88002561f3f8 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: PGD 1002067 PUD 1006067 PMD 1f1067 PTE 801000002561f065 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: Oops: 0003 [#1] SMP /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: last sysfs file: /sys/block/dm-13/dev /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: CPU 1 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: Modules linked in: dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom tg3 serio_raw libphy button tpm_tis tpm tpm_bios pcspkr shpchp i2c_i801 i2c_core iTCO_wdt dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: Pid: 16520, comm: udevd Not tainted 2.6.32.29-0.xen.pvops.choon.centos5 #1 PowerEdge 860 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: RSP: 
e02b:ffff880025655b88 EFLAGS: 00010246 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: RAX: 0000000000000000 RBX: ffff88002561f3f8 RCX: 00007f9a10c63000 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: RDX: ffffea0000000000 RSI: 0000000000000000 RDI: ffff88002561f3f8 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: RBP: ffff880025655b98 R08: 00007f9a10fc0000 R09: ffff88003e64d609 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: R10: ffff88003cce22c0 R11: ffffffff813391a0 R12: 0000000000000000 /var/log/messages.2-Feb 24 00:59:19 xen06 lvm[2379]: No longer monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: R13: 00007f9a10000000 R14: ffff88003cc6a7f8 R15: ffff88002550c340 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: FS: 00007f9a10a41710(0000) GS:ffff88002806c000(0000) knlGS:0000000000000000 /var/log/messages.2-Feb 24 00:59:19 xen06 udevd-event[16460]: run_program: ''/sbin/dmsetup'' abnormal exit /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: CR2: ffff88002561f3f8 CR3: 000000003bb46000 CR4: 0000000000002660 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: Process udevd (pid: 16520, threadinfo ffff880025654000, task ffff88003cce22c0) /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: Stack: /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: ffff88002561f3f8 0000000025690067 ffff880025655c48 ffffffff810a60b0 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: <0> ffff880025655ac8 000000000000001b 00007f9a10e00000 0000000000000000 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: <0> 00007f9a10c63000 ffff88002807a5c0 00007f9a10c63000 ffff880025655cb8 /var/log/messages.2-Feb 24 00:59:19 
xen06 kernel: Call Trace: /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810a60b0>] free_pgd_range+0x3b3/0x401 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810a63b1>] free_pgtables+0xb6/0xce /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810a84a4>] exit_mmap+0xf0/0x14b /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff81048888>] mmput+0x46/0xb2 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810c8cd9>] flush_old_exec+0x463/0x536 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810fbf99>] load_elf_binary+0x365/0x171b /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810faa30>] ? load_misc_binary+0x61/0x322 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810a50dd>] ? get_user_pages+0x44/0x46 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810c795e>] ? put_arg_page+0x9/0xb /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810c7e0b>] search_binary_handler+0xde/0x268 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810c9208>] do_execve+0x193/0x262 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff810113af>] sys_execve+0x3e/0x58 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: [<ffffffff81012f1a>] stub_execve+0x6a/0xc0 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: Code: 8b 3c 25 70 c5 00 00 ff 14 25 68 a2 5c 81 48 83 c4 18 5b c9 c3 55 48 89 e5 41 54 49 89 f4 53 48 89 fb e8 79 e3 ff ff 84 c0 75 05 <4c> 89 23 eb 0b 4c 89 e6 48 89 df e8 69 ff ff ff 5b 41 5c c9 c3 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: RSP <ffff880025655b88> /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: CR2: ffff88002561f3f8 /var/log/messages.2-Feb 24 00:59:19 xen06 kernel: ---[ end trace 5ea31e622470b518 ]--- /var/log/messages.2-Feb 24 00:59:20 xen06 lvm[2379]: Monitoring snapshot XenGroup-testcrash5--snapshot /var/log/messages.2-Feb 24 00:59:21 xen06 
kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.2-Feb 24 00:59:21 xen06 kernel: EXT3 FS on dm-0, internal journal /var/log/messages.2-Feb 24 00:59:21 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.2-Feb 24 00:59:21 xen06 lvm[2379]: No longer monitoring snapshot XenGroup-testcrash5--snapshot -- /var/log/messages.4-Feb 8 03:58:30 xen06 lvm[3961]: Monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.4-Feb 8 03:58:30 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 8 03:58:30 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 8 03:58:30 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 8 03:58:30 xen06 lvm[3961]: No longer monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.4-Feb 8 03:58:31 xen06 lvm[3961]: Monitoring snapshot XenGroup-testcrash5--snapshot /var/log/messages.4-Feb 8 03:58:31 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 8 03:58:31 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 8 03:58:31 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 8 03:58:32 xen06 lvm[3961]: No longer monitoring snapshot XenGroup-testcrash5--snapshot /var/log/messages.4-Feb 8 03:58:33 xen06 lvm[3961]: Monitoring snapshot XenGroup-testcrash1--snapshot /var/log/messages.4-Feb 8 03:58:33 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 8 03:58:33 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 8 03:58:33 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. 
/var/log/messages.4-Feb 8 03:58:33 xen06 lvm[3961]: No longer monitoring snapshot XenGroup-testcrash1--snapshot /var/log/messages.4-Feb 8 03:58:34 xen06 lvm[3961]: Monitoring snapshot XenGroup-testcrash2--snapshot /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 8 03:58:34 xen06 lvm[3961]: No longer monitoring snapshot XenGroup-testcrash2--snapshot /var/log/messages.4:Feb 8 03:58:34 xen06 kernel: BUG: unable to handle kernel paging request at ffff880030081010 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: PGD 1002067 PUD 1006067 PMD 246067 PTE 8010000030081065 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: Oops: 0003 [#1] SMP /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: CPU 3 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom serio_raw tg3 button libphy tpm_tis tpm tpm_bios i2c_i801 pcspkr i2c_core iTCO_wdt shpchp dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: Pid: 32051, comm: 
mpath_wait Not tainted 2.6.32.28-0.xen.pvops.choon.centos5 #1 PowerEdge 860 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: RSP: e02b:ffff880030c19b88 EFLAGS: 00010246 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: RAX: 0000000000000000 RBX: ffff880030081010 RCX: 00000000008c3000 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: RDX: ffffea0000000000 RSI: 0000000000000000 RDI: ffff880030081010 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: RBP: ffff880030c19b98 R08: 0000000002150000 R09: ffff88003e6f3e89 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: R10: ffff880030e78040 R11: ffffffff813381a0 R12: 0000000000000000 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: R13: 0000000000600000 R14: ffff880036dc5000 R15: ffff880030994000 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: FS: 00007f147d9606e0(0000) GS:ffff8800280a6000(0000) knlGS:0000000000000000 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: CR2: ffff880030081010 CR3: 000000003ba55000 CR4: 0000000000002660 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: Process mpath_wait (pid: 32051, threadinfo ffff880030c18000, task ffff88003c93c280) /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: Stack: /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: ffff880030081010 000000003b7e4067 ffff880030c19c48 ffffffff810a5e32 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: <0> ffff880030c19ac8 000000000000001f 0000000002000000 0000000000000000 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: <0> 00000000008c3000 ffff8800280b45c0 00000000008c3000 ffff880030c19cb8 
/var/log/messages.4-Feb 8 03:58:34 xen06 kernel: Call Trace: /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810a5e32>] free_pgd_range+0x3b3/0x401 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810a6133>] free_pgtables+0xb6/0xce /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810a8224>] exit_mmap+0xf0/0x14b /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff81048648>] mmput+0x46/0xb2 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810c8a56>] flush_old_exec+0x460/0x533 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810fbd15>] load_elf_binary+0x365/0x171c /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810fa7ac>] ? load_misc_binary+0x61/0x322 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810a4e5f>] ? get_user_pages+0x44/0x46 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810c76de>] ? put_arg_page+0x9/0xb /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810c7b8b>] search_binary_handler+0xde/0x268 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810c8f85>] do_execve+0x193/0x262 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff810113af>] sys_execve+0x3e/0x58 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: [<ffffffff81012f1a>] stub_execve+0x6a/0xc0 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: Code: 8b 3c 25 70 c5 00 00 ff 14 25 68 82 5c 81 48 83 c4 18 5b c9 c3 55 48 89 e5 41 54 49 89 f4 53 48 89 fb e8 79 e3 ff ff 84 c0 75 05 <4c> 89 23 eb 0b 4c 89 e6 48 89 df e8 69 ff ff ff 5b 41 5c c9 c3 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: RSP <ffff880030c19b88> /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: CR2: ffff880030081010 /var/log/messages.4-Feb 8 03:58:34 xen06 kernel: ---[ end trace f2d7fcc13c4cdf6a ]--- /var/log/messages.4:Feb 8 03:58:34 xen06 kernel: BUG: unable to handle kernel paging request at ffff8800313e1b00 
/var/log/messages.4-Feb 8 03:58:34 xen06 kernel: IP: [<ffffffff810377f2>] ptep_set_access_flags+0x27/0x4a /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: PGD 1002067 PUD 1006067 PMD 24f067 PTE 80100000313e1065 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: Oops: 0003 [#2] SMP /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: CPU 1 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp gfs2 dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom serio_raw tg3 button libphy tpm_tis tpm tpm_bios i2c_i801 pcspkr i2c_core iTCO_wdt shpchp dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: Pid: 32053, comm: mpath_wait Tainted: G D 2.6.32.28-0.xen.pvops.choon.centos5 #1 PowerEdge 860 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: RIP: e030:[<ffffffff810377f2>] [<ffffffff810377f2>] ptep_set_access_flags+0x27/0x4a /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: RSP: e02b:ffff8800307c7cb8 EFLAGS: 00210202 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: RAX: ffff8800304b4870 RBX: 00007f147d960770 RCX: 80000001f6d6f167 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: RDX: ffff8800313e1b00 RSI: 00007f147d960770 RDI: ffff8800304b4870 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: RBP: ffff8800307c7cd8 R08: 0000000000000001 R09: ffffea0000ac5948 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: R10: 
3138666666666666 R11: 000000000002496a R12: 0000000000000001 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: R13: ffff8800304b4870 R14: 0000000000000008 R15: 0000000000000000 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: FS: 00007f147d9606e0(0000) GS:ffff88002806c000(0000) knlGS:0000000000000000 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: CR2: ffff8800313e1b00 CR3: 0000000031261000 CR4: 0000000000002660 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: Process mpath_wait (pid: 32053, threadinfo ffff8800307c6000, task ffff880030ece200) /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: Stack: /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: 80000001f6d6f145 0000000000000000 ffffea0000ac5948 ffffea0000800f30 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: <0> ffff8800307c7d88 ffffffff810a2a19 0000000000000001 ffff8800307c7de8 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: <0> ffffffff8100f0b1 ffff8800307c7d98 ffff8800305a8f60 ffff8800313e1b00 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: Call Trace: /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff810a2a19>] do_wp_page+0x2fd/0x717 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff8100f0b1>] ? xen_force_evtchn_callback+0xd/0xf /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff8100cde0>] ? xen_pte_val+0x64/0x6e /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff8100c3e9>] ? 
__raw_callee_save_xen_pte_val+0x15/0x23 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff810a4933>] handle_mm_fault+0x84a/0x8b9 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff8131be2d>] do_page_fault+0x252/0x2e2 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff8100f7f2>] ? check_events+0x12/0x20 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff81319db5>] page_fault+0x25/0x30 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff8116dc9d>] ? __put_user_4+0x1d/0x30 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff81047d2c>] ? schedule_tail+0x98/0x9d /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: [<ffffffff810129c3>] ret_from_fork+0x13/0x80 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: Code: 41 5d c9 c3 55 48 89 e5 41 55 49 89 fd 41 54 45 31 e4 53 48 89 f3 48 83 ec 08 48 39 0a 41 0f 95 c4 45 85 e4 74 1d 45 85 c0 74 18 <48> 89 0a 48 8b 3f 0f 1f 80 00 00 00 00 48 89 de 4c 89 ef e8 37 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: RIP [<ffffffff810377f2>] ptep_set_access_flags+0x27/0x4a /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: RSP <ffff8800307c7cb8> /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: CR2: ffff8800313e1b00 /var/log/messages.4-Feb 8 03:58:35 xen06 kernel: ---[ end trace f2d7fcc13c4cdf6b ]--- /var/log/messages.4-Feb 8 04:26:35 xen06 syslogd 1.4.1: restart. /var/log/messages.4-Feb 8 04:26:35 xen06 kernel: klogd 1.4.1, log source /proc/kmsg started. 
/var/log/messages.4-Feb 8 04:26:35 xen06 kernel: Linux version 2.6.32.28-0.xen.pvops.choon.centos5 (mockbuild@builder5.choon.net) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Feb 8 03:26:17 SGT 2011 /var/log/messages.4-Feb 8 04:26:35 xen06 kernel: Command line: ro root=/dev/md1 panic=5 panic_timeout=5 /var/log/messages.4-Feb 8 04:26:35 xen06 kernel: KERNEL supported cpus: /var/log/messages.4-Feb 8 04:26:35 xen06 kernel: Intel GenuineIntel /var/log/messages.4-Feb 8 04:26:35 xen06 kernel: AMD AuthenticAMD /var/log/messages.4-Feb 8 04:26:35 xen06 kernel: Centaur CentaurHauls /var/log/messages.4-Feb 8 04:26:35 xen06 kernel: released 0 pages of unused memory -- /var/log/messages.4-Feb 9 07:44:11 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash2--snapshot /var/log/messages.4-Feb 9 07:44:11 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 07:44:11 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 07:44:11 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 07:44:11 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash2--snapshot /var/log/messages.4-Feb 9 07:44:12 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash3--snapshot /var/log/messages.4-Feb 9 07:44:12 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 07:44:12 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 07:44:12 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 07:44:12 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash3--snapshot /var/log/messages.4-Feb 9 07:44:13 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.4-Feb 9 07:44:13 xen06 kernel: kjournald starting. 
Commit interval 5 seconds /var/log/messages.4-Feb 9 07:44:13 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 07:44:13 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 07:44:13 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.4-Feb 9 07:44:14 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash5--snapshot /var/log/messages.4-Feb 9 07:44:14 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 07:44:15 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 07:44:15 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 07:44:15 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash5--snapshot /var/log/messages.4:Feb 9 07:53:59 xen06 kernel: BUG: unable to handle kernel paging request at ffff88003cc0c010 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: PGD 1002067 PUD 1006067 PMD 2ac067 PTE 801000003cc0c065 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: Oops: 0003 [#1] SMP /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: CPU 0 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: Modules linked in: dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom serio_raw tg3 libphy button tpm_tis tpm tpm_bios pcspkr i2c_i801 shpchp i2c_core iTCO_wdt dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd 
[last unloaded: microcode] /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: Pid: 11005, comm: mpath_wait Not tainted 2.6.32.28-0.xen.pvops.choon.centos5 #1 PowerEdge 860 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: RSP: e02b:ffff88003b7d3b88 EFLAGS: 00010246 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: RAX: 0000000000000000 RBX: ffff88003cc0c010 RCX: 00000000008c3000 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: RDX: ffffea0000000000 RSI: 0000000000000000 RDI: ffff88003cc0c010 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: RBP: ffff88003b7d3b98 R08: 0000000001244000 R09: ffff88003f009f09 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: R10: ffff8800269e3ac0 R11: ffffffff813381a0 R12: 0000000000000000 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: R13: 0000000000600000 R14: ffff8800269ca000 R15: ffff8800268bb000 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: FS: 00007fa7f8de76e0(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: CR2: ffff88003cc0c010 CR3: 00000000306b0000 CR4: 0000000000002660 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: Process mpath_wait (pid: 11005, threadinfo ffff88003b7d2000, task ffff8800307aa480) /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: Stack: /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: ffff88003cc0c010 000000003cfee067 ffff88003b7d3c48 ffffffff810a5e32 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: <0> 000000000000e02b 0000000000000060 0000000001200000 0000000000000000 /var/log/messages.4-Feb 9 07:53:59 xen06 
kernel: <0> 00000000008c3000 ffff88002805d5c0 00000000008c3000 ffff88003b7d3cb8 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: Call Trace: /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810a5e32>] free_pgd_range+0x3b3/0x401 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810a6133>] free_pgtables+0xb6/0xce /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810a8224>] exit_mmap+0xf0/0x14b /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff81048648>] mmput+0x46/0xb2 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810c8a56>] flush_old_exec+0x460/0x533 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810fbd15>] load_elf_binary+0x365/0x171c /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810fa7ac>] ? load_misc_binary+0x61/0x322 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810a4e5f>] ? get_user_pages+0x44/0x46 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810c76de>] ? put_arg_page+0x9/0xb /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810c7b8b>] search_binary_handler+0xde/0x268 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810c8f85>] do_execve+0x193/0x262 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff810113af>] sys_execve+0x3e/0x58 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: [<ffffffff81012f1a>] stub_execve+0x6a/0xc0 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: Code: 8b 3c 25 70 c5 00 00 ff 14 25 68 82 5c 81 48 83 c4 18 5b c9 c3 55 48 89 e5 41 54 49 89 f4 53 48 89 fb e8 79 e3 ff ff 84 c0 75 05 <4c> 89 23 eb 0b 4c 89 e6 48 89 df e8 69 ff ff ff 5b 41 5c c9 c3 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: RSP <ffff88003b7d3b88> /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: CR2: ffff88003cc0c010 /var/log/messages.4-Feb 9 07:53:59 xen06 kernel: ---[ end trace b5fffc59f6c8f974 ]--- /var/log/messages.4-Feb 9 16:31:21 xen06 lvm[3802]: 
Monitoring snapshot XenGroup-testcrash1--snapshot /var/log/messages.4-Feb 9 16:31:21 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 16:31:21 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 16:31:21 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 16:31:21 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash1--snapshot /var/log/messages.4-Feb 9 16:31:22 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash2--snapshot /var/log/messages.4-Feb 9 16:31:22 xen06 kernel: kjournald starting. Commit interval 5 seconds -- /var/log/messages.4-Feb 9 16:35:34 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash1--snapshot /var/log/messages.4-Feb 9 16:35:34 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 16:35:34 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 16:35:34 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 16:35:35 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash1--snapshot /var/log/messages.4-Feb 9 16:35:36 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash2--snapshot /var/log/messages.4-Feb 9 16:35:36 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 16:35:36 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 16:35:36 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 16:35:36 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash2--snapshot /var/log/messages.4-Feb 9 16:35:37 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash3--snapshot /var/log/messages.4-Feb 9 16:35:37 xen06 kernel: kjournald starting. 
Commit interval 5 seconds /var/log/messages.4-Feb 9 16:35:37 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 16:35:37 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 16:35:37 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash3--snapshot /var/log/messages.4-Feb 9 16:35:38 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.4-Feb 9 16:35:38 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 16:35:39 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash4--snapshot /var/log/messages.4:Feb 9 16:35:39 xen06 kernel: BUG: unable to handle kernel paging request at ffff880026b00418 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: PGD 1002067 PUD 1006067 PMD 1fb067 PTE 8010000026b00065 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: Oops: 0003 [#2] SMP /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: CPU 2 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: Modules linked in: dlm configfs xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport sg ide_cd_mod cdrom serio_raw tg3 libphy button tpm_tis tpm tpm_bios pcspkr i2c_i801 shpchp i2c_core iTCO_wdt dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd 
[last unloaded: microcode] /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: Pid: 25953, comm: udevd Tainted: G D 2.6.32.28-0.xen.pvops.choon.centos5 #1 PowerEdge 860 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: RSP: e02b:ffff88002682fb88 EFLAGS: 00010246 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: RAX: 0000000000000000 RBX: ffff880026b00418 RCX: 00007f71114ac000 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: RDX: ffffea0000000000 RSI: 0000000000000000 RDI: ffff880026b00418 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: RBP: ffff88002682fb98 R08: 00007f7112d14000 R09: ffff88003e6e5a89 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: R10: ffff88003d2d0580 R11: ffffffff813381a0 R12: 0000000000000000 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: R13: 00007f7110800000 R14: ffff880026b047f0 R15: ffff880025943e20 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: FS: 00007f711128a710(0000) GS:ffff880028089000(0000) knlGS:0000000000000000 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: CR2: ffff880026b00418 CR3: 0000000026946000 CR4: 0000000000002660 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: Process udevd (pid: 25953, threadinfo ffff88002682e000, task ffff8800269ae380) /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: Stack: /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: ffff880026b00418 000000003d32d067 ffff88002682fc48 ffffffff810a5e32 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: <0> ffff88002682fac8 000000000000001c 00007f7112c00000 0000000000000000 /var/log/messages.4-Feb 9 16:35:39 xen06 
udevd-event[25941]: run_program: ''/sbin/dmsetup'' abnormal exit /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: <0> 00007f71114ac000 ffff8800280975c0 00007f71114ac000 ffff88002682fcb8 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: Call Trace: /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810a5e32>] free_pgd_range+0x3b3/0x401 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810a6133>] free_pgtables+0xb6/0xce /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810a8224>] exit_mmap+0xf0/0x14b /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff81048648>] mmput+0x46/0xb2 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810c8a56>] flush_old_exec+0x460/0x533 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810fbd15>] load_elf_binary+0x365/0x171c /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810fa7ac>] ? load_misc_binary+0x61/0x322 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810a4e5f>] ? get_user_pages+0x44/0x46 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810c76de>] ? 
put_arg_page+0x9/0xb /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810c7b8b>] search_binary_handler+0xde/0x268 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810c8f85>] do_execve+0x193/0x262 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff810113af>] sys_execve+0x3e/0x58 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: [<ffffffff81012f1a>] stub_execve+0x6a/0xc0 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: Code: 8b 3c 25 70 c5 00 00 ff 14 25 68 82 5c 81 48 83 c4 18 5b c9 c3 55 48 89 e5 41 54 49 89 f4 53 48 89 fb e8 79 e3 ff ff 84 c0 75 05 <4c> 89 23 eb 0b 4c 89 e6 48 89 df e8 69 ff ff ff 5b 41 5c c9 c3 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: RSP <ffff88002682fb88> /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: CR2: ffff880026b00418 /var/log/messages.4-Feb 9 16:35:39 xen06 kernel: ---[ end trace b5fffc59f6c8f975 ]--- /var/log/messages.4-Feb 9 16:35:40 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash5--snapshot /var/log/messages.4-Feb 9 16:35:40 xen06 kernel: kjournald starting. Commit interval 5 seconds /var/log/messages.4-Feb 9 16:35:40 xen06 kernel: EXT3 FS on dm-11, internal journal /var/log/messages.4-Feb 9 16:35:40 xen06 kernel: EXT3-fs: mounted filesystem with ordered data mode. /var/log/messages.4-Feb 9 16:35:40 xen06 lvm[3802]: No longer monitoring snapshot XenGroup-testcrash5--snapshot /var/log/messages.4-Feb 9 16:35:41 xen06 lvm[3802]: Monitoring snapshot XenGroup-testcrash1--snapshot Thanks. Kindest regards, Giam Teck Choon _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Teck Choon Giam
2011-Mar-11 20:45 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Sat, Mar 12, 2011 at 3:59 AM, Sander Eikelenboom <linux@eikelenboom.it> wrote:

> Haven't seen this one on my system. I do use LVM, but i'm only using LVM as
> device mapper, without software raid etc.
> Also using debian lenny/squeeze together with custom compiled xen and
> xen-2.6.32.x pvops kernel
>
> So I would suspect the multipathing / raid part.

If you do not use LVM snapshot create/delete, then you mostly won't hit this BUG. Plain LVM create/delete might trigger it as well, but only very rarely. I have hit it that way before... so unlucky for me :(

Thanks.

Kindest regards,
Giam Teck Choon
Sander Eikelenboom
2011-Mar-11 21:02 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
You did have some test script for creating/deleting snapshots that triggers it? I could try to test it with 2.6.38 as dom0 now, if you could resend that.

Friday, March 11, 2011, 9:45:31 PM, you wrote:

> On Sat, Mar 12, 2011 at 3:59 AM, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> Haven't seen this one on my system. I do use LVM, but i'm only using LVM as
>> device mapper, without software raid etc.
>> Also using debian lenny/squeeze together with custom compiled xen and
>> xen-2.6.32.x pvops kernel
>>
>> So I would suspect the multipathing / raid part.

> If you do not use LVM snapshot create/delete then you mostly won't hit this
> BUG. LVM create/delete might but very rare but I hit this before... so
> unlucky for me :(

> Thanks.

> Kindest regards,
> Giam Teck Choon

--
Best regards,
Sander mailto:linux@eikelenboom.it
Teck Choon Giam
2011-Mar-11 21:15 UTC
Re: [Xen-devel] kernel BUG at arch/x86/xen/mmu.c:1860!
On Sat, Mar 12, 2011 at 5:02 AM, Sander Eikelenboom <linux@eikelenboom.it> wrote:

> You did have some test script for creating/delete snapshots that triggers
> it ?

Yes, I do, and it is posted in this thread.

> Could try to test it with 2.6.38 as dom0 now, i you could resend that.

According to Andreas Olsowski's testing, 2.6.38 doesn't suffer from this.

Thanks.

Kindest regards,
Giam Teck Choon
Ian Campbell
2011-Mar-14 10:25 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
On Fri, 2011-03-11 at 18:05 +0000, Konrad Rzeszutek Wilk wrote:
> > Btw, who is currently working on the remus implementation?
>
> Frank Pan <frankpzh@gmail.com>

Interesting, I didn't know Frank was working on Remus stuff (I thought he was looking at more general suspend/restore/migrate stuff).

However, Shriram Rajagopalan <rshriram@cs.ubc.ca> recently stepped up to be Remus maintainer.

Ian.
Teck Choon Giam
2011-Mar-14 10:36 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
Hi,

Just in case anyone needs my testcrash.sh/test.sh script, attached is my latest version. For usage, simply execute testcrash.sh without any options.

Thanks.

Kindest regards,
Giam Teck Choon
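
For readers who do not have the attachment, here is a rough sketch of what one test cycle appears to do, judging only from the log lines earlier in the thread (snapshot created, ext3 mounted, snapshot removed). It is not the attached testcrash.sh: the helper names, the 1G snapshot size and the mount point are assumptions, and the sketch only builds the command lists without running anything.

```python
# Sketch only: builds the LVM command sequence for a snapshot stress loop.
# Snapshot size and mount point are assumed values, not taken from the thread.

def cycle_commands(vg, lv, snap_size="1G", mount_point="/mnt/testcrash"):
    """Commands for one create/mount/umount/remove cycle on one volume."""
    snap = f"{lv}-snapshot"
    return [
        ["lvcreate", "-s", "-L", snap_size, "-n", snap, f"/dev/{vg}/{lv}"],
        ["mount", f"/dev/{vg}/{snap}", mount_point],
        ["umount", mount_point],
        ["lvremove", "-f", f"/dev/{vg}/{snap}"],
    ]


def plan(vg, volumes, loops):
    """Yield every command, volume by volume, for the given number of loops."""
    for _ in range(loops):
        for lv in volumes:
            yield from cycle_commands(vg, lv)


if __name__ == "__main__":
    # Print the plan instead of executing it; running it needs root and real LVs.
    for cmd in plan("XenGroup", ["testcrash1", "testcrash2"], loops=1):
        print(" ".join(cmd))
```

To actually execute the plan, each list could be handed to `subprocess.run(cmd, check=True)` on a disposable test box.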
Konrad Rzeszutek Wilk
2011-Mar-16 15:52 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
> =====================================================
> Kernel 2.6.32.28 without XEN:
> about 50 successful runs of Teck Choon Giams "test.sh" script.
> (modified for handling 10 test volumes and sleeping 2 seconds)
> multipathd restarted successfully
> multipath module loaded/unloaded successfully
> lvm2 restarted successfully

.. snip..

> =====================================================
> Kernel 2.6.32.28 with XEN 4.0.1:
> at about loop 2 for volume 7 of "test.sh" it stopped doing ... well anything
> there has been no output on the screen and neither syslog nor dmesg entry.
> I left it hanging for about 15 Minutes until i decided to write this
> one off as a side effect of the same underlying problem.
> All lvm2 tools stopped working and i couldnt shut it down.
> Killing the hanging process ended it properly.

Jeremy and I were brainstorming this yesterday, and a couple of things that we thought might be interesting are to:

 - turn on CONFIG_DEBUG_PAGEALLOC
 - turn on CONFIG_DEBUG_LIST
 - turn on CONFIG_DEBUG_KMEMLEAK
 - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
 - turn on CONFIG_SLUB_DEBUG_ON

And see if anything starts coming out.

Also worth looking at the changes in drivers/md/ (the device-mapper code) between 2.6.32 and 2.6.38, to see if we are just hitting some memory leak bugs that hadn't been back-ported.

(Still busy with the upstream effort, can't work on this).
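
Before rebuilding, it may help to see which of the options listed above a given kernel .config already enables. The following is a minimal sketch under the assumption that the file uses the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` layout; the helper names and the sample text are made up for illustration.

```python
# Sketch: report which of the suggested debug options a .config does not enable.
# SAMPLE is an invented fragment, not a config from the thread.

SUGGESTED = [
    "CONFIG_DEBUG_PAGEALLOC",
    "CONFIG_DEBUG_LIST",
    "CONFIG_DEBUG_KMEMLEAK",
    "CONFIG_JBD_DEBUG",
    "CONFIG_JBD2_DEBUG",
    "CONFIG_SLUB_DEBUG_ON",
]


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols that are set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled symbols show up as comments: "# CONFIG_FOO is not set"
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and value == "y":
            enabled.add(name)
    return enabled


def missing_options(config_text, wanted=SUGGESTED):
    """Return the wanted options that are not enabled, in the order given."""
    on = enabled_options(config_text)
    return [opt for opt in wanted if opt not in on]


SAMPLE = """\
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_SLAB=y
CONFIG_JBD_DEBUG=y
"""

if __name__ == "__main__":
    for opt in missing_options(SAMPLE):
        print("not enabled:", opt)
```

On a real machine the text would come from the build tree's .config (or /boot/config-<version>, if the distribution installs one).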
Teck Choon Giam
2011-Mar-16 16:26 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Wed, Mar 16, 2011 at 11:52 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > =====================================================
> > Kernel 2.6.32.28 without XEN:
> > about 50 successful runs of Teck Choon Giams "test.sh" script.
> > (modified for handling 10 test volumes and sleeping 2 seconds)
> > multipathd restarted successfully
> > multipath module loaded/unloaded successfully
> > lvm2 restarted successfully
> .. snip..
> > =====================================================
> > Kernel 2.6.32.28 with XEN 4.0.1:
> > at about loop 2 for volume 7 of "test.sh" it stopped doing ... well anything
> > there has been no output on the screen and neither syslog nor dmesg entry.
> > I left it hanging for about 15 Minutes until i decided to write this
> > one off as a side effect of the same underlying problem.
> > All lvm2 tools stopped working and i couldnt shut it down.
> > Killing the hanging process ended it properly.
>
> Jeremy and I were brainstorming this yesterday and couple of things
> that we thought might be interesting are to:
>
>  - turn on CONFIG_DEBUG_PAGEALLOC
>  - turn on CONFIG_DEBUG_LIST
>  - turn on CONFIG_DEBUG_KMEMLEAK
>  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
>  - turn on CONFIG_SLUB_DEBUG_ON
>
> And see if anything starts coming out.

Thanks a lot to both of you for spending time on this. It isn't easy, as I believe this is something related to kernel 2.6.32.x, and I am just wondering whether it is related to *sched_domains? I read recent mails on LKML about rebuild_sched_domains being considered dangerous... those are about recent kernels, but I don't know which recent kernels they refer to...

I will make those config changes on one of my test servers when time permits and will post the results/output here when done.

> Also looking in the changes for the drivers/md/ between 2.6.32 and 2.6.38
> and see if we just hitting some memory leak bugs that hadn't been back-ported.
>
> (Still busy with the upstream effort, can't work on this).

You mean the 2.6.39 merge window?

Once again, thanks.

Kindest regards,
Giam Teck Choon
Konrad Rzeszutek Wilk
2011-Mar-16 16:40 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
> >  - turn on CONFIG_DEBUG_PAGEALLOC
> >  - turn on CONFIG_DEBUG_LIST
> >  - turn on CONFIG_DEBUG_KMEMLEAK
> >  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
> >  - turn on CONFIG_SLUB_DEBUG_ON
> >
> > And see if anything starts coming out.
>
> Thanks a lot for both of you spending time to do so. It isn't easy as I
> believe this is something related to kernel 2.6.32.x and just wondering is
> there something related to *sched_domains? I read recent mails in LKML

Hmmm.. no idea.

> about rebuild_sched_domains consider dangerous issues... and that is about
> recent kernels but won't know what recent kernels that refer to... ...
>
> I will do those config changes in one of my test server when time permit and
> will post results/output here when done.

OK. Thank you!

> > Also looking in the changes for the drivers/md/ between 2.6.32 and 2.6.38
> > and see if we just hitting some memory leak bugs that hadn't been back-ported.
> >
> > (Still busy with the upstream effort, can't work on this).
>
> You mean 2.6.39 merge window?

Yes.
Konrad Rzeszutek Wilk
2011-Mar-24 11:57 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Wed, Mar 16, 2011 at 12:40:01PM -0400, Konrad Rzeszutek Wilk wrote:
> > >  - turn on CONFIG_DEBUG_PAGEALLOC
> > >  - turn on CONFIG_DEBUG_LIST
> > >  - turn on CONFIG_DEBUG_KMEMLEAK
> > >  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
> > >  - turn on CONFIG_SLUB_DEBUG_ON
> > >
> > > And see if anything starts coming out.
> >
> > Thanks a lot for both of you spending time to do so. It isn't easy as I
> > believe this is something related to kernel 2.6.32.x and just wondering is
> > there something related to *sched_domains? I read recent mails in LKML
>
> Hmmm.. no idea.
> > about rebuild_sched_domains consider dangerous issues... and that is about
> > recent kernels but won't know what recent kernels that refer to... ...
> >
> > I will do those config changes in one of my test server when time permit and
> > will post results/output here when done.
>
> OK. Thank you!

I've been using Jeremy's latest tree, 2.6.32.32 (there is even a 2.6.32.33), and I can't hit this bug anymore. I would appreciate your input if you still see this.
Teck Choon Giam
2011-Mar-24 21:28 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Thu, Mar 24, 2011 at 7:57 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Mar 16, 2011 at 12:40:01PM -0400, Konrad Rzeszutek Wilk wrote:
>> > >  - turn on CONFIG_DEBUG_PAGEALLOC
>> > >  - turn on CONFIG_DEBUG_LIST
>> > >  - turn on CONFIG_DEBUG_KMEMLEAK
>> > >  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
>> > >  - turn on CONFIG_SLUB_DEBUG_ON
>> > >
>> > > And see if anything starts coming out.
>> >
>> > Thanks a lot for both of you spending time to do so. It isn't easy as I
>> > believe this is something related to kernel 2.6.32.x and just wondering is
>> > there something related to *sched_domains? I read recent mails in LKML
>>
>> Hmmm.. no idea.
>> > about rebuild_sched_domains consider dangerous issues... and that is about
>> > recent kernels but won't know what recent kernels that refer to... ...
>> >
>> > I will do those config changes in one of my test server when time permit and
>> > will post results/output here when done.
>>
>> OK. Thank you!
>
> I've been using Jeremy's latest tree: 2.6.32.32 (there is even a 2.6.32.33)
> and I can't hit this bug anymore. Would appreciate your input if you still see this.

Unfortunately, I am still able to hit it :(

git url = git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
git branch = xen/next-2.6.32
git commit = ba5fcb8b8ac91d8b65c8a9c2545bf3d416fdc151

I will compile the latest commit (2.6.32.35) and try again... maybe I am crazy, but passing a single cycle of the 100-loop test doesn't satisfy me. In fact, I will run with loop 1000 a few times, and this can take days.

The XenLinux (aka linux-2.6.18-xen.hg) tree passes this test though... unfortunately there are issues if I insist on using this old kernel, because of drivers such as e1000e, ahci... I do use this Xen kernel in production now, but I need to patch the drivers manually... pain in the ass :p

Sorry, I still haven't found time to change the configuration to those options, test, and report back, as the build/compilation/packaging is all handled by my customized script. I will try to make the configuration change and test this weekend if possible.

Thanks.

Kindest regards,
Giam Teck Choon
Teck Choon Giam
2011-Mar-25 03:57 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Thu, Mar 24, 2011 at 7:57 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Mar 16, 2011 at 12:40:01PM -0400, Konrad Rzeszutek Wilk wrote:
>> > >  - turn on CONFIG_DEBUG_PAGEALLOC
>> > >  - turn on CONFIG_DEBUG_LIST
>> > >  - turn on CONFIG_DEBUG_KMEMLEAK
>> > >  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
>> > >  - turn on CONFIG_SLUB_DEBUG_ON
>> > >
>> > > And see if anything starts coming out.
>> >
>> > Thanks a lot for both of you spending time to do so. It isn't easy as I
>> > believe this is something related to kernel 2.6.32.x and just wondering is
>> > there something related to *sched_domains? I read recent mails in LKML
>>
>> Hmmm.. no idea.
>> > about rebuild_sched_domains consider dangerous issues... and that is about
>> > recent kernels but won't know what recent kernels that refer to... ...
>> >
>> > I will do those config changes in one of my test server when time permit and
>> > will post results/output here when done.
>>
>> OK. Thank you!
>
> I've been using Jeremy's latest tree: 2.6.32.32 (there is even a 2.6.32.33)
> and I can't hit this bug anymore. Would appreciate your input if you still see this.

Reporting back: unfortunately, I am still able to hit this bug :(

git url = git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
git branch = xen/next-2.6.32
git commit = 4306ea8f6db3d83a5a2bbfe5448dd78e6846475a

Thanks.

Kindest regards,
Giam Teck Choon
Teck Choon Giam
2011-Mar-27 10:16 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Fri, Mar 25, 2011 at 11:57 AM, Teck Choon Giam <giamteckchoon@gmail.com> wrote:
> On Thu, Mar 24, 2011 at 7:57 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Wed, Mar 16, 2011 at 12:40:01PM -0400, Konrad Rzeszutek Wilk wrote:
>>> > >  - turn on CONFIG_DEBUG_PAGEALLOC
>>> > >  - turn on CONFIG_DEBUG_LIST
>>> > >  - turn on CONFIG_DEBUG_KMEMLEAK
>>> > >  - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
>>> > >  - turn on CONFIG_SLUB_DEBUG_ON
>>> > >
>>> > > And see if anything starts coming out.
>>> >
>>> > Thanks a lot for both of you spending time to do so. It isn't easy as I
>>> > believe this is something related to kernel 2.6.32.x and just wondering is
>>> > there something related to *sched_domains? I read recent mails in LKML
>>>
>>> Hmmm.. no idea.
>>> > about rebuild_sched_domains consider dangerous issues... and that is about
>>> > recent kernels but won't know what recent kernels that refer to... ...
>>> >
>>> > I will do those config changes in one of my test server when time permit and
>>> > will post results/output here when done.
>>>
>>> OK. Thank you!
>>
>> I've been using Jeremy's latest tree: 2.6.32.32 (there is even a 2.6.32.33)
>> and I can't hit this bug anymore. Would appreciate your input if you still see this.
>
> This is a report back to you that unfortunately I still able to hit this bug :(
>
> git url = git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
> git branch = xen/next-2.6.32
> git commit = 4306ea8f6db3d83a5a2bbfe5448dd78e6846475a
>
> Thanks.
>
> Kindest regards,
> Giam Teck Choon

Maybe this is good news ;)

This is my report on the kernel configuration options suggested by Konrad and Jeremy. I think I have found what causes, or at least what prevents, this BUG, so that Konrad and Jeremy have fewer places to look. Sorry, this will be a little lengthy, and sorry for my poor English.
I am using the following:

git url = git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
git branch = xen/next-2.6.32
git commit = df3a5560166da5a05de93f2fc36b718cc43c6c3c
hg_root = http://xenbits.xensource.com/xen-4.0-testing.hg
hg_changeset = 21465

With my old kernel config, I still hit this BUG easily with testcrash loop 100. In fact, I mostly hit it within the first 30 loops. My two test servers are set up with at least 20 x 5GB LVs, so each loop cycle does at least 20 lvcreate/lvremove snapshot and mount/umount operations.

With the CONFIGURATION changes suggested by Konrad and Jeremy, I am unable to reproduce this BUG with testcrash.sh loop 1000 on either of my two test servers. A short summary follows:

> - turn on CONFIG_DEBUG_PAGEALLOC

Ok, set.

> - turn on CONFIG_DEBUG_LIST

Already set originally.

> - turn on CONFIG_DEBUG_KMEMLEAK

I don't think I can enable this on x86_64, as there isn't an option for it in the x86_64 arch. However, I can see this option in the x86_32 arch, so I guess it depends on x86_32. Anyway, I don't think this is important for my case... ... why... read on... ... :P

> - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG

Ok, set.

> - turn on CONFIG_SLUB_DEBUG_ON

Ok, set. I needed to change from CONFIG_SLAB to CONFIG_SLUB for this, which also sets CONFIG_SLUB_DEBUG=y besides CONFIG_SLUB_DEBUG_ON=y.

So from the testcrash results on my two servers, I know that the kernel CONFIGURATION changes are involved and that one of them prevents this BUG from being hit. I am now testing by setting one of the mentioned CONFIG options at a time and running the same testcrash again, to determine which single option keeps the BUG from triggering.

The results are below, all using my old config as the base with *only one CONFIG option change at a time*, running testcrash loop 100:

With CONFIG_DEBUG_PAGEALLOC=y:
Result : I think this is the one that prevents the BUG, as one of my test servers has already passed testcrash loop cycle 100... ... now testing testcrash loop 10000 :P

With CONFIG_SLUB=y and CONFIG_SLUB_DEBUG=y:
Result : CRASH

With CONFIG_SLUB=y, CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y:
Result : CRASH

With CONFIG_JBD_DEBUG=y:
Result : CRASH

With CONFIG_JBD2_DEBUG=y:
Result : CRASH

Can others who hit this same BUG confirm that their kernel config does not have CONFIG_DEBUG_PAGEALLOC set? I think most production servers will not have this option enabled by default. If so, can you test with CONFIG_DEBUG_PAGEALLOC=y instead?

Sorry, I am still in the testing phase for this configuration, and hopefully one of my servers can pass testcrash loop 10000 (am I crazy? LOL). If this is really the case (I hope), then I guess there must be some conditional difference around CONFIG_DEBUG_PAGEALLOC, since without it set the BUG is hit, but with it set the BUG is not (at least while I am composing this report)... ...

I will report back my final testcrash loop 10000 result when it finishes... ... keeping fingers crossed!!!

Can anyone test kernel version 2.6.38 from the PVOPS tree, with CONFIG_DEBUG_PAGEALLOC both unset and set, to see whether this BUG exists in 2.6.38?

I hope this report is useful, especially to Konrad and Jeremy... ... ;)

Thanks.

Kindest regards,
Giam Teck Choon

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
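The snapshot stress loop described in the report above (snapshot every LV, mount it, unmount it, remove the snapshot, repeat) can be sketched roughly as follows. This is not the actual testcrash.sh, which is not reproduced in this message. The volume group name `vg0`, the `-snap` suffix, the 1G snapshot size, the mount point, and the helper names are all assumptions for illustration. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical stand-in for testcrash.sh; NOT the original script.
# VG, SNAP_SIZE and MNT are assumed placeholder values to adapt.
VG="${VG:-vg0}"
SNAP_SIZE="${SNAP_SIZE:-1G}"
MNT="${MNT:-/mnt/snaptest}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = really execute them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One create/mount/umount/remove cycle for a single logical volume.
snapshot_cycle() {
    lv="$1"
    run lvcreate -s -L "$SNAP_SIZE" -n "${lv}-snap" "/dev/$VG/$lv" || return 1
    run mount "/dev/$VG/${lv}-snap" "$MNT" || return 1
    run umount "$MNT" || return 1
    run lvremove -f "/dev/$VG/${lv}-snap" || return 1
}

# Repeat the cycle over all given LVs for the requested number of loops.
testcrash() {
    loops="$1"
    shift
    i=1
    while [ "$i" -le "$loops" ]; do
        for lv in "$@"; do
            snapshot_cycle "$lv" || return 1
        done
        i=$((i + 1))
    done
}
```

For example, `testcrash 100 lv1 lv2` prints 100 rounds of the four commands for each of the two (hypothetical) volumes. Setting `DRY_RUN=0` would execute them for real, which needs root and an existing volume group.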
Andreas Olsowski
2011-Mar-28 11:37 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
> - turn on CONFIG_DEBUG_PAGEALLOC
> - turn on CONFIG_DEBUG_LIST
> - turn on CONFIG_DEBUG_KMEMLEAK
> - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
> - turn on CONFIG_SLUB_DEBUG_ON

After I enabled those options (I don't use SLUB, I use SLAB) I no longer encounter any errors.

I completed 1000 loops of snapshot/mount/umount/removesnapshot.

Without those options in 2.6.32.35 I hit a different bug earlier today:

But you really have to be patient to see some output, because lvremove will hang for quite a while:
(a "while" being roughly the time it takes to: wait 5 min for an error, leave office, get coffee, smoke cigarette, go to restroom, return to office, finally see error)

kernel: BUG: unable to handle kernel paging request
...
kernel: RIP [<ffffffff8100f2bf>] xen_set_pmd+0x2f/0xb0

syslog/dmesg output is attached as crash.2.6.32.35-xen_01 or available at:
http://pastebin.com/Ad8MhUzD

After that happened I did a kernel recompile without rebooting the machine first and once more encountered system_call_fastpath as the last call, as shown in crash.2.6.32.35-xen_02 or http://pastebin.com/kB38W5mp

Maybe this helps, but I think, if anything, this makes it worse, as the debug options actually suppressed the problem that needs to be debugged.

with best regards

--
Andreas "andiolsi" Olsowski
Teck Choon Giam
2011-Mar-28 12:29 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Mon, Mar 28, 2011 at 7:37 PM, Andreas Olsowski
<andreas.olsowski@leuphana.de> wrote:
>
>> - turn on CONFIG_DEBUG_PAGEALLOC
>> - turn on CONFIG_DEBUG_LIST
>> - turn on CONFIG_DEBUG_KMEMLEAK
>> - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
>> - turn on CONFIG_SLUB_DEBUG_ON
>
> After i enabled those options (i dont use SLUB, i use SLAB) i do no longer
> encounter any errors.
>
> I completed 1000 loops of snapshot/mount/umoun/removesnapshot.

Did you try with only CONFIG_DEBUG_PAGEALLOC=y, leaving the rest of your config unchanged? All my testing narrows it down to CONFIG_DEBUG_PAGEALLOC=y as the option that prevents this BUG.

>
>
> Without those options in 2.6.32.35 i hit a different bug earlier today:
>
> But you really have to be patient to see some output, because lvremove will
> hang quite a while:
> (a "while" beeing the a a roughly the time it takes for: wait 5 min for
> error, leave office, get coffee, smoke cigarette, goto restroom, return to
> office, finally see error)
>
> kernel: BUG: unable to handle kernel paging request
> ...
> kernel: RIP [<ffffffff8100f2bf>] xen_set_pmd+0x2f/0xb0
> syslog/dmesg output is attached as crash.2.6.32.35-xen_01 or available at:
> http://pastebin.com/Ad8MhUzD

I hit this before:

# grep 'xen_set_pmd' /var/log/messages*
/var/log/messages:Mar 27 09:31:14 xen05 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:31:14 xen05 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:06:10 xen05 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 09:06:10 xen05 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 15:18:57 xen05 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages:Mar 27 15:18:57 xen05 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages.1:Mar 23 11:00:16 xen05 kernel: IP: [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages.1:Mar 23 11:00:16 xen05 kernel: RIP: e030:[<ffffffff8100e2d4>] [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b
/var/log/messages.1:Mar 23 11:00:17 xen05 kernel: RIP [<ffffffff8100e2d4>] xen_set_pmd+0x16/0x2b

But I am unable to reproduce it with CONFIG_DEBUG_PAGEALLOC=y.

>
> After that happened i did a kernel recompile without rebooting the machine
> first and encoundeterd system_call_fastpath as last call once more as shown
> in crash.2.6.32.35-xen_02 or http://pastebin.com/kB38W5mp

I hit this at least once, but not with CONFIG_DEBUG_PAGEALLOC=y:

/var/log/messages-Mar 27 17:04:39 xen05 kernel: ------------[ cut here ]------------
/var/log/messages-Mar 27 17:04:39 xen05 kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
/var/log/messages-Mar 27 17:04:39 xen05 kernel: invalid opcode: 0000 [#1] SMP
/var/log/messages-Mar 27 17:04:39 xen05 kernel: last sysfs file: /sys/block/sdd/dev
/var/log/messages-Mar 27 17:04:39 xen05 kernel: CPU 2
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge stp be2iscsi iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video backlight output sbs sbshc power_meter hwmon battery acpi_memhotplug xen_acpi_memhotplug ac parport_pc lp parport tg3 libphy sg ide_cd_mod cdrom serio_raw button tpm_tis tpm tpm_bios i2c_i801 i2c_core shpchp iTCO_wdt pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Pid: 5874, comm: lvcreate Not tainted 2.6.32.35-4.xen.pvops.choon.centos5 #1 PowerEdge 860
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP: e030:[<ffffffff8100cb5b>] [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RSP: e02b:ffff8800303d1c28 EFLAGS: 00010282
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RAX: 00000000ffffffea RBX: 000000000003032d RCX: 0000000000000181
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RDX: 00000000deadbeef RSI: 00000000deadbeef RDI: 00000000deadbeef
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RBP: ffff8800303d1c48 R08: 0000000000000968 R09: ffff880000000000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: R10: 00000000deadbeef R11: ffff8800303d1d08 R12: 0000000000000003
/var/log/messages-Mar 27 17:04:39 xen05 kernel: R13: 000000000003032d R14: ffff880030360000 R15: 00007fd324a00000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: FS: 00007fd327d2e710(0000) GS:ffff880028089000(0000) knlGS:0000000000000000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
/var/log/messages-Mar 27 17:04:39 xen05 kernel: CR2: 00000000004612f0 CR3: 000000003a025000 CR4: 0000000000002660
/var/log/messages-Mar 27 17:04:39 xen05 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
/var/log/messages-Mar 27 17:04:39 xen05 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Process lvcreate (pid: 5874, threadinfo ffff8800303d0000, task ffff880030360000)
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Stack:
/var/log/messages-Mar 27 17:04:39 xen05 kernel: 0000000000000000 00000000002027a9 000000013eb43318 000000000003032d
/var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c68 ffffffff8100e07c ffff880032be05c0 ffff880032aa9928
/var/log/messages-Mar 27 17:04:39 xen05 kernel: <0> ffff8800303d1c78 ffffffff8100e0af ffff8800303d1cb8 ffffffff810a4433
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Call Trace:
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff8100e07c>] xen_alloc_ptpage+0x64/0x69
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff8100e0af>] xen_alloc_pte+0xe/0x10
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff810a4433>] __pte_alloc+0x70/0xce
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff810a45d1>] handle_mm_fault+0x140/0x8b9
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff810a50c9>] __get_user_pages+0x37f/0x479
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff810a76ca>] __mlock_vma_pages_range+0xc0/0x16f
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff8131c03f>] ? _spin_unlock_irqrestore+0x11/0x13
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff810a78db>] mlock_fixup+0x162/0x199
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff810a7989>] do_mlockall+0x77/0x8d
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff81139016>] ? security_capable+0x27/0x29
/var/log/messages-Mar 27 17:04:39 xen05 kernel: [<ffffffff810a7ce2>] sys_mlockall+0x8f/0xb9
/var/log/messages:Mar 27 17:04:39 xen05 kernel: [<ffffffff81012ac2>] system_call_fastpath+0x16/0x1b
/var/log/messages-Mar 27 17:04:39 xen05 kernel: Code: 48 b8 ff ff ff ff ff ff ff 7f 48 21 c2 48 89 55 e8 48 8d 7d e0 be 01 00 00 00 31 d2 41 ba f0 7f 00 00 e8 e9 c7 ff ff 85 c0 74 04 <0f> 0b eb fe c9 c3 55 40 f6 c7 01 48 89 e5 53 48 89 fb 74 5b 48
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RIP [<ffffffff8100cb5b>] pin_pagetable_pfn+0x53/0x59
/var/log/messages-Mar 27 17:04:39 xen05 kernel: RSP <ffff8800303d1c28>
/var/log/messages-Mar 27 17:04:39 xen05 kernel: ---[ end trace bf36c55d2ecd52e5 ]---

>
>
> Maybe this helps, but i think, if anything, this makes it worse as the debug
> options actually supressed the problem that needs to be debugged.

True. At least we have now narrowed it down to something related to CONFIG_DEBUG_PAGEALLOC. Maybe Konrad or Jeremy can take a closer look at the related code... ...

Thanks.

Kindest regards,
Giam Teck Choon
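One detail in the register dump above may be worth decoding: RAX holds 00000000ffffffea when the BUG fires. Read as a signed 32-bit integer that is -22, which corresponds to -EINVAL on Linux. Treating it as the return value of a failed call is an assumption here, not something the trace itself states. The conversion can be checked in shell; the helper name `to_s32` is made up for this example.

```shell
# Interpret the low 32 bits of a value as a signed integer (two's complement).
# to_s32 is a hypothetical helper name, not part of any tool.
to_s32() {
    v=$(( $1 & 0xffffffff ))
    if [ "$v" -ge 2147483648 ]; then
        # Values with the top bit set are negative: subtract 2^32.
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

to_s32 0xffffffea   # low 32 bits of RAX from the trace
```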
Dave Hunter
2011-Apr-05 22:01 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
Hi guys,

This thread has gone quiet for a while and I was wondering if a solution had been found?

I'm currently running the packaged version of Xen 4.0.1 in Debian Squeeze and everything runs well, except for the random crashing when using LVM.

I use LVM for the disk partitions, and use live snapshots as part of our backup routine. That is, create snapshot -> mount snapshot -> rsync -> umount snapshot -> remove snapshot.

Cheers,

Dave Hunter.
Teck Choon Giam
2011-Apr-05 22:15 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Wed, Apr 6, 2011 at 6:01 AM, Dave Hunter <dave@ivt.com.au> wrote:
> Hi guys,
>
> This thread has gone quiet for a while and I was wondering if a solution
> had been found?
>
> I'm currently running the packaged version of Xen 4.0.1 in Debian
> Squeeze and everything runs well, except for the random crashing when
> using LVM.
>
> I use LVM for the disk partitions, and use live snapshots as part of our
> backup routine. That is, create snapshot -> mount snapshot -> rsync ->
> umount snapshot -> remove snapshot.
>
> Cheers,
>
> Dave Hunter.

The solution is to compile your PVOPS 2.6.32.x kernel with CONFIG_DEBUG_PAGEALLOC=y.

With CONFIG_DEBUG_PAGEALLOC=y, it passes my testcrash loop 10000.

Thanks.

Kindest regards,
Giam Teck Choon
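To check whether a given kernel was built with this option, its config file can be inspected. A minimal sketch, assuming the config is installed as /boot/config-$(uname -r) (common on CentOS and Debian, but not guaranteed); the function name `has_config` is made up for this example:

```shell
# Return 0 if the given option is set to "y" in a kernel config file.
# Second argument is optional; the default path is an assumption.
has_config() {
    opt="$1"
    cfg="${2:-/boot/config-$(uname -r)}"
    grep -q "^${opt}=y\$" "$cfg" 2>/dev/null
}

if has_config CONFIG_DEBUG_PAGEALLOC; then
    echo "CONFIG_DEBUG_PAGEALLOC is enabled"
else
    echo "CONFIG_DEBUG_PAGEALLOC is not enabled (or config file not found)"
fi
```

A line such as `# CONFIG_DEBUG_PAGEALLOC is not set` does not match, so only an explicit `=y` counts as enabled.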
Dave Hunter
2011-Apr-05 23:20 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
Thanks!

Is it likely that Debian would release an updated kernel in squeeze with this configuration? (sorry, this might not be the place to ask).

Dave.

On Wed, 2011-04-06 at 06:15 +0800, Teck Choon Giam wrote:
> On Wed, Apr 6, 2011 at 6:01 AM, Dave Hunter <dave@ivt.com.au> wrote:
> > Hi guys,
> >
> > This thread has gone quiet for a while and I was wondering if a solution
> > had been found?
> >
> > I'm currently running the packaged version of Xen 4.0.1 in Debian
> > Squeeze and everything runs well, except for the random crashing when
> > using LVM.
> >
> > I use LVM for the disk partitions, and use live snapshots as part of our
> > backup routine. That is, create snapshot -> mount snapshot -> rsync ->
> > umount snapshot -> remove snapshot.
> >
> > Cheers,
> >
> > Dave Hunter.
>
> Solution is to compile your PVOPS 2.6.32.x kernel with CONFIG_DEBUG_PAGEALLOC=y.
>
> With CONFIG_DEBUG_PAGEALLOC=y, it pass my testcrash loop 10000.
>
> Thanks.
>
> Kindest regards,
> Giam Teck Choon
Ian Campbell
2011-Apr-06 07:53 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
Please don't top post.

On Wed, 2011-04-06 at 00:20 +0100, Dave Hunter wrote:
> Is it likely that Debian would release an updated kernel in squeeze with
> this configuration? (sorry, this might not be the place to ask).

I doubt they will, enabling DEBUG_PAGEALLOC seems very much like a workaround not a solution to me.

Ian.
Jeremy Fitzhardinge
2011-Apr-06 21:52 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On 04/06/2011 12:53 AM, Ian Campbell wrote:
> Please don't top post.
>
> On Wed, 2011-04-06 at 00:20 +0100, Dave Hunter wrote:
>> Is it likely that Debian would release an updated kernel in squeeze with
>> this configuration? (sorry, this might not be the place to ask).
> I doubt they will, enabling DEBUG_PAGEALLOC seems very much like a
> workaround not a solution to me.

Yes, it will impose a pretty large performance overhead.

J
Teck Choon Giam
2011-Apr-07 13:16 UTC
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Thu, Apr 7, 2011 at 5:52 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 04/06/2011 12:53 AM, Ian Campbell wrote:
>> Please don't top post.
>>
>> On Wed, 2011-04-06 at 00:20 +0100, Dave Hunter wrote:
>>> Is it likely that Debian would release an updated kernel in squeeze with
>>> this configuration? (sorry, this might not be the place to ask).
>> I doubt they will, enabling DEBUG_PAGEALLOC seems very much like a
>> workaround not a solution to me.
>
> Yes, it will impose a pretty large performance overhead.
>
> J

I understand that this option imposes a pretty large performance overhead, but it is currently a working trade-off/solution/workaround ;)

For those of us who need LVM snapshots, it is better to accept that trade-off than to have an unstable host server... IMO.

Hopefully you or Konrad can have a look at the related code. If you need testers, I will always try my best to test any new patches for stability in this area... ...

Thanks.

Kindest regards,
Giam Teck Choon