Kip Macy
2005-Jul-06 18:08 UTC
[Xen-devel] grant table unmap failure makes guest unreapable and causes xen oops
I just hit this so I don't fully understand it yet, but it looks like there may be some race condition with grant_table unmap requests and garbage collection of domain memory on crashed guests.

My centos4 domU isn't finding its init (this may be the breakage in file-backed VBDs that Mark mentioned - it was finding it a couple of days ago) and thus calls HYPERVISOR_crash:

Freeing unused kernel memory: 92k freed
Kernel panic - not syncing: No init found. Try passing init= option to kernel.

[root@rs0 ~]# xm list
Name              Id  Mem(MB)  CPU  VCPU(s)  State  Time(s)  Console
Domain-0           0      251    0      1   r----   1013.3
rhel4_0            1        0    3      1   ----c      0.5    9601

The following errors show up on the console:

(XEN) (file=grant_table.c, line=500) Bad handle (0).
(XEN) (file=grant_table.c, line=500) Bad handle (49152).
(XEN) (file=grant_table.c, line=500) Bad handle (49792).
(XEN) (file=grant_table.c, line=500) Bad handle (0).
(XEN) (file=grant_table.c, line=500) Bad handle (61440).

And the guest never goes away.

[root@rs0 ~]# xm destroy 1
[root@rs0 ~]# xm list
Name              Id  Mem(MB)  CPU  VCPU(s)  State  Time(s)  Console
Domain-0           0      251    0      1   r----   1208.2
rhel4_0            1        0    3      1   ----c      0.5    9601

restarting xend here is interesting:

[root@rs0 ~]# xend start
DBMap>introduceDomain> 1 69067 <EventChannel dom1:0:14 dom2:1:2> /domain/4042ebcc-778d-4488-a0bd-6152c42ba98b
Traceback (most recent call last):
<snip>
RuntimeError: (9, 'Bad file descriptor')

Message from syslogd@rs0 at Wed Jul 6 10:59:17 2005 ...
rs0 xenstored: xenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)
Exception starting xend: (9, 'Bad file descriptor')

On the console we see:

(XEN) (file=/build/kmacy/xen/xen-unstable.hg/xen/include/asm/mm.h, line=187) Error pfn be9: rd=ffbf8a80, od=ffbf8a80, caf=00000000, taf=f0000001
(XEN) (file=/build/kmacy/xen/xen-unstable.hg/xen/include/asm/mm.h, line=187) Error pfn 10dcb: rd=ffbf8a80, od=00000000, caf=00000000, taf=f0000000
[ERR] corruptxenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)

*NOW* comes the fun part:

[root@rs0 ~]# /sbin/shutdown -r now

Broadcast message from root (pts/1) (Wed Jul 6 11:01:00 2005):

The system is going down for reboot NOW!
INIT: Sending processes the TERM signal
(XEN) CPU:    0
(XEN) EIP:    e008:[<ff10b882>]
(XEN) EFLAGS: 00210202   CONTEXT: hypervisor
(XEN) eax: 0000000a   ebx: 00000000   ecx: 00000000   edx: 00000003
(XEN) esi: 00000001   edi: ffbf2700   ebp: ffbf1004   esp: ff103e04
(XEN) cr0: 8005003b   cr3: 181cd000
(XEN) ds: e010   es: e010   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ff103e04:
(XEN)    00000001 00000052 00000000 00000400 fec6b000 ff1a9900 fc400000 00000f00
(XEN)    00000000 00000000 00000001 000a0067 fc400f00 ff1a9900 ff1a7080 [ff1269de]
(XEN)    ff1a7080 ff1a9900 000000a0 00000000 fec6b000 32ff0001 ff1a9900 0000067c
(XEN)    00000261 000a0067 fec72000 [ff12a110] 000a0067 ff1a9900 00000000 [ff13b9ef]
(XEN)    181cd000 ff103fb4 00000001 c7910985 ff1a9b80 fec72000 ff1a9900 [ff12a266]
(XEN)    ff1a9900 fec72000 ff1b2000 [ff12ba14] fec71000 ff1a9b80 ff103fb4 ff1a9b90
(XEN)    c7910985 fe31e440 00000000 00000000 00000000 00000000 ff1a9900 [ff12d55e]
(XEN)    ff1a9900 00000000 0000000c 00200286 ff103fb4 ff1a9900 [ff13b9ef] 181cd000
(XEN)    001446c9 00000000 c7910984 ff1a9900 00018ef0 18ef0061 [ff12eb6a] 00018ef0
(XEN)    ffffffff 00000010 ff1a9900 00000007 c873e000 00010000 c0568ee0 000002db
(XEN)    32db0001 ff103fb4 ff1a9b80 ff1a9900 00000000 fe3f8b6c ff1a9900 ff103fb4
(XEN)    c7910984 ffbf3080 [ff13e867] 00000000 00000000 00000000 00000000 00000001
(XEN)    00000005 00000020 ee000000 ffbf3080 ffbf3bf8 ffbf3080 ffbf3080 ffbf3080
(XEN)    00007ff0 c8623284 b6e69000 [ff14a8f3] c8683eec 00000001 00000000 00007ff0
(XEN)    c8623284 b6e69000 0000001a 000e0003 c0115b33 00000061 00200282 c8683eec
(XEN)    00000069 0000007b 0000007b 00000000 00000000 00000000 ffbf3080
(XEN) Xen call trace from esp=ff103e04:
(XEN)    [<ff1269de>] [<ff12a110>] [<ff13b9ef>] [<ff12a266>] [<ff12ba14>] [<ff12d55e>]
(XEN)    [<ff13b9ef>] [<ff12eb6a>] [<ff13e867>] [<ff14a8f3>]

****************************************
Panic on CPU0:
CPU0 FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: 00000004
****************************************

Reboot in five seconds...

Line 910 of "grant_table.c" starts at address 0xff10b87f <gnttab_check_unmap+175>
and ends at 0xff10b888 <gnttab_check_unmap+184>.

<...>
             ( readonly ? 1 : (!(map->ref_and_flags & GNTMAP_readonly))))
        {
            ref = (map->ref_and_flags >> MAPTRACK_REF_SHIFT);
            act = &rgt->active[ref];        <- line 910

            spin_lock(&rgt->lock);

            if ( act->frame != frame )
<...>

0xff10b882 <gnttab_check_unmap+178>:    mov    0x4(%ecx),%eax
0xff1269de <put_page_from_l1e+270>:     test   %eax,%eax
0xff12a110 <revalidate_l1+176>:         jmp    0xff12a090 <revalidate_l1+48>
0xff13b9ef <__flush_tlb_mask+239>:      mov    0x44(%ebx),%eax
0xff12a266 <ptwr_flush+246>:            mov    %edi,(%esp)
0xff12d55e <do_mmuext_op+1150>:         jmp    0xff12d17a <do_mmuext_op+154>
0xff13b9ef <__flush_tlb_mask+239>:      mov    0x44(%ebx),%eax
0xff12eb6a <ptwr_do_page_fault+506>:    mov    %eax,%esi
0xff13e867 <do_page_fault+423>:         test   %eax,%eax
0xff14a8f3 <hypercall+83>:              mov    %eax,0x18(%esp)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Kip Macy
2005-Jul-06 18:18 UTC
[Xen-devel] Re: grant table unmap failure makes guest unreapable and causes xen oops
It looks like some of these problems may have been fixed by check-ins in the last couple of hours. I'm doing a make world right now.

-Kip