Kip Macy
2005-Jul-06 18:08 UTC
[Xen-devel] grant table unmap failure makes guest unreapable and causes xen oops
I just hit this so I don't fully understand it yet, but it looks like
there may be some race condition with grant_table unmap requests and
garbage collection of domain memory on crashed guests.
My centos4 domU isn't finding its init (this may be the breakage in
file-backed VBDs that Mark mentioned - it was finding it a couple of
days ago) and thus calls HYPERVISOR_crash:
Freeing unused kernel memory: 92k freed
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
[root@rs0 ~]# xm list
Name       Id  Mem(MB)  CPU  VCPU(s)  State  Time(s)  Console
Domain-0    0      251    0        1  r----   1013.3
rhel4_0     1        0    3        1  ----c      0.5     9601
The following errors show up on the console:
(XEN) (file=grant_table.c, line=500) Bad handle (0).
(XEN) (file=grant_table.c, line=500) Bad handle (49152).
(XEN) (file=grant_table.c, line=500) Bad handle (49792).
(XEN) (file=grant_table.c, line=500) Bad handle (0).
(XEN) (file=grant_table.c, line=500) Bad handle (61440).
And the guest never goes away.
[root@rs0 ~]# xm destroy 1
[root@rs0 ~]# xm list
Name       Id  Mem(MB)  CPU  VCPU(s)  State  Time(s)  Console
Domain-0    0      251    0        1  r----   1208.2
rhel4_0     1        0    3        1  ----c      0.5     9601
Restarting xend here is interesting:
[root@rs0 ~]# xend start
DBMap>introduceDomain> 1 69067 <EventChannel dom1:0:14 dom2:1:2>
/domain/4042ebcc-778d-4488-a0bd-6152c42ba98b
Traceback (most recent call last):
<snip>
RuntimeError: (9, 'Bad file descriptor')
Message from syslogd@rs0 at Wed Jul 6 10:59:17 2005 ...
rs0 xenstored: xenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)
Exception starting xend: (9, 'Bad file descriptor')
On the console we see:
(XEN) (file=/build/kmacy/xen/xen-unstable.hg/xen/include/asm/mm.h, line=187) Error pfn be9: rd=ffbf8a80, od=ffbf8a80, caf=00000000, taf=f0000001
(XEN) (file=/build/kmacy/xen/xen-unstable.hg/xen/include/asm/mm.h, line=187) Error pfn 10dcb: rd=ffbf8a80, od=00000000, caf=00000000, taf=f0000000
[ERR] corruptxenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)
*NOW* comes the fun part:
[root@rs0 ~]# /sbin/shutdown -r now
Broadcast message from root (pts/1) (Wed Jul 6 11:01:00 2005):
The system is going down for reboot NOW!
INIT: Sending processes the TERM signal
(XEN) CPU: 0
(XEN) EIP: e008:[<ff10b882>]
(XEN) EFLAGS: 00210202 CONTEXT: hypervisor
(XEN) eax: 0000000a ebx: 00000000 ecx: 00000000 edx: 00000003
(XEN) esi: 00000001 edi: ffbf2700 ebp: ffbf1004 esp: ff103e04
(XEN) cr0: 8005003b cr3: 181cd000
(XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ff103e04:
(XEN) 00000001 00000052 00000000 00000400 fec6b000 ff1a9900 fc400000 00000f00
(XEN) 00000000 00000000 00000001 000a0067 fc400f00 ff1a9900 ff1a7080 [ff1269de]
(XEN) ff1a7080 ff1a9900 000000a0 00000000 fec6b000 32ff0001 ff1a9900 0000067c
(XEN) 00000261 000a0067 fec72000 [ff12a110] 000a0067 ff1a9900 00000000 [ff13b9ef]
(XEN) 181cd000 ff103fb4 00000001 c7910985 ff1a9b80 fec72000 ff1a9900 [ff12a266]
(XEN) ff1a9900 fec72000 ff1b2000 [ff12ba14] fec71000 ff1a9b80 ff103fb4 ff1a9b90
(XEN) c7910985 fe31e440 00000000 00000000 00000000 00000000 ff1a9900 [ff12d55e]
(XEN) ff1a9900 00000000 0000000c 00200286 ff103fb4 ff1a9900 [ff13b9ef] 181cd000
(XEN) 001446c9 00000000 c7910984 ff1a9900 00018ef0 18ef0061 [ff12eb6a] 00018ef0
(XEN) ffffffff 00000010 ff1a9900 00000007 c873e000 00010000 c0568ee0 000002db
(XEN) 32db0001 ff103fb4 ff1a9b80 ff1a9900 00000000 fe3f8b6c ff1a9900 ff103fb4
(XEN) c7910984 ffbf3080 [ff13e867] 00000000 00000000 00000000 00000000 00000001
(XEN) 00000005 00000020 ee000000 ffbf3080 ffbf3bf8 ffbf3080 ffbf3080 ffbf3080
(XEN) 00007ff0 c8623284 b6e69000 [ff14a8f3] c8683eec 00000001 00000000 00007ff0
(XEN) c8623284 b6e69000 0000001a 000e0003 c0115b33 00000061 00200282 c8683eec
(XEN) 00000069 0000007b 0000007b 00000000 00000000 00000000 ffbf3080
(XEN) Xen call trace from esp=ff103e04:
(XEN) [<ff1269de>] [<ff12a110>] [<ff13b9ef>] [<ff12a266>] [<ff12ba14>] [<ff12d55e>]
(XEN) [<ff13b9ef>] [<ff12eb6a>] [<ff13e867>] [<ff14a8f3>]
****************************************
Panic on CPU0:
CPU0 FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: 00000004
****************************************
Reboot in five seconds...
Line 910 of "grant_table.c" starts at address 0xff10b87f <gnttab_check_unmap+175>
and ends at 0xff10b888 <gnttab_check_unmap+184>.
<...>
( readonly ? 1 : (!(map->ref_and_flags & GNTMAP_readonly))))
{
ref = (map->ref_and_flags >> MAPTRACK_REF_SHIFT);
act = &rgt->active[ref]; <- line 910
spin_lock(&rgt->lock);
if ( act->frame != frame )
<...>
0xff10b882 <gnttab_check_unmap+178>: mov 0x4(%ecx),%eax
0xff1269de <put_page_from_l1e+270>: test %eax,%eax
0xff12a110 <revalidate_l1+176>: jmp 0xff12a090 <revalidate_l1+48>
0xff13b9ef <__flush_tlb_mask+239>: mov 0x44(%ebx),%eax
0xff12a266 <ptwr_flush+246>: mov %edi,(%esp)
0xff12d55e <do_mmuext_op+1150>: jmp 0xff12d17a <do_mmuext_op+154>
0xff13b9ef <__flush_tlb_mask+239>: mov 0x44(%ebx),%eax
0xff12eb6a <ptwr_do_page_fault+506>: mov %eax,%esi
0xff13e867 <do_page_fault+423>: test %eax,%eax
0xff14a8f3 <hypercall+83>: mov %eax,0x18(%esp)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Kip Macy
2005-Jul-06 18:18 UTC
[Xen-devel] Re: grant table unmap failure makes guest unreapable and causes xen oops
It looks like some of these problems may have been fixed by check-ins
in the last couple of hours. I'm doing a make world right now.
-Kip
On 7/6/05, Kip Macy <kip.macy@gmail.com> wrote:
> I just hit this so I don't fully understand it yet, but it looks like
> there may be some race condition with grant_table unmap requests and
> garbage collection of domain memory on crashed guests.
> <snip>