When a domain is killed by Xen:

DOM0: (file=memory.c, line=333) Page 11c51000 bad type/count (02000000!=01000000) cnt=1
Killing domain 3
Releasing task 3

it would be nice if it notified DOM0, so that DOM0 could map in the
domain's pages and write them out to disk. Is there an existing
mechanism in place that I can use for notification?

-Kip

-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
> When a domain is killed by Xen:
> DOM0: (file=memory.c, line=333) Page 11c51000 bad type/count (02000000!=01000000) cnt=1
> Killing domain 3
> Releasing task 3
>
> it would be nice if it notified DOM0, so that DOM0 could map in the
> domain's pages and write them out to disk. Is there an existing
> mechanism in place that I can use for notification?

Right now, the only way to do this is rather grim -- see the auto
reboot stuff in xc_dom_create. It polls get_domain_info once a
second.

In the 1.3 tree, if you've got the pages mapped into domain 0
they won't go straight back on the free list when the domain dies
(as they're reference counted). You can then write out a core
dump.

Also, you might want to check out Alex's PDB stuff which is now
in the 1.3 tree. It should enable you to insert breakpoints into
the other domain.

Ian
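The polling approach described above can be sketched roughly as follows. This is not the code from xc_dom_create: wait_for_domain_death, the domain_alive_fn callback type, and fake_alive are hypothetical names, with the callback standing in for whatever domain-info query the real tool issues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback: returns true while the domain still exists.
 * In a real tool this would wrap a domain-info query to Xen. */
typedef bool (*domain_alive_fn)(unsigned int domid, void *ctx);

/* Poll until the domain disappears or max_polls is exhausted.
 * Returns the number of polls performed, or -1 on timeout. */
int wait_for_domain_death(unsigned int domid, domain_alive_fn alive,
                          void *ctx, unsigned int interval_sec,
                          int max_polls)
{
    assert(alive != NULL);
    for (int polls = 1; polls <= max_polls; polls++) {
        if (!alive(domid, ctx))
            return polls;          /* domain is gone: caller can react */
        if (interval_sec > 0)
            sleep(interval_sec);   /* one second in the scheme above */
    }
    return -1;
}

/* Stand-in for the real query: reports "alive" for *ctx polls, then dead. */
bool fake_alive(unsigned int domid, void *ctx)
{
    int *remaining = ctx;
    (void)domid;
    return (*remaining)-- > 0;
}
```

Note that a loop like this only learns about the death after the fact, so on its own it does not keep the dead domain's pages around for a core dump; that depends on the pages already being mapped, as mentioned above.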
> Right now, the only way to do this is rather grim -- see the auto
> reboot stuff in xc_dom_create. It polls get_domain_info once a
> second.

Hmm. I guess that could work.

> In the 1.3 tree, if you've got the pages mapped into domain 0
> they won't go straight back on the free list when the domain dies
> (as they're reference counted). You can then write out a core
> dump.

I was hoping that I could map them in on demand. I guess there isn't
any good reason why DOM0 shouldn't have access to everyone's memory
all the time.

> Also, you might want to check out Alex's PDB stuff which is now
> in the 1.3 tree. It should enable you to insert breakpoints into
> the other domain.

I'll take a look at that. However, I will need the core dump
functionality in the near future.

Thanks.

-Kip
> > Right now, the only way to do this is rather grim -- see the auto
> > reboot stuff in xc_dom_create. It polls get_domain_info once a
> > second.
>
> Hmm. I guess that could work.

Using the new inter-domain comms rings mechanism it'll be easy to add
events for things like this.

> > In the 1.3 tree, if you've got the pages mapped into domain 0
> > they won't go straight back on the free list when the domain dies
> > (as they're reference counted). You can then write out a core
> > dump.
>
> I was hoping that I could map them in on demand. I guess there isn't
> any good reason why DOM0 shouldn't have access to everyone's memory
> all the time.

The trouble with mapping them on demand is that as soon as the domain
exits the reference count on the pages will go to zero and they'll end
up on the free list, hence may get overwritten e.g. by network packets.

Rather than destroying a domain when it faults, it's arguable we should
just mark it as a zombie, and then rely on user-space domain0 tools to
issue a 'destroy' on the zombies, after writing a coredump if required.
This would be an easy hack to add for your purposes.

You could create the coredump by modifying the xc_linux_save function.

Cheers,
Ian
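As a toy model of the zombie idea and of why reference counting keeps mapped pages alive, something like the following could be used to reason about the lifetimes. Every name here (toy_page, toy_domain, toy_fault, toy_destroy, toy_put_page) is invented for illustration and none of it is Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dom_state { DOM_RUNNING, DOM_ZOMBIE, DOM_DESTROYED };

/* Illustrative page: freed only when the last reference is dropped. */
struct toy_page {
    int  refcount;
    bool on_free_list;
};

struct toy_domain {
    enum dom_state   state;
    struct toy_page *pages;
    size_t           nr_pages;
};

void toy_put_page(struct toy_page *pg)
{
    assert(pg->refcount > 0);
    if (--pg->refcount == 0)
        pg->on_free_list = true;   /* may now be reused, e.g. for packets */
}

/* On a fault, only mark the domain; its pages keep their references. */
void toy_fault(struct toy_domain *d)
{
    if (d->state == DOM_RUNNING)
        d->state = DOM_ZOMBIE;
}

/* Explicit destroy from the control tools: drop the domain's own refs. */
void toy_destroy(struct toy_domain *d)
{
    if (d->state == DOM_DESTROYED)
        return;
    for (size_t i = 0; i < d->nr_pages; i++)
        toy_put_page(&d->pages[i]);
    d->state = DOM_DESTROYED;
}
```

In this model a page that domain 0 has also mapped starts with a count of 2, so it survives toy_destroy until domain 0 drops its own reference, while a zombie domain keeps all of its pages until the tools have had a chance to write the core dump.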
I moved to a machine at work which is hooked up to a portmaster - so I
can get the debug output through a serial line. I'm testing out starting
up a domain with 12MB and I've been seeing the following for several
minutes:

file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128be: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128bf: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c0: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c1: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c2: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c3: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c4: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c5: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c6: ed=fc664be0,sd=fc1
(file=/u/kmacy/xen/xeno-unstable.bk/xen/include/xeno/mm.h, line=166) Error pfn 000128c7: ed=fc664be0,sd=fc1

Keir has told me that this is harmless. I don't know if there is any
way around it, but while this is going on I'm not able to talk to the
machine over the network.

-Kip
> Keir has told me that this is harmless. I don't know if there is any
> way around it, but while this is going on I'm not able to talk to the
> machine over the network.

An easy fix for now will be to modify xen/common/memory.c. Wherever
dom0_get_page() is used, you'll notice that it's only in the fallback
path after failing on a call to get_page(). You can get rid of the
domain-builder errors by replacing code like:

    if ( !get_page() && ((domain != 0) || !dom0_get_page()) )

with:

    if ( !dom0_get_page() )

I'll fix the interface at some point so that the caller indicates
explicitly which domain it is executing page-table updates for.

 -- Keir
Kip Macy
2004-Feb-22 23:35 UTC
[Xen-devel] problems with recursively mapping page directory as a page table
The following code:

    /* install a pde recursively mapping page directory as a page table */
    FILLKPT(IdlePTD, PTDPTDI, 1, IdlePTD, L2_PROT_RO);

Which basically sets IdlePTD[PTDPTDI] = IdlePTD | L2_PROT_RO appears to
be causing the error below. Any thoughts?

(file=/u/kmacy/xen/xeno-unstable.bk.home/xen/include/xeno/mm.h,line=243) Unexpected type (saw 40000000 != exp 20000000) for pfDOM0:
(file=memory.c, line=339) Bad page type for pfn 0001228d (40000001)

-Kip
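For readers unfamiliar with the trick, the address arithmetic that a recursive page-directory slot enables can be sketched as below. This assumes 32-bit non-PAE x86 paging (4 KB pages, 1024 four-byte entries per table). The function names are made up for illustration, and the slot index is a parameter because the value of PTDPTDI is not given in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KB pages */
#define PDE_SHIFT  22   /* each page-directory slot covers 4 MB */
#define ENTRY_SIZE 4    /* 32-bit page-table and page-directory entries */

/* Base of the 4 MB window through which every page table becomes
 * visible once page-directory slot `slot` points at the directory. */
uint32_t recursive_window(uint32_t slot)
{
    assert(slot < 1024);
    return slot << PDE_SHIFT;
}

/* Virtual address of the page-table entry that maps `va`. */
uint32_t pte_vaddr(uint32_t slot, uint32_t va)
{
    return recursive_window(slot) + (va >> PAGE_SHIFT) * ENTRY_SIZE;
}

/* Virtual address of the page-directory entry that covers `va`.
 * The directory itself shows up as one page inside the window. */
uint32_t pde_vaddr(uint32_t slot, uint32_t va)
{
    return recursive_window(slot) + (slot << PAGE_SHIFT)
           + (va >> PDE_SHIFT) * ENTRY_SIZE;
}
```

With an arbitrary example slot of 0x300, the window starts at 0xC0000000, and the page directory appears at 0xC0300000 inside it. That is exactly the situation the hypervisor has to validate: a directory entry whose target is itself a directory rather than an ordinary page table.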
Christian Limpach
2004-Feb-23 13:15 UTC
Re: [Xen-devel] problems with recursively mapping page directory as a page table
> The following code:
> /* install a pde recursively mapping page directory as a page table */
> FILLKPT(IdlePTD, PTDPTDI, 1, IdlePTD, L2_PROT_RO);
>
> Which basically sets IdlePTD[PTDPTDI] = IdlePTD | L2_PROT_RO appears to
> be causing the error below. Any thoughts?
>
> (file=/u/kmacy/xen/xeno-unstable.bk.home/xen/include/xeno/mm.h,line=243) Unexpected type (saw 40000000 != exp 20000000) for pfDOM0:
> (file=memory.c, line=339) Bad page type for pfn 0001228d (40000001)

I think it's not an error: get_page_from_l2e first tries to validate
the PD entry as a regular PD entry (i.e. the page it points to should
be an L1 pagetable page, PGT_l1_page_table type) and if this fails, it
tries to validate it as a linear pagetable mapping (the page it points
to should be an L2 pagetable page, PGT_l2_page_table type).

You could add a test around the warnings to check for this condition
and then not output the warnings.

    christian
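The two-step validation and the suggested warning suppression can be modelled in a few lines. This is a toy model only: toy_type, toy_check_type, toy_get_page_from_l2e, and the toy_warnings counter are invented names, and the model leaves out everything else the real get_page_from_l2e does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum toy_type { TOY_L1_TABLE, TOY_L2_TABLE, TOY_OTHER };

int toy_warnings;   /* number of warnings printed, for illustration */

/* Returns true on a type match; otherwise optionally warns. */
bool toy_check_type(enum toy_type actual, enum toy_type expected, bool quiet)
{
    if (actual == expected)
        return true;
    if (!quiet) {
        toy_warnings++;
        fprintf(stderr, "Unexpected type (saw %d != exp %d)\n",
                (int)actual, (int)expected);
    }
    return false;
}

/* Accept a directory entry that points at an L1 table or, as a
 * linear mapping, at an L2 table. */
bool toy_get_page_from_l2e(enum toy_type target)
{
    /* Stay quiet on the first attempt when the fallback will succeed. */
    bool quiet = (target == TOY_L2_TABLE);

    if (toy_check_type(target, TOY_L1_TABLE, quiet))
        return true;
    return toy_check_type(target, TOY_L2_TABLE, false);
}
```

A recursive mapping (target type L2) is then accepted without any output, while a genuinely bad target still produces a warning from each attempt.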
Kip Macy
2004-Feb-23 21:02 UTC
[Xen-devel] refcount errors then crash on XenoLinux with the latest source
I had just tested my domain builder for the nth time on xeno-unstable
(very latest source), when I saw the messages below on the console.
DOM0 no longer responds to ping - I'm hoping that it will recover,
however, in all likelihood I will be hitting the rpb in a few minutes.

audit_all_pages
zombie: pfn=00000000 cf=fffffffd tf=fffffffd dom=00000000
refcount error: pfn=000000 cf=fffffffd refcount=1
audit page: pfn=0 info: cf=fffffffd tf=fffffffd ts=0 dom=0

refcount error: pfn=000247 cf=00000001 refcount=0
audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00024d cf=00000001 refcount=0
audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00036f cf=40000002 refcount=1
audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
    pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
    pte_idx=3f9 *pte_idx=0036f063

refcount error: pfn=000371 cf=40000002 refcount=1
audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
    pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
    pte_idx=3fe *pte_idx=00371063

refcount error: pfn=000372 cf=40000002 refcount=1
audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
    pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
    pte_idx=3fd *pte_idx=00372063

refcount error: pfn=000390 cf=00000001 refcount=0
audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780

refcount error: pfn=000392 cf=00000001 refcount=0
audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780

refcount error: pfn=000393 cf=00000001 refcount=0
audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320

refcount error: pfn=000395 cf=00000001 refcount=0
audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320

refcount error: pfn=00039f cf=00000001 refcount=0
audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0

refcount error: pfn=0003a1 cf=00000001 refcount=0
audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0

refcount error: pfn=0003a2 cf=00000001 refcount=0
audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060

refcount error: pfn=0003a8 cf=00000001 refcount=0
audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060

refcount error: pfn=0003a9 cf=00000001 refcount=0
audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00

refcount error: pfn=0003ab cf=00000001 refcount=0
audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00

refcount error: pfn=0003ac cf=00000001 refcount=0
audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0

refcount error: pfn=0003ae cf=00000001 refcount=0
audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0

refcount error: pfn=0003af cf=00000001 refcount=0
audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340

refcount error: pfn=0003b1 cf=00000001 refcount=0
audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340

refcount error: pfn=0003b2 cf=00000001 refcount=0
audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0

refcount error: pfn=0003b4 cf=00000001 refcount=0
audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
Kip Macy
2004-Feb-23 21:36 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
After a few more minutes the following popped out on the console:

CPU:    1
EIP:    0808:[<fc532927>]
EFLAGS: 00010206
eax: 0a725012   ebx: 00000010   ecx: fc657560   edx: fc76a460
esi: fc76a460   edi: fc657540   ebp: 00000000   esp: fc64fda0
ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
Stack trace from ESP=fc64fda0:
ff865012 0000000a [fc5095ef] fc780140 0000003c fc657540 fc657540 [fc53240a]
fc657540 fc657400 [fc509664] 0000000a [fc509299] fc648040 00000040 0000003e
fc657400 fc780140 00000040 fc76a740 04000001 fc657540 00005048 [fc531ef0]
fc657540 00000046 [fc509664] 0000000a 30303030 74203130 00000046 fc76a740
04000001 fc64fe90 00000010 [fc5b1a7d] 00000010 fc657400 fc64fe90 3d6e6670
33303030 63203462 00000001 fc76a740 fc600200 00000010 fc64fe90 [fc5b1c43]
00000010 fc64fe90 fc76a740 0007fff0 000003b4 25c4fe2d 00000001 0007fff0
0007fff0 00000000 00000000 [fc5af4c0] 0007fff0 fd800000 00000001 0007fff0
00000000 00000000 00000040 00010810 00000810 00000810 fc500810 ffffff10
[fc50cff5] 00000808 00000202 fc654d4d 0000004d fc64ff6c [fc518078] 0000004d
00000000 fc64ff6c [fc518046] 0036bfec 00000000 00000292 fc654d00 02000001
fc64ff6c 00000004 [fc5b1a7d] 00000004 00000000 fc64ff6c [fc5af0aa] fc650200
00000086 00000001 fc654d00 fc5fff00 00000004 fc64ff6c [fc5b1c43] 00000004
fc64ff6c fc654d00 [fc50465b] 35c9c161 50d04d38 00000001 00000040 fc648040
00000040 fc7b8080 [fc5af4c0] 00000040 00000028 00000040 fc648040 00000040
fc7b8080 00000040 fc640810 fc640810 00000810 fc7b0810 ffffff04 [fc5b585c]
00000808 00000246 [fc5b5898] fc648040 004c4b40 ffffffff 61007372 69745f63
5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974 62007172
636f6c72 00632e6b 736e6f63 2e656c6f 65640063 2e677562 65640063 fc648040

****************************************
CPU1 FATAL PAGE FAULT
[error_code=00000000]
Faulting linear address might be 0a725012
Aieee! CPU1 is toast...
****************************************

Is this oops from Xen or from XenoLinux?
I downloaded the latest ksymoops and did the following:

kmacy@xentap ./ksymoops -v ../xenolinux-2.4.25/vmlinux -m ../xenolinux-2.4.25/System.map < ../xeno-unstable.bk.home/tools/xc/lib/crash1.txt
ksymoops 2.4.9 on i686 2.4.25-xeno. Options used
     -v ../xenolinux-2.4.25/vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.25-xeno/ (default)
     -m ../xenolinux-2.4.25/System.map (specified)

No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Warning (compare_maps): mismatch on symbol state d, System.map says c0175ca8, vmlinux says 0. Ignoring System.map entry
Warning (compare_maps): mismatch on symbol state a, vmlinux says 0, System.map says c0175ca8. Ignoring System.map entry
CPU: 1
EIP: 0808:[<fc532927>]
Using defaults from ksymoopsSegmentation fault

-Kip
Keir Fraser
2004-Feb-23 23:35 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
This is a Xen crash dump. ksymoops won't help -- you'll need to map
the crash dump to Xen code by hand. It doesn't take long. The
addresses in the stack trace that are enclosed in square brackets are
likely to be return addresses in the function-call trace.

'objdump -d xen >xen.s'. Then you can search in xen.s with a text
editor to find the call-trace addresses.

 -- Keir
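The manual procedure above lends itself to a small helper. The sketch below pulls the bracketed words out of a stack dump and maps each to the symbol whose start address is the closest one at or below it. The names extract_bracketed, nearest_symbol, and struct sym are hypothetical, and the symbol table is assumed to be filled in by hand from the objdump output; nothing here parses objdump itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a hand-built symbol table (start address and name). */
struct sym {
    unsigned long addr;
    const char   *name;
};

/* Collect up to `max` values of the form [xxxxxxxx] from a stack dump.
 * Entries such as [<fc532927>] are skipped because '<' is not hex. */
size_t extract_bracketed(const char *dump, unsigned long *out, size_t max)
{
    size_t n = 0;

    assert(dump != NULL && out != NULL);
    for (const char *p = strchr(dump, '['); p != NULL && n < max;
         p = strchr(p + 1, '[')) {
        char *end;
        unsigned long v = strtoul(p + 1, &end, 16);

        if (end != p + 1 && *end == ']')
            out[n++] = v;
    }
    return n;
}

/* Return the symbol with the greatest start address <= addr, or NULL
 * if every symbol starts above addr. The table need not be sorted. */
const struct sym *nearest_symbol(const struct sym *tab, size_t n,
                                 unsigned long addr)
{
    const struct sym *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (tab[i].addr <= addr && (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    return best;
}
```

The result is only as good as the table: if the function that really contains an address is missing from it, the lookup silently reports the previous function instead, so the matches are best treated as hints to confirm against the disassembly.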
Kip Macy
2004-Feb-24 01:11 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
> This is a Xen crash dump. ksymoops won't help -- you'll need to map
> the crash dump to Xen code by hand. It doesn't take long. The
> addresses in the stack trace that are enclosed in square brackets are
> likely to be return addresses in the function-call trace.

This is sufficiently tedious that if this happens again I'm going to
either run screaming or write a ksymoops for xen.

> 'objdump -d xen >xen.s'. Then you can search in xen.s with a text
> editor to find the call-trace addresses.

I did this and got what you see below. It looks like two backtraces
interleaved. All of the values in brackets are legitimate return
addresses (they immediately follow a call instruction). "function addr"
is the address of the function itself and "ret addr" is the address
taken from the oops.

function                  function addr   ret addr
===================================================
putchar                   fc5095be        fc5095ef
e100_rx_srv               fc532048        fc53240a
printf                    fc5095f7        fc509664
putchar_serial            fc50927c        fc509299
e100intr                  fc531d8f        fc531ef0
handle_IRQ_event          fc5b1a25        fc5b1a7d
do_IRQ                    fc5b1bbb        fc5b1c43
call_do_IRQ               fc5af4bb        fc5af4c0
serial_rx_int             fc51801d        fc518078
serial_rx_int             fc51801d        fc518046
handle_IRQ_event          fc5b1a25        fc5b1a7d
reprogram_ac_timer        fc5af087        fc5af0aa
do_IRQ                    fc5b1bbb        fc5b1c43
ac_timer_softirq_action   fc50455c        fc50465b
call_do_IRQ               fc5af4bb        fc5af4c0
default_idle              fc5b582e        fc5b585c
continue_cpu_idle_loop    fc5b585f        fc5b5898

The fault instruction is this:

fc532927: 66 83 38 00 cmpw $0x0,(%eax)

It is in e100_start_ru. Obviously eax is pointing at some piece of
unmapped memory. I'm not sufficiently versed in assembler, particularly
optimized, to tell where we are going wrong:

    list_for_each(entry_ptr, &(bdp->active_rx_list)) {
        rx_struct = list_entry(entry_ptr, struct rx_list_elem, list_elem);
        pci_dma_sync_single(bdp->pdev, rx_struct->dma_addr,
                            bdp->rfd_size, PCI_DMA_FROMDEVICE);
        if (!((SKB_RFD_STATUS(rx_struct->skb, bdp) &
               __constant_cpu_to_le16(RFD_STATUS_COMPLETE)))) {
            buffer_found = 1;
            break;
        }
    }

Could the list have been corrupted?

-Kip

> > On Mon, 23 Feb 2004, Kip Macy wrote:
> >
> > > I had just tested my domain builder for the nth time on xeno-unstable
> > > (very latest source), when I saw the messages below on the console.
> > > DOM0 no longer responds to ping - I'm hoping that it will recover,
> > > however, in all likelihood I will be hitting the rpb in a few minutes.
> > >
> > > audit_all_pages
> > > zombie: pfn=00000000 cf=fffffffd tf=fffffffd dom=00000000
> > > refcount error: pfn=000000 cf=fffffffd refcount=1
> > > audit page: pfn=0 info: cf=fffffffd tf=fffffffd ts=0 dom=0
> > >
> > > refcount error: pfn=000247 cf=00000001 refcount=0
> > > audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040
> > >
> > > refcount error: pfn=00024d cf=00000001 refcount=0
> > > audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040
> > >
> > > refcount error: pfn=00036f cf=40000002 refcount=1
> > > audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > > pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > > pte_idx=3f9 *pte_idx=0036f063
> > >
> > > refcount error: pfn=000371 cf=40000002 refcount=1
> > > audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > > pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > > pte_idx=3fe *pte_idx=00371063
> > >
> > > refcount error: pfn=000372 cf=40000002 refcount=1
> > > audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > > pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > > pte_idx=3fd *pte_idx=00372063
> > >
> > > refcount error: pfn=000390 cf=00000001 refcount=0
> > > audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > >
> > > refcount error: pfn=000392 cf=00000001 refcount=0
> > > audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > >
> > > refcount error: pfn=000393 cf=00000001 refcount=0
> > > audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320
> > >
> > > refcount error: pfn=000395 cf=00000001 refcount=0
> > > audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320
> > >
> > > refcount error: pfn=00039f cf=00000001 refcount=0
> > > audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > >
> > > refcount error: pfn=0003a1 cf=00000001 refcount=0
> > > audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > >
> > > refcount error: pfn=0003a2 cf=00000001 refcount=0
> > > audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > >
> > > refcount error: pfn=0003a8 cf=00000001 refcount=0
> > > audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > >
> > > refcount error: pfn=0003a9 cf=00000001 refcount=0
> > > audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > >
> > > refcount error: pfn=0003ab cf=00000001 refcount=0
> > > audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > >
> > > refcount error: pfn=0003ac cf=00000001 refcount=0
> > > audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > >
> > > refcount error: pfn=0003ae cf=00000001 refcount=0
> > > audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > >
> > > refcount error: pfn=0003af cf=00000001 refcount=0
> > > audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340
> > >
> > > refcount error: pfn=0003b1 cf=00000001 refcount=0
> > > audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340
> > >
> > > refcount error: pfn=0003b2 cf=00000001 refcount=0
> > > audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> > >
> > > refcount error: pfn=0003b4 cf=00000001 refcount=0
> > > audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
Kip Macy
2004-Feb-24 03:44 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
It happened again. Is it possible that Xen isn't disabling network interrupts while it is "auditing all pages"?

-Kip

Killing domain 1
Releasing task 1
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
audit_all_pages
refcount error: pfn=000247 cf=00000001 refcount=0
audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00024d cf=00000001 refcount=0
audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00036f cf=40000002 refcount=1
audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
pte_idx=3f9 *pte_idx=0036f063

refcount error: pfn=000371 cf=40000002 refcount=1
audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
pte_idx=3fe *pte_idx=00371063

refcount error: pfn=000372 cf=40000002 refcount=1
audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
pte_idx=3fd *pte_idx=00372063

CPU: 1
EIP: 0808:[<fc532ddf>]
EFLAGS: 00010206
eax: 06081012 ebx: 00000010 ecx: fc657560 edx: fc650da0
esi: fc650da0 edi: fc657540 ebp: 00000000 esp: fc64fd70
ds: 0810 es: 0810 fs: 0810 gs: 0810 ss: 0810
Stack trace from ESP=fc64fd70:
ff803012 00000008 [fc51302a] fc76bb40 0000003c fc657540 fc657540 [fc5328c2]
fc657540 fc657400 00000017 ffffffff 00000017 fc648040 00000040 0000003e
fc657400 fc76bb40 00000040 fc76a740 04000001 fc657540 00005048 [fc5323a8]
fc657540 fc5ebc80 fc5d1f1c 00000001 00000046 00000004 [fc509639] fc76a740
04000001 fc64fe60 00000010 [fc5b1f2d] 00000010 fc657400 fc64fe60 3d6e6670
33303030 63203237 00000001 fc76a740 fc600bc0 00000010 fc64fe60 [fc5b20f3]
00000010 fc64fe60 fc76a740 00002207 00372063 34429e8f 00000001 0007fff0
0007fff0 00000002 00000004 [fc5af970] 0007fff0 fd800000 00000001 0007fff0
00000002 00000004 00000040 00010810 00000810 00000810 fc500810 ffffff10
[fc50d0e5] 00000808 00000202 0000004d fc64ff6c fc64ff6c [fc509f22] 0000004d
00000000 fc64ff6c [fc5162b6] 00000003 00000040 fc5ebc80 [fc517c4e] 0000004d
fc64ff6c fc64ff6c [fc512529] 00000003 fc6501e0 02000001 [fc517ca4] fc5ebc80
fc64ff6c 0092578a 00000000 fc651200 00000006 00000006 [fc5b1f2d] 00000004
fc5ebc80 fc64ff6c [fc5b151c] 00000004 00000001 00000001 fc6501e0 fc6008c0
00000004 fc64ff6c [fc5b20f3] 00000004 fc64ff6c fc6501e0 [fc511e2e] fc624494
431ea128 00000001 00000040 fc648040 00000040 fc649780 [fc5af970] 00000040
00000001 00000040 fc648040 00000040 fc649780 00000040 fc640810 fc640810
00000810 fc640810 ffffff04 [fc5b5e04] 00000808 00000246 [fc5b5e40] fc648040
004c4b40 ffffffff 655f6464 7972746e 5f636100 656d6974 61007372 69745f63
5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974 62007172

On Mon, 23 Feb 2004, Kip Macy wrote:

> > This is a Xen crash dump. ksymoops won't help -- you'll need to map
> > the crash dump to Xen code by hand. It doesn't take long. The
> > addresses in the stack trace that are enclosed in square brackets are
> > likely to be return addresses in the function-call trace.
>
> This is sufficiently tedious that if this happens again I'm going to
> either run screaming or write a ksymoops for xen.
>
> > 'objdump -d xen >xen.s'. Then you can search in xen.s with a text
> > editor to find the call-trace addresses.
>
> I did this and got what you see below. It looks like two backtraces
> interleaved. All of the values in brackets are legitimate return
> addresses (they immediately follow a call instruction). "function addr"
> is the address of the function itself and "ret addr" is the address
> taken from the oops.
>
> function                  function addr   ret addr
> ===================================================
> putchar                   fc5095be        fc5095ef
> e100_rx_srv               fc532048        fc53240a
> printf                    fc5095f7        fc509664
> putchar_serial            fc50927c        fc509299
> e100intr                  fc531d8f        fc531ef0
> handle_IRQ_event          fc5b1a25        fc5b1a7d
> do_IRQ                    fc5b1bbb        fc5b1c43
> call_do_IRQ               fc5af4bb        fc5af4c0
> serial_rx_int             fc51801d        fc518078
> serial_rx_int             fc51801d        fc518046
> handle_IRQ_event          fc5b1a25        fc5b1a7d
> reprogram_ac_timer        fc5af087        fc5af0aa
> do_IRQ                    fc5b1bbb        fc5b1c43
> ac_timer_softirq_action   fc50455c        fc50465b
> call_do_IRQ               fc5af4bb        fc5af4c0
> default_idle              fc5b585c        fc5b582e
> continue_cpu_idle_loop    fc5b585f        fc5b5898
>
>
> The fault instruction is this:
> fc532927: 66 83 38 00 cmpw $0x0,(%eax)
> It is in e100_start_ru. Obviously eax is pointing at some piece of
> unmapped memory. I'm not sufficiently versed in assembler, particularly
> optimized, to tell where we are going wrong:
>
>
>     list_for_each(entry_ptr, &(bdp->active_rx_list)) {
>         rx_struct =
>             list_entry(entry_ptr, struct rx_list_elem, list_elem);
>         pci_dma_sync_single(bdp->pdev, rx_struct->dma_addr,
>                             bdp->rfd_size, PCI_DMA_FROMDEVICE);
>         if (!((SKB_RFD_STATUS(rx_struct->skb, bdp) &
>                __constant_cpu_to_le16(RFD_STATUS_COMPLETE)))) {
>             buffer_found = 1;
>             break;
>         }
>     }
>
> Could the list have been corrupted?
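The "ksymoops for xen" idea in the quoted text amounts to mapping each bracketed return address to the function that contains it. What follows is only a minimal sketch of that lookup in C, not an existing tool. The start addresses are copied from the table in this message; the names `symtab`, `lookup` and `format_addr` are made up for illustration, and loading the full symbol list (for example from `nm xen | sort`) is assumed rather than shown.

```c
#include <stddef.h>
#include <stdio.h>

/* One symbol-table entry; the table must be sorted by ascending address. */
struct sym {
    unsigned long addr;   /* start address of the function */
    const char *name;
};

/*
 * Start addresses copied from the hand-built table in this thread.
 * This is a partial table (an assumption of the sketch): an address that
 * falls in a function not listed here is attributed to the nearest
 * preceding symbol, which may be wrong.
 */
static const struct sym symtab[] = {
    { 0xfc50455cUL, "ac_timer_softirq_action" },
    { 0xfc50927cUL, "putchar_serial" },
    { 0xfc5095beUL, "putchar" },
    { 0xfc5095f7UL, "printf" },
    { 0xfc51801dUL, "serial_rx_int" },
    { 0xfc531d8fUL, "e100intr" },
    { 0xfc532048UL, "e100_rx_srv" },
    { 0xfc5af087UL, "reprogram_ac_timer" },
    { 0xfc5af4bbUL, "call_do_IRQ" },
    { 0xfc5b1a25UL, "handle_IRQ_event" },
    { 0xfc5b1bbbUL, "do_IRQ" },
    { 0xfc5b585fUL, "continue_cpu_idle_loop" },
};

/*
 * Binary search for the last symbol whose start address is <= addr.
 * Returns the symbol name and stores the offset into it, or returns
 * NULL if addr lies below the first symbol.
 */
static const char *lookup(unsigned long addr, unsigned long *offset)
{
    size_t lo = 0, hi = sizeof(symtab) / sizeof(symtab[0]);

    if (addr < symtab[0].addr)
        return NULL;
    /* Invariant: symtab[lo].addr <= addr and the answer lies in [lo, hi). */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (symtab[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }
    *offset = addr - symtab[lo].addr;
    return symtab[lo].name;
}

/* Writes "name+0xoffset" (or "???") for addr into buf. */
static void format_addr(unsigned long addr, char *buf, size_t buflen)
{
    unsigned long off = 0;
    const char *name = lookup(addr, &off);

    if (name == NULL)
        snprintf(buf, buflen, "???");
    else
        snprintf(buf, buflen, "%s+0x%lx", name, off);
}
```

Calling `format_addr` on each bracketed word of the stack trace would reproduce the table above without searching `xen.s` by hand, subject to the partial-table caveat in the comment.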
Ian Pratt
2004-Feb-24 08:15 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
> It happened again. Is it possible that Xen isn't disabling network
> interrupts while it is "auditing all pages"?

Quite possibly. The auditing code was added fairly recently specifically to assist debugging of a guest OS that was internally 'leaking' references to pages.

It's not in any of the non-debug builds, and is not well tested. In the circumstance we were using it, the problem with the guest OS was rather subtle and just a couple of pages were failing the audit and generating log messages. (The audit code gets called when the guest does something 'weird', or when you invoke the appropriate keyboard handler.)

If your machine has lots of physical memory, the auditing will take some time, and I'm not surprised that it's causing problems. If it's not helping you, just comment it out, or invoke it via the keyboard handler when you want it.

Cheers,
Ian
Keir Fraser
2004-Feb-24 08:35 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
> > It happened again. Is it possible that Xen isn't disabling network
> > interrupts while it is "auditing all pages"?
>
> Quite possibly. The auditing code was added fairly recently
> specifically to assist debugging of a guest OS that was
> internally 'leaking' references to pages.
>
> It's not in any of the non-debug builds, and is not well
> tested. In the circumstance we were using it, the problem with the
> guest OS was rather subtle and just a couple of pages were failing
> the audit and generating log messages. (The audit code gets
> called when the guest does something 'weird', or when you invoke
> the appropriate keyboard handler.)
>
> If your machine has lots of physical memory, the auditing will
> take some time, and I'm not surprised that it's causing
> problems. If it's not helping you, just comment it out, or invoke
> it via the keyboard handler when you want it.

In fact the auditing code is only ever invoked in response to keyboard or serial input. Just avoid pressing the 'm' key. :-)

-- Keir
Keir Fraser
2004-Feb-24 08:40 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
> > 'objdump -d xen >xen.s'. Then you can search in xen.s with a text
> > editor to find the call-trace addresses.
>
> I did this and got what you see below. It looks like two backtraces
> interleaved. All of the values in brackets are legitimate return
> addresses (they immediately follow a call instruction). "function addr"
> is the address of the function itself and "ret addr" is the address
> taken from the oops.

Yeah, we cannot precisely print the call trace because we build Xen with '-fomit-frame-pointer', even when doing a debug build. If we included a frame pointer then we could "chase" register %ebp to find the true call trace. As it is, we just look at the entire stack contents and enclose in square brackets any value that could correspond to an address between labels '_start' and '_end' in Xen.

The effect of this is that if you have stale return addresses on your stack then they get included in the approximate call trace. These stale addresses may occur because a stack frame was popped, then another function invocation pushed itself a large stack frame that encompasses the stale one but hasn't overwritten all of the stale contents.

-- Keir
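The bracketing heuristic described above can be sketched in a few lines of C. This is only an illustration of the idea, not the code in Xen: `show_stack`, `text_start` and `text_end` are hypothetical names, and the bounds used in the usage note are placeholders rather than the real addresses of '_start' and '_end'.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Formats nwords stack words into buf, enclosing in square brackets every
 * word that lies in [text_start, text_end), i.e. every word that could be
 * a return address into the hypervisor's code. Returns how many words were
 * bracketed. No attempt is made to tell live return addresses from stale
 * ones left behind by earlier calls, which is exactly why the resulting
 * trace is only approximate.
 */
static size_t show_stack(const unsigned long *stack, size_t nwords,
                         unsigned long text_start, unsigned long text_end,
                         char *buf, size_t buflen)
{
    size_t i, used = 0, hits = 0;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < nwords; i++) {
        int in_text = stack[i] >= text_start && stack[i] < text_end;
        int n;

        if (in_text)
            n = snprintf(buf + used, buflen - used, "[%08lx] ", stack[i]);
        else
            n = snprintf(buf + used, buflen - used, "%08lx ", stack[i]);

        if (n < 0 || (size_t)n >= buflen - used) {
            buf[used] = '\0';   /* out of room: drop the partial word */
            break;
        }
        used += (size_t)n;
        hits += (size_t)in_text;
    }
    return hits;
}
```

For instance, feeding it the first four words of the earlier dump (`ff865012 0000000a fc5095ef fc780140`) with placeholder bounds of `0xfc501000` and `0xfc5c0000` brackets only `fc5095ef`. A frame-pointer walk would instead follow the saved %ebp chain and report only live frames, which is the trade-off '-fomit-frame-pointer' gives up.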
Kip Macy
2004-Feb-24 17:21 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
Or accidentally left-clicking on the console window with the mouse while trying to copy the output. Interesting side effect: several mouse clicks == reboot.

-Kip

On Tue, 24 Feb 2004, Keir Fraser wrote:

> > > It happened again. Is it possible that Xen isn't disabling network
> > > interrupts while it is "auditing all pages"?
> >
> > Quite possibly. The auditing code was added fairly recently
> > specifically to assist debugging of a guest OS that was
> > internally 'leaking' references to pages.
> >
> > It's not in any of the non-debug builds, and is not well
> > tested. In the circumstance we were using it, the problem with the
> > guest OS was rather subtle and just a couple of pages were failing
> > the audit and generating log messages. (The audit code gets
> > called when the guest does something 'weird', or when you invoke
> > the appropriate keyboard handler.)
> >
> > If your machine has lots of physical memory, the auditing will
> > take some time, and I'm not surprised that it's causing
> > problems. If it's not helping you, just comment it out, or invoke
> > it via the keyboard handler when you want it.
>
> In fact the auditing code is only ever invoked in response to keyboard
> or serial input. Just avoid pressing the 'm' key. :-)
>
> -- Keir
Ian Pratt
2004-Feb-24 17:45 UTC
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
> Or accidentally left-clicking on the console window with the mouse while
> trying to copy the output. Interesting side effect: several mouse clicks
> == reboot.

That would be a capital 'R' ;-)

Hit 'h' for help:

'h' pressed -> showing installed handlers
 key 'B' (ascii '42') => reboot machine gracefully
 key 'L' (ascii '4c') => reset sched latency histogram
 key 'P' (ascii '50') => reset performance counters
 key 'R' (ascii '52') => reboot machine ungracefully
 key 'a' (ascii '61') => dump ac_timer queues
 key 'b' (ascii '62') => dump xen ide blkdev statistics
 key 'd' (ascii '64') => dump registers
 key 'h' (ascii '68') => show this message
 key 'l' (ascii '6c') => print sched latency histogram
 key 'p' (ascii '70') => print performance counters
 key 'q' (ascii '71') => dump task queues + guest state
 key 'r' (ascii '72') => dump run queues
 key '~' (ascii '7e') => toggle serial echo

This is from a 1.2 build. The 1.3 build has more debug handlers.

They're available over the serial line, or by pressing scroll lock-and-key on the keyboard (even in production builds). We've found them very useful.

Best,
Ian
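For readers wondering how a handler list like the one above might be organised, here is a minimal sketch in C of a table indexed by key code. It is not Xen's actual implementation: `register_key`, `describe_key`, `dispatch_key` and `show_handlers` are invented names, and only the output format is taken from the 'h' listing.

```c
#include <stddef.h>
#include <stdio.h>

typedef void (*key_handler_fn)(unsigned char key);

struct key_entry {
    key_handler_fn fn;    /* NULL means no handler is installed */
    const char *desc;     /* text shown by the help handler */
};

/* One slot per possible key code; zero-initialised, so empty at start. */
static struct key_entry key_table[256];

/* Installs fn for key. Returns 0 on success, -1 if fn is NULL or the
 * key already has a handler. */
static int register_key(unsigned char key, key_handler_fn fn, const char *desc)
{
    if (fn == NULL || key_table[key].fn != NULL)
        return -1;
    key_table[key].fn = fn;
    key_table[key].desc = desc;
    return 0;
}

/* Writes one help line for key into buf, in the style of the listing
 * above. Returns 0 on success, -1 if no handler is installed. */
static int describe_key(unsigned char key, char *buf, size_t buflen)
{
    if (key_table[key].fn == NULL)
        return -1;
    snprintf(buf, buflen, "key '%c' (ascii '%02x') => %s",
             key, (unsigned)key, key_table[key].desc);
    return 0;
}

/* Example handler: prints every installed handler, like pressing 'h'. */
static void show_handlers(unsigned char key)
{
    char line[128];
    int k;

    printf("'%c' pressed -> showing installed handlers\n", key);
    for (k = 0; k < 256; k++)
        if (describe_key((unsigned char)k, line, sizeof line) == 0)
            printf(" %s\n", line);
}

/* Runs the handler for key, if any. Returns 1 if handled, 0 otherwise. */
static int dispatch_key(unsigned char key)
{
    if (key_table[key].fn == NULL)
        return 0;
    key_table[key].fn(key);
    return 1;
}
```

With a table like this, input from the serial line and from the keyboard can both feed `dispatch_key`, so any stray keypress that matches an installed key fires its handler. That would be consistent with the accidental reboot mentioned above.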