Hi folks,

I'm busy rewriting the domain builder code a bit, to restructure it and
make it more usable for tasks other than directly booting a domain.
While testing these bits I ran into this one:

(XEN) CPU: 1
(XEN) EIP: e008:[<ff137512>] get_page_type+0x12/0x63d
(XEN) EFLAGS: 00010296
(XEN) CR3: 00000000
(XEN) eax: 33030001 ebx: ff1c1080 ecx: ff1d4080 edx: ff1d4080
(XEN) esi: 0000001a edi: ffbf5fac ebp: ffbf502c esp: ffbf4f84
(XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010
(XEN) ************************************
(XEN) CPU1 DOUBLE FAULT -- system shutdown
(XEN) System needs manual reset.
(XEN) ************************************

I think even Domain-0 shouldn't be able to crash Xen like this, no?

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 29 May 2006, at 16:00, Gerd Hoffmann wrote:

> I'm busy rewriting the domain builder code a bit, to restruct the code
> and make it better usable for other tasks than directly booting a
> domain. While testing these bits I trapped into that one:
>
> (XEN) CPU: 1
> (XEN) EIP: e008:[<ff137512>] get_page_type+0x12/0x63d
> (XEN) EFLAGS: 00010296
> (XEN) CR3: 00000000
> (XEN) eax: 33030001 ebx: ff1c1080 ecx: ff1d4080 edx: ff1d4080
> (XEN) esi: 0000001a edi: ffbf5fac ebp: ffbf502c esp: ffbf4f84
> (XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010
> (XEN) ************************************
> (XEN) CPU1 DOUBLE FAULT -- system shutdown
> (XEN) System needs manual reset.
> (XEN) ************************************
>
> I think even Domain-0 shouldn't be able to crash xen like this, no?

Looks like a stack overflow, since the stack pointer is in an "even"
page, which is a guard page when running a debug build of Xen. Maybe you
could hack up some code to get a rough back trace (round the crashing
stack pointer up to a page boundary, then scan a whole page for text
addresses)?

You either need to fix some large stack frame or make the stack larger.
Probably the former.

  -- Keir
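A minimal sketch of the rough back trace suggested above, assuming 4 KiB
pages and a known range for the hypervisor text. The names
round_up_to_page, scan_for_text, text_start and text_end are hypothetical
and are not taken from the Xen source; this only illustrates the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Round addr up to the next page boundary (unchanged if already aligned). */
static uintptr_t round_up_to_page(uintptr_t addr)
{
    return (addr + PAGE_SIZE - 1) & ~((uintptr_t)PAGE_SIZE - 1);
}

/*
 * Scan nwords stack words and copy every value that lies inside
 * [text_start, text_end) into found[]. Such values are only candidates
 * for return addresses; stale data on the stack can match as well.
 * Returns the number of candidates stored (at most max_found).
 */
static size_t scan_for_text(const uintptr_t *words, size_t nwords,
                            uintptr_t text_start, uintptr_t text_end,
                            uintptr_t *found, size_t max_found)
{
    size_t n = 0;

    assert(words != NULL && found != NULL);
    for (size_t i = 0; i < nwords && n < max_found; i++) {
        if (words[i] >= text_start && words[i] < text_end)
            found[n++] = words[i];
    }
    return n;
}
```

For the crash above, round_up_to_page() turns esp = ffbf4f84 into
ffbf5000; the page starting there (PAGE_SIZE / sizeof(uintptr_t) words)
would be the one handed to scan_for_text().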
Looks suspiciously like a stack overflow (comparing esp and ebp) - did
you perhaps add (for debugging) some large stack objects somewhere?
Unfortunately the code isn't clever enough to provide a stack trace in
such a situation...

Jan

>>> Gerd Hoffmann <kraxel@suse.de> 29.05.06 17:00 >>>
> Hi folks,
>
> I'm busy rewriting the domain builder code a bit, to restruct the code
> and make it better usable for other tasks than directly booting a
> domain. While testing these bits I trapped into that one:
>
> (XEN) CPU: 1
> (XEN) EIP: e008:[<ff137512>] get_page_type+0x12/0x63d
> (XEN) EFLAGS: 00010296
> (XEN) CR3: 00000000
> (XEN) eax: 33030001 ebx: ff1c1080 ecx: ff1d4080 edx: ff1d4080
> (XEN) esi: 0000001a edi: ffbf5fac ebp: ffbf502c esp: ffbf4f84
> (XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010
> (XEN) ************************************
> (XEN) CPU1 DOUBLE FAULT -- system shutdown
> (XEN) System needs manual reset.
> (XEN) ************************************
>
> I think even Domain-0 shouldn't be able to crash xen like this, no?
>
> cheers,
>   Gerd
> Looks like a stack overflow, since the stack pointer is in an "even"
> page which is guard page when running a debug build of Xen. Maybe you
> could hack up some code to get a rough back trace (round the crashing
> stack pointer up to a page boundary then scan a whole page for text
> addresses)?

Done, see attachments for results if someone wants to have a quick look.
I'll continue debugging tomorrow.

Nice guess btw, it really is a debug build ;)

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg
On 29 May 2006, at 17:03, Gerd Hoffmann wrote:

> Done, see attachments for results if someone wants have a quick look,
> I'll continue debugging tomorrow.

You also want to take each stack value that lies between the _text and
_end labels and call print_symbol() on it. That will quickly give you a
better impression of the backtrace. If you also print the stack delta
between each value that you pass to print_symbol(), you'll see which
stack frames are the really troublesome ones.

The few stack frames you looked at already look quite innocent. They
don't take up much stack space. OTOH it is somewhat weird to be doing
writable pagetable work that far down the stack. It'll be interesting to
see what was going on to cause writable pagetable state to be flushed.

  -- Keir

> Nice guess btw, it really is a debug build ;)
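A rough sketch of the stack-delta idea, under stated assumptions: the
struct frame_hit type and the collect_hits() and report_hits() helpers
are hypothetical, and text_start / text_end stand in for the _text and
_end labels. The signature of print_symbol() is not shown in this
thread, so the sketch prints raw addresses with printf() where the
hypervisor code would call print_symbol().

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One candidate return address found on the stack (hypothetical type). */
struct frame_hit {
    uintptr_t slot;   /* stack address holding the candidate */
    uintptr_t value;  /* value that falls inside the text range */
    size_t    delta;  /* bytes between this slot and the previous hit */
};

/*
 * Walk nwords stack words starting at stack address base. Record every
 * value inside [text_start, text_end) together with its distance from
 * the previous hit. A large delta points at a large stack frame.
 */
static size_t collect_hits(const uintptr_t *words, size_t nwords,
                           uintptr_t base,
                           uintptr_t text_start, uintptr_t text_end,
                           struct frame_hit *out, size_t max_out)
{
    size_t n = 0;
    uintptr_t prev_slot = 0;

    assert(words != NULL && out != NULL);
    for (size_t i = 0; i < nwords && n < max_out; i++) {
        uintptr_t slot;

        if (words[i] < text_start || words[i] >= text_end)
            continue;
        slot = base + i * sizeof(uintptr_t);
        out[n].slot = slot;
        out[n].value = words[i];
        out[n].delta = (n == 0) ? 0 : (size_t)(slot - prev_slot);
        prev_slot = slot;
        n++;
    }
    return n;
}

/* Print one line per hit; stands in for a print_symbol() call. */
static void report_hits(const struct frame_hit *hits, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%08" PRIxPTR ": %08" PRIxPTR " (+%zu bytes)\n",
               hits[i].slot, hits[i].value, hits[i].delta);
}
```

With an endless recursion, the same few text addresses would be expected
to repeat at a constant delta all the way down the page.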
Keir Fraser wrote:

> The few stack frames you looked at already look quite innocent. They
> don't take up much stack space. OTOH it is somewhat weird to be doing
> writable pagetable work that far down the stack. It'll be interesting to
> see what was going on to cause writable pagetable state to be flushed.

Looks like an endless recursion, trace (and patch) attached.

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg
On 30 May 2006, at 15:02, Gerd Hoffmann wrote:

>> The few stack frames you looked at already look quite innocent. They
>> don't take up much stack space. OTOH it is somewhat weird to be doing
>> writable pagetable work that far down the stack. It'll be interesting
>> to see what was going on to cause writable pagetable state to be
>> flushed.
>
> Looks like an endless recursion, trace (and patch) attached.

Looks like writable pagetable logic gets tangled up somehow. I'll look
into it.

  -- Keir
> > On 30 May 2006, at 15:02, Gerd Hoffmann wrote:
> >
> >> The few stack frames you looked at already look quite innocent. They
> >> don't take up much stack space. OTOH it is somewhat weird to be doing
> >> writable pagetable work that far down the stack. It'll be interesting
> >> to see what was going on to cause writable pagetable state to be
> >> flushed.
> >
> > Looks like an endless recursion, trace (and patch) attached.
>
> Looks like writable pagetable logic gets tangled up somehow. I'll look
> into it.

Gerd,

Can you please let me know whether the attached patch fixes the crash
for you? I suspect a bug in your modified builder triggered a broken
error path in Xen -- so this patch will hopefully turn the Xen crash
into a failure of your modified builder. :-)

  -- Keir
> Gerd,
>
> Can you please let me know whether the attached patch fixes the
> crash for you? I suspect a bug in your modified builder triggered a
> broken error path in Xen -- so this patch will hopefully turn the Xen
> crash into a failure of your modified builder. :-)

Yep, it's fixed. Now the newly created domain crashes, probably during
the kernel's initial page table setup (pfn 0 looks like that), which is
likely a builder bug. I get this now:

(XEN) DOM1: (file=mm.c, line=1528) Bad type (saw 33030001 != exp e0000000) for mfn c72 (pfn 0)
(XEN) DOM1: (file=mm.c, line=505) Error getting mfn c72 (pfn 0) from L1 entry 00c72063 for dom1
(XEN) DOM1: (file=mm.c, line=3054) ptwr: Could not revalidate l1 page
(XEN) domain_crash called from mm.c:3055
(XEN) Domain 1 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-3.0-unstable Not tainted ]----
(XEN) CPU: 1
(XEN) EIP: e019:[<c0101347>]
(XEN) EFLAGS: 00000286 CONTEXT: guest
[ ... ]

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg