James Harper
2011-Jan-25 04:20 UTC
[Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
Under the latest xen-4.0-testing, xm save + xm restore of Windows 2008 causes a BSoD on restore. This is without any PV drivers loaded. I haven't tested any other versions of Windows.

According to the debugger when I did have PV drivers loaded, the crash happens on the return to userspace. I haven't looked at the debugger with no PV drivers loaded.

The bug check is 0x7F (0xD, 0, 0, 0), which is an unspecified exception.

Is anyone else able to reproduce?

Thanks

James

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tim Deegan
2011-Jan-25 09:24 UTC
Re: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
At 04:20 +0000 on 25 Jan (1295929205), James Harper wrote:
> Under the latest xen-4.0-testing, xm save + xm restore of Windows 2008
> causes a BSoD on restore. This is without any PV drivers loaded. I
> haven't tested any other versions of windows.
>
> According to the debugger when I did have PV drivers loaded, the crash
> happens on the return to userspace. I haven't looked at the debugger
> with no PV drivers loaded.
>
> The bug check is 0x7F (0xD, 0, 0, 0) which is an unspecified exception.
>
> Is anyone else able to reproduce?

I saw a failure like this last month but when I came to look at it again it seemed to be working. Are you testing on Intel or AMD hardware?

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
James Harper
2011-Jan-25 09:28 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
> I saw a failure like this last month but when I came to look at it again
> it seemed to be working. Are you testing on Intel or AMD hardware?

AMD (cpuinfo below). I've just spent the last few hours updating my Intel box to test on that too. Save/resume was working there, but that was only 3.4.1, so it isn't really a comparison.

Both are now same xen, tools, and kernel.

I should know if the problem exists on intel very soon. I'm guessing not or I wouldn't be the first one posting about it... either that or there is something else going on.

James

(cpuinfo from Dom0)
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : Dual-Core AMD Opteron(tm) Processor 1210
stepping : 3
cpu MHz : 1799.999
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up rep_good extd_apicid pni cx16 hypervisor lahf_lm cmp_legacy extapic cr8_legacy
bogomips : 3599.99
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
Tim Deegan
2011-Jan-25 10:39 UTC
Re: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
At 09:28 +0000 on 25 Jan (1295947707), James Harper wrote:
> > > The bug check is 0x7F (0xD, 0, 0, 0) which is an unspecified exception.

It's a GPF, not that that helps a lot.

> > > Is anyone else able to reproduce?
> >
> > I saw a failure like this last month but when I came to look at it again
> > it seemed to be working. Are you testing on Intel or AMD hardware?
>
> AMD (cpuinfo below). I've just spent the last few hours updating my
> Intel box to test on that too. Save/resume was working but was only
> 3.4.1 so isn't really a comparison.
>
> Both are now same xen, tools, and kernel.
>
> I should know if the problem exists on intel very soon. I'm guessing not
> or I wouldn't be the first one posting about it... either that or there
> is something else going on.

It was on AMD that I saw it too. If I read them correctly, Ian's regression tests are passing HVM save/restore for at least some Windows versions on AMD too, so it may be very specific.

Cheers,

Tim.
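For readers following along: in bug check 0x7F the first parameter is the x86 trap number, and 0xD is vector 13, the general protection fault. A minimal lookup sketch follows; the table is only a small subset of the architectural vectors, and the function name describe_bugcheck_7f is made up for this illustration.

```python
# Subset of the x86 exception vector table (architectural numbering).
X86_TRAPS = {
    0x0: "#DE divide error",
    0x6: "#UD invalid opcode",
    0x8: "#DF double fault",
    0xD: "#GP general protection fault",
    0xE: "#PF page fault",
}


def describe_bugcheck_7f(param1):
    """Name the trap carried in the first parameter of bug check 0x7F.

    The function name is hypothetical; only the vector numbers are standard.
    """
    return X86_TRAPS.get(param1, "trap 0x%X (not in this subset)" % param1)


# The value reported in this thread: 0x7F (0xD, 0, 0, 0)
print(describe_bugcheck_7f(0xD))
```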
James Harper
2011-Jan-25 10:43 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
> > I should know if the problem exists on intel very soon. I'm guessing not
> > or I wouldn't be the first one posting about it... either that or there
> > is something else going on.
>
> It was on AMD that I saw it too. If I read them correctly, Ian's
> regression tests are passing HVM save/restore for at least some Windows
> versions on AMD too, so it may be very specific.

Definitely AMD specific. Works fine on my Intel system.

I'm guessing it's missing the save and/or restore of some critical part of the CPU state for the domain, which causes an immediate crash on return to user mode. Any suggestions as to where to start looking?

James
Tim Deegan
2011-Jan-25 10:53 UTC
Re: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
At 10:43 +0000 on 25 Jan (1295952215), James Harper wrote:
> Definitely AMD specific. Works fine on my Intel system.
>
> I'm guessing it's missing the save and/or restore of some critical part
> of the CPU state for the domain that causes an immediate crash when
> return to user mode. Any suggestions as to where to start looking?

I'm trying to set it up here as well but I'm away from the office and getting the VGA console as far as my screen is proving tricky.

Can you try:
 - xl pause <domid>
 - xen-hvmctx <domid> >before
 - xl save <domid> save-file
 - xl restore -p save-file
 - xl list
 - xen-hvmctx <new-domid> >after
 - diff -u before after

There should be a few differences to do with timers and TSCs but there might be some other smoking gun. Of course it's possible that some piece of state got added that didn't get into the save/restore code at all. It's also possible that some vital piece of memory isn't getting saved properly but that's less likely to be AMD-specific.

Cheers,

Tim.
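The procedure above can be written down as a small helper that only builds the command lines, without running anything. This is a sketch under assumptions: the function name capture_steps and its parameters are invented here, the new domain ID has to be read off the list output by hand, and whether each subcommand behaves identically under xl and xm on a given Xen version is not something this sketch checks.

```python
def capture_steps(tool, domid, new_domid, save_file="save-file"):
    """Return the command lines of the capture procedure, in order.

    tool      -- toolstack front end, e.g. "xl" or "xm"
    domid     -- ID of the running domain to pause and save
    new_domid -- ID the domain gets after restore (read from `<tool> list`)
    Nothing is executed; the caller decides how to run the strings.
    """
    return [
        f"{tool} pause {domid}",
        f"xen-hvmctx {domid} > before",
        f"{tool} save {domid} {save_file}",
        f"{tool} restore -p {save_file}",
        f"{tool} list",
        f"xen-hvmctx {new_domid} > after",
        "diff -u before after",
    ]


# Example with placeholder domain IDs.
for step in capture_steps("xl", 53, 54):
    print(step)
```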
James Harper
2011-Jan-25 11:01 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
> I'm trying to set it up here as well but I'm away from the office and
> getting the VGA console as far as my screen is proving tricky.
>
> Can you try:
> - xl pause <domid>
> - xen-hvmctx <domid> >before
> - xl save <domid> save-file
> - xl restore -p save-file
> - xl list
> - xen-hvmctx <new-domid> >after
> - diff -u before after
>
> There should be a few differences to do with timers and TSCs but there
> might be some other smoking gun. Of course it's possible that some
> piece of state got added that didn't get into the save/restore code at
> all. It's also possible that some vital piece of memory isn't getting
> saved properly but that's less likely to be AMD-specific.

I'll give it a go. I tried using some of the xl tools the other day but it wasn't working. xl save seemed to go okay but xl restore just gave errors. That might be a clue in itself though... is xl considered stable for 4.0.2?

James
Tim Deegan
2011-Jan-25 11:12 UTC
Re: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
At 11:01 +0000 on 25 Jan (1295953315), James Harper wrote:
> I'll give it a go. I tried using some of the xl tools the other day but
> it wasn't working. xl save seemed to go okay but xl restore just gave
> errors. That might be a clue in itself though... is xl considered stable
> for 4.0.2?

For 4.0.x, I think you can substitute 'xm' for 'xl' throughout. :)

Tim.
James Harper
2011-Jan-25 11:24 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
> Can you try:
> - xl pause <domid>
> - xen-hvmctx <domid> >before
> - xl save <domid> save-file
> - xl restore -p save-file
> - xl list
> - xen-hvmctx <new-domid> >after
> - diff -u before after
>
> There should be a few differences to do with timers and TSCs but there
> might be some other smoking gun. Of course it's possible that some
> piece of state got added that didn't get into the save/restore code at
> all. It's also possible that some vital piece of memory isn't getting
> saved properly but that's less likely to be AMD-specific.

xl just isn't working for me. xl create doesn't seem to work with drbd, and even when I use the /dev/ path of the disk image and make it primary myself, xl create gives me a domain that won't boot (bug check 0x5C HAL_INITIALIZATION_FAILED).

And even when I start the domain with xm, xl save works but xl restore won't work. First you need to specify the config file or it complains about a missing userdata-d-<guid?).xl file in /var/lib/xen, and even when you specify the config file it says:

Failed allocation for dom 50: 1024 extents of order 0
ERROR Internal error: Failed to allocate memory for batch.!

xm save won't work if the domu is paused. I'll see if that's just an artificial limitation I can remove...

James
James Harper
2011-Jan-25 11:37 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
> Can you try:
> - xl pause <domid>
> - xen-hvmctx <domid> >before
> - xl save <domid> save-file
> - xl restore -p save-file
> - xl list
> - xen-hvmctx <new-domid> >after
> - diff -u before after
>
> There should be a few differences to do with timers and TSCs but there
> might be some other smoking gun. Of course it's possible that some
> piece of state got added that didn't get into the save/restore code at
> all. It's also possible that some vital piece of memory isn't getting
> saved properly but that's less likely to be AMD-specific.

I was able to remove the 'is domain running?' check from xend and complete your request using xm.

# diff -u before after
--- before 2011-01-25 22:27:51.064451527 +1100
+++ after 2011-01-25 22:33:25.724619490 +1100
@@ -1,4 +1,4 @@
-HVM save record for domain 53
+HVM save record for domain 54
 Entry 0: type 1 instance 0, length 24
  Header: magic 0x54381286, version 1
  Xen changeset 0
@@ -22,11 +22,11 @@
  cs 0x0000001b (0x0000000000000000 + 0xffffffff / 0x00cfb)
  ds 0x00000023 (0x0000000000000000 + 0xffffffff / 0x00cf3)
  es 0x00000023 (0x0000000000000000 + 0xffffffff / 0x00cf3)
- fs 0x0000003b (0x000000007ffdc000 + 0x00000fff / 0x004f3)
- gs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00000)
+ fs 0x00000000 (0x00007f18bcbc6700 + 0xffffffff / 0x00000)
+ gs 0x00000000 (0xffff880028038000 + 0xffffffff / 0x00000)
  ss 0x00000023 (0x0000000000000000 + 0xffffffff / 0x00cf3)
- tr 0x00000028 (0x0000000080157000 + 0x000020ab / 0x0008b)
- ldtr 0x00000000 (0x0000000000000000 + 0x00000000 / 0x00000)
+ tr 0x0000e040 (0xffff82c480263a80 + 0x00000067 / 0x0008b)
+ ldtr 0x00000000 (0x0000000000000000 + 0x0000ffff / 0x00000)
  itdr (0x0000000081fff400 + 0x000007ff)
  gdtr (0x0000000081fff000 + 0x000003ff)
  sysenter cs 0x00000000 eip 0x0000000000000000 esp 0x0000000000000000
@@ -34,7 +34,7 @@
  MSR flags 0xffffffffffffffff lstar 0x0000000000000000
  star 0x0000000000000000 cstar 0x0000000000000000
  sfmask 0x0000000000000000 efer 0x0000000000000800
- tsc 0x00000018cad69045
+ tsc 0x0000008fe39fad26
  event 0x00000000 error 0x00000000
  FPU: fcw 0x037f fsw 0x0000
  ftw 0x00 (0x00) fop 0x0000
@@ -71,7 +71,7 @@
  (0x00000000000000000000000000000000)
  (0x00000000000000000000000000000000)
 Entry 2: type 3 instance 0, length 8
- PIC: IRQ base 0x30, irr 0, imr 0xff, isr 0
+ PIC: IRQ base 0x30, irr 0x2, imr 0xff, isr 0
  init_state 0, priority_add 0, readsel_isr 0, poll 0
  auto_eoi 0, rotate_on_auto_eoi 0
  special_fully_nested_mode 0, special_mask_mode 0
@@ -153,8 +153,8 @@
  0x01c0: 0x0000000000000004 0x01d0: 0x0000000000000000
  0x01e0: 0x0000000000000000 0x01f0: 0x0000000000000000
  0x0200: 0x0000000000000000 0x0210: 0x0000000000000000
- 0x0220: 0x0000000000000000 0x0230: 0x0000000000040000
- 0x0240: 0x0000000000000004 0x0250: 0x0000000000000000
+ 0x0220: 0x0000000000000000 0x0230: 0x0000000000060000
+ 0x0240: 0x0000000000000006 0x0250: 0x0000000000000000
  0x0260: 0x0000000000000000 0x0270: 0x0000000000000000
  0x0280: 0x0000000000000000 0x0290: 0x0000000000000000
  0x02a0: 0x0000000000000000 0x02b0: 0x0000000000000000
@@ -171,7 +171,7 @@
 Entry 7: type 7 instance 0, length 16
  PCI IRQs: 0x00000000000100800000000000000000
 Entry 8: type 8 instance 0, length 8
- ISA IRQs: 0x0001
+ ISA IRQs: 0x0003
 Entry 9: type 9 instance 0, length 8
  PCI LINK: 0 0 0 0
 Entry 10: type 10 instance 0, length 56
@@ -185,11 +185,11 @@
  rd_state 0, wr_state 0, wr_latch 0, rw_mode 0
  mode 0xff, bcd 0, gate 0x1
 Entry 11: type 11 instance 0, length 16
- RTC: regs 0x48 0x00 0x27 0x00 0x22 0x00 0x02 0x25
+ RTC: regs 0x23 0x00 0x33 0x00 0x22 0x00 0x02 0x25
  0x01 0x11 0x2a 0x42 0x00 0x80, index 0x0c
 Entry 12: type 12 instance 0, length 1048
  HPET: capability 0xf424008086a201 config 0
- isr 0 counter 0x1308081db
+ isr 0 counter 0x55323282f
  timer0 config 0xf0000000000030 cmp 0
  timer0 period 0 fsb 0
  timer1 config 0xf0000000000030 cmp 0
@@ -200,8 +200,8 @@
  ACPI PM: TMR_VAL 0xd9f446d, PM1a_STS 0x0, PM1a_EN 0x321
 Entry 14: type 14 instance 0, length 240
  MTRR: PAT 0x7010600070106, cap 0x508, default 0xc06
- var 0 0x00000000f0000000 0x000000fff8000800
- var 1 0x00000000f8000000 0x000000fffc000800
+ var 0 0x00000000f0000000 0x0000000000000000
+ var 1 0x00000000f8000000 0x0000000000000000
  var 2 0x0000000000000000 0x0000000000000000
  var 3 0x0000000000000000 0x0000000000000000
  var 4 0x0000000000000000 0x0000000000000000

I don't really know a lot of what I'm looking at, but those cpu registers shouldn't be different should they??? It almost looks like the cpu was allowed to run for a few cycles at some point even though it was paused.

I'll try the same on the intel box and see what happens.

Thanks

James
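Since Tim expected "a few differences to do with timers and TSCs", one way to read a diff like the one above is to filter out the changes that are expected and look at what is left. The sketch below does that for unified-diff text. The allow-list is an assumption drawn from this thread (domain ID, TSC, RTC, HPET counter), not an authoritative list, and the name unexpected_changes is made up here.

```python
# Prefixes of xen-hvmctx lines that are assumed to change harmlessly across
# save/restore. This allow-list is a guess based on the thread, not a spec.
EXPECTED_PREFIXES = ("HVM save record", "tsc", "RTC:", "isr 0 counter")


def unexpected_changes(diff_text):
    """Return added/removed lines of a unified diff that are not allow-listed.

    Caveat: a removed line whose own text starts with "--" would be mistaken
    for a file header and skipped; that does not occur in xen-hvmctx output.
    """
    flagged = []
    for line in diff_text.splitlines():
        # Skip file headers and hunk markers.
        if line.startswith(("---", "+++", "@@")):
            continue
        # Keep only added or removed lines; context lines start with a space.
        if not line.startswith(("+", "-")):
            continue
        body = line[1:].strip()
        if not body.startswith(EXPECTED_PREFIXES):
            flagged.append(line[0] + " " + body)
    return flagged


sample = """--- before
+++ after
@@ -1,4 +1,4 @@
-HVM save record for domain 53
+HVM save record for domain 54
- fs 0x0000003b (0x000000007ffdc000 + 0x00000fff / 0x004f3)
+ fs 0x00000000 (0x00007f18bcbc6700 + 0xffffffff / 0x00000)
- tsc 0x00000018cad69045
+ tsc 0x0000008fe39fad26
"""
for change in unexpected_changes(sample):
    print(change)
```

Run on the sample, only the two fs lines remain; the domain ID and TSC changes are dropped.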
James Harper
2011-Jan-25 11:52 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre)
On intel we get a much more believable result:

# diff -u before after
--- before 2011-01-25 22:40:32.861270619 +1100
+++ after 2011-01-25 22:43:31.665271154 +1100
@@ -1,4 +1,4 @@
-HVM save record for domain 24
+HVM save record for domain 25
 Entry 0: type 1 instance 0, length 24
  Header: magic 0x54381286, version 1
  Xen changeset 0
@@ -34,7 +34,7 @@
  MSR flags 0x0000000000000000 lstar 0x0000000000000000
  star 0x0000000000000000 cstar 0x0000000000000000
  sfmask 0x0000000000000000 efer 0x0000000000000800
- tsc 0x0000002a2056c07e
+ tsc 0x0000007e31c3b3f3
  event 0x00000000 error 0x00000000
  FPU: fcw 0x027f fsw 0x0000
  ftw 0x00 (0x00) fop 0x0000
@@ -185,11 +185,11 @@
  rd_state 0, wr_state 0, wr_latch 0, rw_mode 0
  mode 0xff, bcd 0, gate 0x1
 Entry 11: type 11 instance 0, length 16
- RTC: regs 0x32 0x00 0x40 0x00 0x22 0x00 0x02 0x25
+ RTC: regs 0x29 0x00 0x43 0x00 0x22 0x00 0x02 0x25
  0x01 0x11 0x2a 0x42 0x00 0x80, index 0x0c
 Entry 12: type 12 instance 0, length 1048
  HPET: capability 0xf424008086a201 config 0
- isr 0 counter 0x19b07289b
+ isr 0 counter 0x394b04f80
  timer0 config 0xf0000000000030 cmp 0
  timer0 period 0 fsb 0
  timer1 config 0xf0000000000030 cmp 0

Just a few counters changed.

I retried under AMD and got the same result as last time so it's definitely broken.

Are there any tools to analyse the save file? If I can see what numbers are in there I should be able to tell if it's the save or the restore that's broken...

Thanks

James
James Harper
2011-Jan-25 13:35 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre) (AMD only)
I put some printf's around the restore of registers in hvm_load_cpu_ctxt: one before, announcing what the register was about to be set to; then the set as normal; then a read-back that prints what the register contains (which should be what it was set to). The values don't match for fs, gs, tr, and ldtr. The values written do match what xen-hvmctx tells me before the save is done, so the save is working, just not the restore.

So the problem is somewhere past hvm_set_segment_register, and because it's AMD only, probably in or beyond svm_set_segment_register. The first thing I notice in that routine is that there is a case for those 4 registers... although all it seems to do is svm_sync_vmcb before and svm_vmload after setting. I don't know what those two do though.

I'll investigate further tomorrow, assuming nobody fixes it while I'm asleep :)

James
Tim Deegan
2011-Jan-25 14:37 UTC
Re: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre) (AMD only)
At 13:35 +0000 on 25 Jan (1295962540), James Harper wrote:
> So the problem is somewhere past hvm_set_segment_register, and because
> it's amd only, probably in or beyond svm_set_segment_register. The first
> thing I notice in that routine is that there is a case for those 4
> registers... although all it seems to do is svm_sync_vmcb before and
> svm_vmload after setting. I don't know what those two do though.

Hmm; I suspect the bug here is actually in the save side -- the syncing
of the vmcb in the save routine is not conditional on v == current, and
the "already synced" bit that it would otherwise gate on isn't properly
initialized.

Try the attached patch; I'm sorry to say that I suspect it will fix the
odd output of xen_hvmctx but probably won't fix the BSOD. :(

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
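The failure mode Tim describes can be illustrated with a small toy model in Python. This is not Xen code and does not show what the attached patch does. It assumes a vCPU whose "already synced" flag starts out clear and a sync routine that checks only that flag; `ToyVcpu`, `sync_buggy`, `sync_guarded`, `GUEST_STATE` and `HOST_STATE` are hypothetical names invented for the sketch. The segment bases are the ones reported by xen_hvmctx in the diff quoted earlier in the thread.

```python
from dataclasses import dataclass

# Toy model only: none of these names exist in Xen.
# Segment bases as reported by xen_hvmctx in the diff earlier in the thread.
GUEST_STATE = {"fs": 0x000000007ffdc000, "gs": 0x0, "tr": 0x0000000080157000}
HOST_STATE = {"fs": 0x00007f18bcbc6700, "gs": 0xffff880028038000,
              "tr": 0xffff82c480263a80}


@dataclass
class ToyVcpu:
    vmcb: dict             # guest segment bases as stored in memory
    in_sync: bool = False  # assumption: flag left clear for a vCPU that never ran


def sync_buggy(vcpu, cpu_state):
    """Copy whatever is on the CPU into the VMCB, gated only on the flag."""
    if not vcpu.in_sync:
        vcpu.vmcb.update(cpu_state)
        vcpu.in_sync = True


def sync_guarded(vcpu, cpu_state, current):
    """Same copy, but only when this vCPU is the one running on the CPU."""
    if vcpu is current and not vcpu.in_sync:
        vcpu.vmcb.update(cpu_state)
        vcpu.in_sync = True


# A freshly restored vCPU that has never been scheduled: the CPU still
# holds host state, so the unguarded sync overwrites the guest values.
restored = ToyVcpu(vmcb=dict(GUEST_STATE))
sync_buggy(restored, HOST_STATE)
print({name: hex(base) for name, base in restored.vmcb.items()})

# With the "is this vCPU current?" check the guest values survive.
guarded = ToyVcpu(vmcb=dict(GUEST_STATE))
sync_guarded(guarded, HOST_STATE, current=None)
print(guarded.vmcb == GUEST_STATE)
```

In this model either guard would avoid the overwrite: checking that the vCPU is current, or initialising the flag as set for a vCPU that has not run yet. Which of the two the real fix uses is not stated in the thread.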
James Harper
2011-Jan-25 22:11 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre) (AMD only)
> At 13:35 +0000 on 25 Jan (1295962540), James Harper wrote:
> > So the problem is somewhere past hvm_set_segment_register, and because
> > it's amd only, probably in or beyond svm_set_segment_register. The first
> > thing I notice in that routine is that there is a case for those 4
> > registers... although all it seems to do is svm_sync_vmcb before and
> > svm_vmload after setting. I don't know what those two do though.
>
> Hmm; I suspect the bug here is actually in the save side -- the syncing
> of the vmcb in the save routine is not conditional on v == current, and
> the "already synced" bit that it would otherwise gate on isn't properly
> initialized.
>
> Try the attached patch; I'm sorry to say that I suspect it will fix the
> odd output of xen_hvmctx but probably won't fix the BSOD. :(

Just to clarify, in the restore path I print the values to be saved to
the segment registers, then I read the segment registers and print the
values that are in them. They aren't the same. Doesn't that sound like a
problem on the restore side?

James
Tim Deegan
2011-Jan-25 22:21 UTC
Re: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre) (AMD only)
At 22:11 +0000 on 25 Jan (1295993487), James Harper wrote:
> > At 13:35 +0000 on 25 Jan (1295962540), James Harper wrote:
> > > So the problem is somewhere past hvm_set_segment_register, and because
> > > it's amd only, probably in or beyond svm_set_segment_register. The first
> > > thing I notice in that routine is that there is a case for those 4
> > > registers... although all it seems to do is svm_sync_vmcb before and
> > > svm_vmload after setting. I don't know what those two do though.
> >
> > Hmm; I suspect the bug here is actually in the save side -- the syncing
> > of the vmcb in the save routine is not conditional on v == current, and
> > the "already synced" bit that it would otherwise gate on isn't properly
> > initialized.
> >
> > Try the attached patch; I'm sorry to say that I suspect it will fix the
> > odd output of xen_hvmctx but probably won't fix the BSOD. :(
>
> Just to clarify, in the restore path I print the values to be saved to
> the segment registers, then I read the segment registers and print the
> values that are in them. They aren't the same. Doesn't that sound like a
> problem on the restore side?

That would depend on how you read the values after the restore - the
patch is for a bug that I think is causing svm_get_segment_register() to
corrupt the vmcb if it's called before the vcpu is first scheduled (and
to return the corrupted values).

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
James Harper
2011-Jan-25 22:25 UTC
RE: [Xen-devel] xm save + restore crashes Windows 2008 32-bit (4.0.2-rc2-pre) (AMD only)
> > Just to clarify, in the restore path I print the values to be saved to
> > the segment registers, then I read the segment registers and print the
> > values that are in them. They aren't the same. Doesn't that sound like a
> > problem on the restore side?
>
> That would depend on how you read the values after the restore - the
> patch is for a bug that I think is causing svm_get_segment_register() to
> corrupt the vmcb if it's called before the vcpu is first scheduled (and
> to return the corrupted values).

I see. I just tested and while I still get the crash, all the segment
registers are now correct after applying your patch. The only thing I can
see that's different now is the MTRRs.

James

--- before 2011-01-26 09:16:19.030666000 +1100
+++ after 2011-01-26 09:21:13.374664075 +1100
@@ -1,4 +1,4 @@
-HVM save record for domain 4
+HVM save record for domain 6
 Entry 0: type 1 instance 0, length 24
  Header: magic 0x54381286, version 1
  Xen changeset 0
@@ -34,7 +34,7 @@
  MSR flags 0xffffffffffffffff lstar 0x0000000000000000
  star 0x0000000000000000 cstar 0x0000000000000000
  sfmask 0x0000000000000000 efer 0x0000000000000800
- tsc 0x000000172cbec19e
+ tsc 0x0000005866dd3e1f
  event 0x00000000 error 0x00000000
  FPU: fcw 0x027f fsw 0x0000
  ftw 0x00 (0x00) fop 0x0000
@@ -185,11 +185,11 @@
  rd_state 0, wr_state 0, wr_latch 0, rw_mode 0
  mode 0xff, bcd 0, gate 0x1
 Entry 11: type 11 instance 0, length 16
- RTC: regs 0x16 0x00 0x16 0x00 0x09 0x00 0x03 0x26
+ RTC: regs 0x12 0x00 0x21 0x00 0x09 0x00 0x03 0x26
  0x01 0x11 0x2a 0x42 0x00 0x80, index 0x0c
 Entry 12: type 12 instance 0, length 1048
  HPET: capability 0xf424008086a201 config 0
- isr 0 counter 0x1f65b81fc
+ isr 0 counter 0x43a264ae0
  timer0 config 0xf0000000000030 cmp 0
  timer0 period 0 fsb 0
  timer1 config 0xf0000000000030 cmp 0
@@ -200,8 +200,8 @@
  ACPI PM: TMR_VAL 0x19b239a8, PM1a_STS 0x0, PM1a_EN 0x321
 Entry 14: type 14 instance 0, length 240
  MTRR: PAT 0x7010600070106, cap 0x508, default 0xc06
- var 0 0x00000000f0000000 0x000000fff8000800
- var 1 0x00000000f8000000 0x000000fffc000800
+ var 0 0x00000000f0000000 0x0000000000000000
+ var 1 0x00000000f8000000 0x0000000000000000
  var 2 0x0000000000000000 0x0000000000000000
  var 3 0x0000000000000000 0x0000000000000000
  var 4 0x0000000000000000 0x0000000000000000
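For readers unfamiliar with the MTRR lines in the diff above, the following Python sketch decodes a variable-range base/mask pair. It assumes the standard x86 layout (memory type in the low 8 bits of the base register, valid bit at bit 11 of the mask register), a contiguous mask, and a 40-bit physical address width, which matches the "40 bits physical" in the cpuinfo posted earlier and the width of the masks themselves. `decode_var_mtrr` is a name made up for this example, not part of xen_hvmctx.

```python
MTRR_VALID = 1 << 11   # valid bit in the variable-range mask register
PAGE_MASK = 0xfff      # low 12 bits are not part of the address fields


def decode_var_mtrr(base_msr, mask_msr, phys_bits=40):
    """Decode one variable-range MTRR pair (phys_bits=40 is an assumption)."""
    addr_mask = ((1 << phys_bits) - 1) & ~PAGE_MASK
    valid = bool(mask_msr & MTRR_VALID)
    mask = mask_msr & addr_mask
    # For a contiguous mask, the range size is the inverted mask plus one page.
    size = ((~mask) & addr_mask) + PAGE_MASK + 1 if valid else 0
    return {
        "base": base_msr & addr_mask,
        "type": base_msr & 0xff,   # 0 = uncacheable, 6 = write-back
        "valid": valid,
        "size": size,
    }


# Values copied from the "before" and "after" sides of the diff above.
pairs = [
    ("var 0 before", 0x00000000f0000000, 0x000000fff8000800),
    ("var 1 before", 0x00000000f8000000, 0x000000fffc000800),
    ("var 0 after",  0x00000000f0000000, 0x0000000000000000),
    ("var 1 after",  0x00000000f8000000, 0x0000000000000000),
]
for label, base_msr, mask_msr in pairs:
    r = decode_var_mtrr(base_msr, mask_msr)
    print(f"{label}: base={r['base']:#x} type={r['type']} "
          f"valid={r['valid']} size={r['size'] >> 20} MiB")
```

Under these assumptions the saved image describes two valid uncacheable ranges (128 MiB at 0xf0000000 and 64 MiB at 0xf8000000), while the restored domain has both mask registers zeroed, so the valid bit is clear and neither range is in effect. Whether that difference is what triggers the BSOD is not established at this point in the thread.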