All,

We've finished a round of nightly testing and bug verification on RC4
(#18314). There are still 5 P1 open bugs. Bug #1322 and bug #1323 were found
by extended testing with a Solaris HVM guest.

New P1 bugs:
=============================================
1. Xen HV crashes while booting up Indiana HVM guest
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1322

2. Booting Nevada 81 PAE HVM may cause Xen crash.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1323

Old P1 bugs:
=============================================
1. On 32e, hotplug attaching a VT-d NIC to the guest failed.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1316

2. On PAE, failed to hotplug-attach a USB EHCI device to a Linux guest.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1318

3. UHCI hotplug cannot work on the Montevina platform.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1319

-- haicheng

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Do the new P1s still occur if you change SHADOW_OPTIMIZATIONS in
arch/x86/mm/shadow/private.h to 0xff? (i.e., disable the out-of-sync
optimisation)

-- Keir

On 13/8/08 08:43, "Li, Haicheng" <haicheng.li@intel.com> wrote:

> All,
>
> We've finished a round of nightly testing and bug verification on RC4
> (#18314). There are still 5 P1 open bugs. Bug #1322 and bug #1323 were
> found by extended testing with Solaris HVM guest.
>
> [...]
It goes without saying, of course, that these new P1s are unfortunately
rather likely to delay 3.3.0, unless we choose to disable the optimisations
causing these crashes. Still, we are in deep-freeze mode and I won't be
taking any patches that aren't obvious fixes for very serious issues and
regressions.

-- Keir

On 13/8/08 09:30, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Do the new P1s still occur if you change SHADOW_OPTIMIZATIONS in
> arch/x86/mm/shadow/private.h to 0xff? (i.e., disable out-of-sync
> optimisation).
>
> [...]
On Wednesday, August 13, 2008 4:31 PM, Keir Fraser wrote:

> Do the new P1s still occur if you change SHADOW_OPTIMIZATIONS in
> arch/x86/mm/shadow/private.h to 0xff? (i.e., disable out-of-sync
> optimisation).

After changing SHADOW_OPTIMIZATIONS to 0xff, the two issues disappear. But
we observed two phenomena:

1. The Indiana HVM guest may reboot when loading grub; the serial log shows:

(XEN) sh error: sh_remove_shadows(): can't find all shadows of mfn 1e4490 (shadow_flags=00000040)
(XEN) domain_crash called from common.c:2714
(XEN) Domain 10 (vcpu#0) crashed on cpu#4:
(XEN) ----[ Xen-3.3.0-rc1 x86_32p debug=n Not tainted ]----
(XEN) CPU: 4
(XEN) EIP: 0158:[<fa58e185>]
(XEN) EFLAGS: 00000282 CONTEXT: hvm
(XEN) eax: 0000000c ebx: 00000007 ecx: d947eba0 edx: 00000000
(XEN) esi: d947e7a0 edi: fa5886c0 ebp: d947e768 esp: d947e748
(XEN) cr0: 8005003b cr4: 000006b8 cr3: 023cb020 cr2: d32daea5
(XEN) ds: 0160 es: 0160 fs: 0000 gs: 01b0 ss: 0160 cs: 0158
(XEN) sh error: sh_remove_shadows(): can't find all shadows of mfn 1e45de (shadow_flags=00000080)
(XEN) domain_crash called from common.c:2714

2. "xm destroy" of the Indiana HVM guest may produce a Xen call trace,
although there is no Xen crash:
(XEN) Xen WARN at domain.c:1814
(XEN) ----[ Xen-3.3.0-rc1 x86_32p debug=n Not tainted ]----
(XEN) CPU: 2
(XEN) EIP: e008:[<ff12f70f>] domain_relinquish_resources+0x17f/0x1a0
(XEN) EFLAGS: 00210202 CONTEXT: hypervisor
(XEN) eax: 00000001 ebx: ff1c2090 ecx: ff1c2080 edx: 00000000
(XEN) esi: ff1c2080 edi: ff1c2080 ebp: ffbf7e44 esp: ffbf7dcc
(XEN) cr0: 8005003b cr4: 000026f0 cr3: 00bdcc80 cr2: 080554c8
(XEN) ds: e010 es: e010 fs: 0000 gs: 0033 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ffbf7dcc:
(XEN) 00000020 ff1c2080 fffffff5 fffffff3 ff1c2080 00000000 ffbf7e44 ff10416d
(XEN) ff1c2080 00000005 ff116a57 2709497e 0000012f 513c6fff 00000202 000000ff
(XEN) fffffff3 b33fc518 0000007b ff1030ad ff1c2080 b33fc518 00000090 00000020
(XEN) ff1d9100 00000296 8d654ad1 ff1c2080 ff1d0080 ff1c2326 00000002 00000005
(XEN) b79a000b b79d04fc b79d952c 447c7f77 b33fc54c 46257b48 b7f116a0 0000007f
(XEN) 00000000 b8b08ac8 081215a8 447c7f77 b7f116a0 b7b59d74 b33fc578 447c77d3
(XEN) b7b59d74 a5dba1ee b7ba8140 0000001f a5dba1ee 00000000 0836e4f0 448738e4
(XEN) b7ba8140 08399b54 b33fc5a8 447c77d3 08399b54 b7ba8140 a5dba1ee 448738e4
(XEN) b7ba8140 b7ba8140 43841a1c ff1d9100 00200296 00200296 43841a15 00000006
(XEN) 00000003 00000004 ffbf7fb4 ffbf7f5c ff149cfd ffbf7fb4 43841a16 00000001
(XEN) 0000f800 ff111284 ff1dd044 ffbe6900 ff1dd104 000f0003 0000000f ff1fa43c
(XEN) 000002e0 00000002 ffbdc080 ffbdc080 00200296 00200296 00000004 00000033
(XEN) 00009695 0000012f 00000004 43841a17 5a2f1fe8 909090ff 90909090 c3900390
(XEN) f95f30d8 ff10f4d2 ff1dd044 53d22d15 0000012f ffbf7f03 5328e111 ffbdc080
(XEN) 0000007b 0000007b 00305000 ff19a7a4 b33fc518 357f4700 b7f49430 b33fc5e8
(XEN) 00000000 00305000 b33fc518 357f4700 b7f49430 b33fc5e8 00000000 00305000
(XEN) 00000024 000d0000 c0101487 00000061 00200282 ca1ebe94 00000069 0000007b
(XEN) 0000007b 00000000 00000033 00000002 ffbdc080
(XEN) Xen call trace:
(XEN) [<ff12f70f>] domain_relinquish_resources+0x17f/0x1a0
(XEN) [<ff1c2080>] get_edd+0x4/0x10
(XEN) [<ff1c2080>] get_edd+0x4/0x10
(XEN) [<ff10416d>] domain_kill+0x6d/0x160
(XEN) [<ff1c2080>] get_edd+0x4/0x10
(XEN) [<ff116a57>] add_entry+0x57/0x140
(XEN) [<ff1030ad>] do_domctl+0x10d/0xc40
(XEN) [<ff1c2080>] get_edd+0x4/0x10
(XEN) [<ff1c2080>] get_edd+0x4/0x10
(XEN) [<ff1d0080>] smp_prepare_cpus+0x2d0/0x800
(XEN) [<ff1c2326>] boot_edd_info+0x170/0x200
(XEN) [<ff149cfd>] do_general_protection+0x42d/0x1310
(XEN) [<ff111284>] csched_tick+0x154/0x5e0
(XEN) [<ff10f4d2>] page_scrub_softirq+0x132/0x160
(XEN) [<ff19a7a4>] hypercall+0x94/0x9b

> On 13/8/08 08:43, "Li, Haicheng" <haicheng.li@intel.com> wrote:
>
>> We've finished a round of nightly testing and bug verification on RC4
>> (#18314). There are still 5 P1 open bugs. Bug #1322 and bug #1323
>> were found by extended testing with Solaris HVM guest.
>>
>> [...]

Best Regards,
Jiajun
On 13/8/08 10:32, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:

> 1. Indiana HVM may reboot when loading grub, serial grub shows:
> (XEN) sh error: sh_remove_shadows(): can't find all shadows of mfn
> 1e4490 (shadow_flags=00000040)

At least it's not an HV crash. At this late stage I could perhaps live with
this.

> 2. "xm destroy" Indiana HVM may cause xen call trace. But there is no
> xen crash.

Both backtraces are from 3.3.0-rc1. You modified and tested the wrong tree.
:-) This second backtrace can no longer happen.

We're going to dig into the OOS bug a bit and decide what to do...

-- Keir
On Wednesday, August 13, 2008 5:49 PM Keir Fraser wrote:

> Both backtraces are from 3.3.0-rc1. You modified and tested the wrong
> tree. :-) This second backtrace can no longer happen.
>
> We're going to dig into the OOS bug a bit and decide what to do...

Oh, sorry, I made a mistake; thanks for pointing that out. I will try RC4
to see whether any issue still exists after this modification, and will
send you an update.

Best Regards,
Jiajun
On 13/8/08 13:25, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:

> Oh, sorry, I made a mistake. Thanks for pointing that out.
> I will try rc4 to see if any issue still exists after this modification
> and send you the update.

As of c/s 18326 I've not been able to reproduce the hypervisor crash in
around 40 attempts. Could you give that a go (don't remove OOS from
SHADOW_OPTIMIZATIONS -- just test the tree as it is)?

Thanks,
Keir
On 13/8/08 15:48, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> As of c/s 18326 I've not been able to reproduce the hypervisor crash in
> around 40 attempts. Could you give that a go (don't remove OOS from
> SHADOW_OPTIMIZATIONS -- just test the tree as it is)?

If this works okay for you then I'll make a fifth release candidate
tomorrow, and plan to release early next week.

-- Keir
On Wednesday, August 13, 2008 10:48 PM Keir Fraser wrote:

> As of c/s 18326 I've not been able to reproduce the hypervisor crash
> in around 40 attempts. Could you give that a go (don't remove OOS from
> SHADOW_OPTIMIZATIONS -- just test the tree as it is)?

We tried c/s 18326; the two issues still exist. We found it is easier to
reproduce these issues on a 32pae host than on a 32e host. If we remove
OOS, the two issues disappear and no other error is found. See the
following logs.

Booting Indiana (2008.05) causes a Xen crash:

###################
(XEN) sh error: sh_remove_shadows(): can't find all shadows of mfn 2291c6 (shadow_flags=60000010)
(XEN) domain_crash called from common.c:2714
(XEN) Domain 2 (vcpu#1) crashed on cpu#1:
(XEN) ----[ Xen-3.3.0-rc5-pre x86_32p debug=n Not tainted ]----
(XEN) CPU: 1
(XEN) EIP: 0158:[<fe832375>]
(XEN) EFLAGS: 00010202 CONTEXT: hvm guest
(XEN) eax: fe832381 ebx: d996dc40 ecx: 00000004 edx: d826c800
(XEN) esi: dc285114 edi: da67511c ebp: d996daec esp: d996dae0
(XEN) cr0: 8005003b cr4: 000006b8 cr3: 023cb040 cr2: 08047e54
(XEN) ds: 0160 es: 0160 fs: 0000 gs: 01b0 ss: 0160 cs: 0158
(XEN) sh error: oos_snapshot_lookup(): gmfn 2291c6 was OOS but not in hash table
(XEN) Xen BUG at common.c:817
(XEN) ----[ Xen-3.3.0-rc5-pre x86_32p debug=n Not tainted ]----
(XEN) CPU: 1
(XEN) EIP: e008:[<ff18a7bc>] oos_snapshot_lookup+0xcc/0xf0
(XEN) EFLAGS: 00010286 CONTEXT: hypervisor
(XEN) eax: 00000000 ebx: 00000000 ecx: 0000000a edx: 00000000
(XEN) esi: ffbcf034 edi: 002291c6 ebp: 00000008 esp: ff2abd90
(XEN) cr0: 80050033 cr4: 000026f0 cr3: 00bced20 cr2: 08047e54
(XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ff2abd90:
(XEN) ff1b5a68 ff1a4d78 002291c6 00000002 00000000 00000000 ffbcf040 fe6e1428
(XEN) ffbce080 fdff3708 ffbd0080 ff197b6b ffbce080 002291c6 0022a235 00000001
(XEN) 00000001 ff2abe50 ff2abfb4 ff167149 ffbce080 ffbd0080 033a9d58 00000054
(XEN) 0000002c 00227139 0022a235 00000002 00000000 00000428 00000708 00000000
(XEN) fefe2238 00000020 fe6a4240 000000f8 ffbd0080 000dc285 2a0bf001 00000002
(XEN) fe040338 00000020 ffbd0080 ff13acd3 001e9e65 ff2abfb4 00000020 00000020
(XEN) ff2abfb4 00000020 00000020 ff190001 00b09089 08eb0000 b90843ff ffffffff
(XEN) feade1e5 00000010 0c9b0158 ffffffff 00000000 00000000 0c930160 ffffffff
(XEN) 00000000 00000000 0c930160 ffffffff 00000000 00000000 00000000 ff2abfb4
(XEN) ffbce080 00000000 ffbd0080 ff194f79 00000000 00000000 0022a03f 00000001
(XEN) 00000001 00003708 f6800000 001e9e65 ffffffff 00000000 dc285114 06062001
(XEN) 00000000 025c6027 00000000 01539361 80000000 001e9262 002291c6 000aa289
(XEN) 2a235067 00000002 27139021 80000002 ff17e469 00000001 ffbce080 dc285114
(XEN) 00000000 ffbce080 ff2abfb4 ff1839eb ffbce080 dc285114 ff2abfb4 c8589e63
(XEN) c8d856e1 000000d6 c85961b3 000000d6 ffbce080 ffbd0080 00000003 0000d900
(XEN) 0000e002 c8d856e1 000000d6 00000000 ff2abfb4 ff1dc180 ff1de100 ff2abfb4
(XEN) 00000001 ff2abfb4 ffbce080 ffbce080 dc285114 ff2abfd4 d996daec ff17e2d9
(XEN) ff2abfb4 d996dc40 00000004 d826c800 dc285114 da67511c d996daec fe832381
(XEN) 00f00001 fe832375 00000000 00010202 d996dae0 00000000 00000000 00000000
(XEN) 00000000 00000000 00000001 ffbce080
(XEN) Xen call trace:
(XEN) [<ff18a7bc>] oos_snapshot_lookup+0xcc/0xf0
(XEN) [<ff197b6b>] sh_page_fault__guest_3+0x117b/0x1420
(XEN) [<ff167149>] hvmemul_get_seg_reg+0x49/0x60
(XEN) [<ff13acd3>] put_page_from_l1e+0x63/0xf0
(XEN) [<ff190001>] shadow_set_l2e+0x341/0x400
(XEN) [<ff194f79>] sh_invlpg__guest_3+0x2c9/0x2e0
(XEN) [<ff17e469>] vmx_intr_assist+0x89/0x380
(XEN) [<ff1839eb>] vmx_vmexit_handler+0x63b/0x1230
(XEN) [<ff17e2d9>] vmx_asm_vmexit_handler+0x49/0x4c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Xen BUG at common.c:817
(XEN) ****************************************
################

Destroying the Nevada 81 HVM guest causes a Xen crash:

################
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN) mm.c:706:d1 Error getting mfn 2235b0 (pfn 23b0) from L1 entry 00000002235b0023 for dom1
(XEN) mm.c:1941:d1 Type count overflow on pfn 2235b0
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 7:
(XEN) Xen BUG at page_alloc.c:839
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
#######################

Best Regards,
Jiajun
On 14/8/08 08:09, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:

> We tried c/s 18326, the two issues still exist. We found it is easier
> to reproduce these issues on a 32pae host than on a 32e host. See the
> following logs.
> If we remove OOS, the two issues disappear and no other error is found.

Thanks Jiajun,

That's disappointing. :-( I think the second of your crashes is the one
that I've been able to reproduce (but infrequently -- maybe one time in
100). The symptoms are a bit different for me since I run a debug build and
crash earlier, well before domain destruction.

There's a chance that Gianluca's new patch will fix your first host crash
(although the domain crash would probably still remain).

We still need to decide whether to fix the second issue or disable OOS.
We're not decided on that just yet. If it reproduced more reliably for us
then I'd be more optimistic about fixing it. Perhaps I will switch to
32pae, as so far I've been running a 32e host.

-- Keir
On 14/8/08 08:26, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> That's disappointing. :-( I think the second of your crashes is the one
> that I've been able to reproduce (but infrequently -- maybe one time in
> 100). The symptoms are a bit different for me since I run a debug build
> and crash earlier, well before domain destruction.

Here's my crash:

(XEN) Assertion '(x & ((1U<<26)-1)) != 0' failed at mm.c:1891
(XEN) ----[ Xen-3.3.0-rc5-pre x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff828c80150ae2>] put_page_type+0x39/0x13c
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 00000000e8000000 rbx: 00000000e8000000 rcx: 0000000000000000
(XEN) rdx: ffff83003e1e6100 rsi: ffff83003e1e6100 rdi: ffff828400563808
(XEN) rbp: ffff83003e1f7ae8 rsp: ffff83003e1f7ab8 r8: 000000003e1e6100
(XEN) r9: ffff83003e1e6100 r10: 0000000000000000 r11: 80000000227cd063
(XEN) r12: ffff828400563808 r13: 00000000e7ffffff r14: ffff828400563820
(XEN) r15: ffff828400563820 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 000000003e4e4000 cr2: 0000000008047ff4
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83003e1f7ab8:
(XEN) ffff83003e1f7b18 ffff828400563808 80000000227cd063 ffff83003e1e6100
(XEN) ffff83003e1e6100 000000000003efd6 ffff83003e1f7b18 ffff828c8014da74
(XEN) ffff83003efd6078 ffff83003e1f7b50 ffff83003efd6070 0000000000000001
(XEN) ffff83003e1f7b78 ffff828c801bbc80 80000000227cd063 ffff83003e1e6100
(XEN) 000000013e1f7b68 000000000003efd6 80000000227cd061 0000000000000000
(XEN) ffff83003d2edb38 0000000000000000 00000000000227cd ffff83003d2ec100
(XEN) ffff83003e1f7b88 ffff828c801c0b74 ffff83003e1f7b98 ffff828c801ac765
(XEN) ffff83003e1f7be8 ffff828c801a828a 00000000000227cd ffff83003d2ec100
(XEN) 000000003e1e6100 00000000000227cd ffff83003d2ec100 ffff828400563808
(XEN) 000000000003e4e1 0000000000000000 ffff83003e1f7c18 ffff828c801a8465
(XEN) 0000000000000000 ffff83003d2edb08 ffff8140c0003520 0000000000000002
(XEN) ffff83003e1f7c38 ffff828c801a89ef ffff8140c0003520 000000000003efe2
(XEN) ffff83003e1f7c98 ffff828c801bb496 000000000000a9cd 000000003efe2520
(XEN) 000000003e1f7c98 ffff83003d210100 000000003efc5067 ffff83003d210100
(XEN) ffff83003e1f7e28 ffff8140c0003520 0000000000000002 ffff83003e1f7d28
(XEN) ffff83003e1f7ce8 ffff828c801bc309 000000010f6a3000 000000003efc5067
(XEN) 000000000003efe2 ffff83003e1f7e28 ffff83003d210100 ffff83003e1e6100
(XEN) ffff83003e1f7e28 ffff83003d210100 ffff83003e1f7e98 ffff828c801be326
(XEN) ffff83003e1f7d68 00000000d4990be8 0000000200000000 000000000000f67e
(XEN) ffff83003e1f7f28 00000000d4990be8 000000000003efc5 0000000100000206
(XEN) Xen call trace:
(XEN) [<ffff828c80150ae2>] put_page_type+0x39/0x13c
(XEN) [<ffff828c8014da74>] put_page_from_l1e+0x102/0x16b
(XEN) [<ffff828c801bbc80>] shadow_set_l1e+0x53f/0x551
(XEN) [<ffff828c801c0b74>] sh_rm_write_access_from_sl1p__guest_3+0xd2/0xfd
(XEN) [<ffff828c801ac765>] sh_remove_write_access_from_sl1p+0x8d/0xaf
(XEN) [<ffff828c801a828a>] oos_remove_write_access+0x5a/0xec
(XEN) [<ffff828c801a8465>] _sh_resync+0x149/0x20f
(XEN) [<ffff828c801a89ef>] sh_resync+0x97/0xd9
(XEN) [<ffff828c801bb496>] shadow_set_l2e+0x1fe/0x4a9
(XEN) [<ffff828c801bc309>] shadow_get_and_create_l1e+0x1b3/0x244
(XEN) [<ffff828c801be326>] sh_page_fault__guest_3+0x9ee/0x1404
(XEN) [<ffff828c801a24ff>] vmx_vmexit_handler+0x2e5/0x841
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '(x & ((1U<<26)-1)) != 0' failed at mm.c:1891
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
On Thursday, August 14, 2008 3:27 PM Keir Fraser wrote:

> That's disappointing. :-( I think the second of your crashes is the one
> that I've been able to reproduce (but infrequently -- maybe one time in
> 100). The symptoms are a bit different for me since I run a debug build
> and crash earlier, well before domain destruction.
>
> There's a chance that Gianluca's new patch will fix your first host
> crash (although the domain crash would probably still remain).

Yes. We tried the patch and still got a Xen crash:

#########
(XEN) sh error: sh_remove_write_access(): can't remove write access to mfn 2291b5: guest has 67108863 special-use mappings of it
(XEN) domain_crash called from common.c:2396
(XEN) Domain 1 (vcpu#1) crashed on cpu#0:
(XEN) ----[ Xen-3.3.0-rc5-pre x86_32p debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) EIP: 0158:[<fe817811>]
(XEN) EFLAGS: 00010282 CONTEXT: hvm guest
(XEN) eax: da0cd300 ebx: 00000000 ecx: 80000000 edx: 00000000
(XEN) esi: 00000000 edi: 00000000 ebp: d8213bb4 esp: d8213bb0
(XEN) cr0: 8005003b cr4: 000006b8 cr3: 023cb040 cr2: 08046fdc
(XEN) ds: 0160 es: 0160 fs: 0000 gs: 01b0 ss: 0160 cs: 0158
(XEN) Xen BUG at page_alloc.c:839
(XEN) ----[ Xen-3.3.0-rc5-pre x86_32p debug=n Not tainted ]----
(XEN) CPU: 2
(XEN) EIP: e008:[<ff10f81d>] free_domheap_pages+0x9d/0x250
(XEN) EFLAGS: 00210206 CONTEXT: hypervisor
(XEN) eax: 00000002 ebx: 00000000 ecx: f9bda8f8 edx: 00000002
(XEN) esi: 00000001 edi: ff1c4080 ebp: f9bda8f8 esp: ffbf3d5c
(XEN) cr0: 8005003b cr4: 000026f0 cr3: 00bd8d20 cr2: 082bfef0
(XEN) ds: e010 es: e010 fs: 0000 gs: 0033 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ffbf3d5c:
(XEN) ff10ef33 00000015 00000000 f9bda910 f9bda8f8 00000001 ff1c5504 ff1395ac
(XEN) f9bda8f8 00000000 ffbf3e44 f9bda8f8 f9bda910 f9bda8f8 68000000 ff12fc57
(XEN) f9bda8f8 ff1c4080 ffbf3e44 ff186515 60000000 ff1c4090 ff1c4080 ff1c4090
(XEN) ff1c4080 ff1c4080 ffbf3e44 ff12ff2c ffbca080 00000200 ffbf3e44 fffffff3
(XEN) ff1c4080 00000000 ffbf3e44 ff104171 ff1c4080 00a261a4 00000000 15901a37
(XEN) 00000001 0b200494 0000000b fffffff3 fffffff3 b34f62d8 0000007b ff1030ad
(XEN) ff1c4080 b34f62d8 00000090 ff13c005 f9880070 e0000000 00000020 ff1c4080
(XEN) ff13ad7b f95b2908 00000002 00000005 b7a10001 b7a42554 b7a4b4dc 447c7f77
(XEN) b34f630c 46257b48 b7f836a0 0000007f 00000000 b8b08ac8 08121598 447c7f77
(XEN) b7f836a0 b7bcbd74 b34f6338 447c77d3 b7bcbd74 a5dba1ee b7c291e0 0000001f
(XEN) a5dba1ee 00000000 0836b320 448738e4 b7c291e0 b5573824 b34f6368 447c77d3
(XEN) b5573824 b7c291e0 a5dba1ee 448738e4 b7c291e0 b7c291e0 43841a1c 000000fb
(XEN) 000000fb 0000005c 43841a15 ffbd4080 00000003 00000004 ffbf3fb4 ffbf3f5c
(XEN) ff14a64d ffbf3fb4 43841a16 00000001 0000f800 00000004 fed1f030 00000030
(XEN) ff1f6080 ffbd4080 ff1f6080 00000000 0000005d 00000002 ffbd8080 ffbd8080
(XEN) b587215b 0000005d 00000004 00000033 0000d3a0 b5b50665 00000004 43841a17
(XEN) 5a9d8fe8 909090ff 90909090 c3900390 ff1f9a00 ff116bbc b5b50665 0000005d
(XEN) ffbf3fb4 ff1e1003 ff1dd180 ffbd8080 0000007b 0000007b 00305000 ff19d614
(XEN) b34f62d8 52ab9700 b7fbb430 b34f63a8 00000000 00305000 b34f62d8 52ab9700
(XEN) b7fbb430 b34f63a8 00000000 00305000 00000024 000d0000 c0101487 00000061
(XEN) Xen call trace:
(XEN) [<ff10f81d>] free_domheap_pages+0x9d/0x250
(XEN) [<ff10ef33>] free_heap_pages+0xc3/0x1d0
(XEN) [<ff1c5504>] nokey+0xc/0x10
(XEN) [<ff1395ac>] put_page+0x5c/0x60
(XEN) [<ff12fc57>] relinquish_memory+0xe7/0x290
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff186515>] paging_log_dirty_teardown+0x55/0xa0
(XEN) [<ff1c4090>] __start+0x25/0x1ef
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff1c4090>] __start+0x25/0x1ef
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff12ff2c>] domain_relinquish_resources+0x12c/0x190
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff104171>] domain_kill+0x71/0x160
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff1030ad>] do_domctl+0x10d/0xc40
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff13c005>] get_page_from_l1e+0x1b5/0x480
(XEN) [<ff1c4080>] __start+0x15/0x1ef
(XEN) [<ff13ad7b>] put_page_from_l1e+0x9b/0xf0
(XEN) [<ff14a64d>] do_general_protection+0x42d/0x1310
(XEN) [<ff116bbc>] timer_softirq_action+0x10c/0x130
(XEN) [<ff19d614>] hypercall+0x94/0x9b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at page_alloc.c:839
(XEN) ****************************************
#########

Best Regards,
Jiajun
On 14/8/08 10:03, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:

>> There's a chance that Gianluca's new patch will fix your first host
>> crash (although the domain crash would probably still remain).
>
> Yes. We tried the patch, still got xen crash.

This could be a variant of the second crash (screwed reference counts)
though. I'll take Gianluca's patch since it probably does make things
better, but clearly we still have a nasty refcounting bug, probably in the
OOS code.

-- Keir
On 14/8/08 08:09, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:

> We tried c/s 18326, the two issues still exist. We found it is easier
> to reproduce these issues on a 32pae host than on a 32e host.
> If we remove OOS, the two issues disappear and no other error is found.

So, without OOS you didn't see any issues (not even the domain crash)?
That's promising.

With OOS, how easily do you reproduce the host crash? It takes me 50-100
guest boots to cause a crash right now. Presumably you can repro more
quickly than that?

-- Keir
On Thursday, August 14, 2008 6:14 PM Keir Fraser wrote:

> So, without OOS you didn't see any issues (not even domain crash)?
> That's promising.

No, I didn't see any issues without OOS.

> With OOS, how easily do you reproduce the host crash? It takes me
> 50-100 guest boots to cause a crash right now. Presumably you can
> repro more quickly than that?

It is very easy to reproduce the crash on our machine: about 2-3 attempts
are enough to hit it. I attach my config file; maybe there is some
difference between our environments.

Best Regards,
Jiajun
On Thursday, August 14, 2008 8:49 PM Xu, Jiajun wrote:

>> With OOS, how easily do you reproduce the host crash? It takes me
>> 50-100 guest boots to cause a crash right now. Presumably you can
>> repro more quickly than that?
>
> It is very easy to reproduce the crash on our machine. About 2-3
> attempts are enough to hit it. I attach my config file; maybe there
> is some difference between our environments.

Also, the two issues mostly happen after the guest has loaded the kernel
and begins starting system services. We did not see a crash at the
beginning of guest creation.

Best Regards,
Jiajun
Gianluca Guida
2008-Aug-14 15:14 UTC
[Xen-devel] [PATCH] Fix OOS typecounting [was: Test report for Xen-3.3.0-rc4 (#18314)]
Hello,

Keir Fraser wrote:

> We still need to decide whether to fix the second issue or disable OOS.

The attached patch should fix this issue. It was an all-my-fault breakage
of set_l1e atomicity.

> We're not decided on that just yet. If it reproed more reliably for us
> then I'd be more optimistic about fixing it. Perhaps I will switch to
> 32pae as so far I've been running 32e host.

A very easy way to reproduce this bug is to set SHADOW_OOS_FIXUPS to 1 in
xen/include/asm-x86/mm.h. This will reproduce the typecount corruption
very quickly.

Gianluca
On 14/8/08 10:07, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>>> There's a chance that Gianluca's new patch will fix your first host
>>> crash (although the domain crash would probably still remain).
>>
>> Yes. We tried the patch, still got xen crash.
>
> This could be a variant of the second crash (screwed reference counts)
> though. I'll take Gianluca's patch since it probably does make things
> better, but clearly we still have a nasty refcounting bug, probably in
> the OOS code.

It's believed fixed by changeset 18331. Please can you test this?

If it works okay for you then we'll make a new release candidate tomorrow
and plan to release early next week.

-- Keir
On Thursday, August 14, 2008 11:33 PM Keir Fraser wrote:

> It's believed fixed by changeset 18331. Please can you test this?
>
> If it works okay for you then we'll make a new release candidate
> tomorrow and plan to release early next week.

Yes, both Indiana and Nevada boot well on c/s 18331. We did not see a crash
or any error message with this changeset.

Best Regards,
Jiajun