Xu, Jiajun
2010-May-24 15:51 UTC
[Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
Hi all,

This is our bi-weekly test report for the Xen-unstable tree. Three new bugs were found
in these two weeks. 64-bit testing is blocked because guest creation on a 64-bit host
causes a Xen panic. XenU guests and CPU offline do not work. On the bug-fixing side,
Save/Restore and Live Migration work again, and the VT-d issues with 2 NICs and with the
Myricom NIC are both resolved. We use pv_ops (xen/master, 2.6.31.13) as Dom0 in our
testing.

Status Summary
===================================================================
Feature         Result
-------------------------------------------------------------------
VT-x/VT-x2      PASS
RAS             Buggy
VT-d            Buggy
SR-IOV          Buggy
TXT             PASS
PowerMgmt       PASS
Other           Buggy

New Bugs (3):
===================================================================
1. xen hypervisor hang when create guest on 32e platform
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
2. CPU panic when running cpu offline
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
3. xenu guest can't boot up
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1618

Fixed Bugs (3):
===================================================================
1. Save/Restore and Live Migration can not work
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1609
2. [VT-d] Xen crash when booting guest with device assigned and Myricom driver loaded in Dom0
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1608
3. [VT-d] Guest with 2 NIC assigned may hang when booting
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1610

Old P1 Bugs (1):
====================================================================
1. stubdom based guest hangs at starting when using qcow image.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1372

Old P2 Bugs (12):
====================================================================
1. Failed to install FC10
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1461
2. Two onboard 82576 NICs assigned to an HVM guest do not work stably when INTx interrupts are used.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1459
3. stubdom based guest hangs when assigning hdc to it.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1373
4. [stubdom] The xm save command hangs while saving <Domain-dm>.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1377
5. [stubdom] cannot restore stubdom based domain.
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1378
6. Live Migration with md5sum running causes dma_timer_expiry error in guest
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1530
7. Very slow mouse/keyboard and no USB thumbdrive detected w/ Core i7 & pv_ops
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1541
8. Linux guest boots up very slowly with SDL rendering
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1478
9. [RAS] CPUs are not in the correct NUMA node after hot-add memory
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1573
10. [SR-IOV] Qemu reports pci_msix_writel error while assigning VF to guest
    http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1575
11. Can't create guest with big memory if Dom0 memory is not limited
    http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1604
12. Add fix for TBOOT/Xen and S3 flow
    http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1611

Xen Info:
===========================================================================
Service OS    : Red Hat Enterprise Linux Server release 5.1 (Tikanga)
xen-changeset : 21438:840f269d95fb

pvops git: commit a3e7c7b82c09450487a7e7f5f47b165c49474fd4
Merge: f3d5fe8... a47d360...
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

ioemu git: commit 01626771cf2e9285fbfddcbded2820fc77745e4b
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date:   Fri Apr 30 17:41:45 2010 +0100

Test Environment:
=========================================================================
Service OS : Red Hat Enterprise Linux Server release 5.1 (Tikanga)
Hardware   : Westmere-HEDT

PAE Summary Test Report of Last Session
====================================================================
                          Total  Pass  Fail  NoResult  Crash
====================================================================
vtd_ept_vpid                13     13     0      0       0
control_panel_ept_vpid      10      8     2      0       0
ras_ept_vpid                 1      0     0      0       1
gtest_ept_vpid              23     23     0      0       0
acpi_ept_vpid                5      3     2      0       0
sriov_ept_vpid               2      2     0      0       0
====================================================================
vtd_ept_vpid                13     13     0      0       0
 :lm_pci_up_nomsi_PAE_gPA    1      1     0      0       0
 :two_dev_scp_nomsi_PAE_g    1      1     0      0       0
 :one_pcie_smp_PAE_gPAE      1      1     0      0       0
 :two_dev_up_PAE_gPAE        1      1     0      0       0
 :lm_pcie_smp_PAE_gPAE       1      1     0      0       0
 :one_pcie_smp_nomsi_PAE_    1      1     0      0       0
 :two_dev_smp_nomsi_PAE_g    1      1     0      0       0
 :two_dev_smp_PAE_gPAE       1      1     0      0       0
 :two_dev_up_nomsi_PAE_gP    1      1     0      0       0
 :hp_pci_up_PAE_gPAE         1      1     0      0       0
 :two_dev_scp_PAE_gPAE       1      1     0      0       0
 :lm_pci_smp_nomsi_PAE_gP    1      1     0      0       0
 :lm_pcie_up_PAE_gPAE        1      1     0      0       0
control_panel_ept_vpid      10      8     2      0       0
 :XEN_SR_SMP_PAE_gPAE        1      1     0      0       0
 :XEN_linux_win_PAE_gPAE     1      1     0      0       0
 :XEN_SR_Continuity_PAE_g    1      1     0      0       0
 :XEN_LM_SMP_PAE_gPAE        1      1     0      0       0
 :XEN_vmx_vcpu_pin_PAE_gP    1      1     0      0       0
 :XEN_1500M_guest_PAE_gPA    1      0     1      0       0
 :XEN_LM_Continuity_PAE_g    1      1     0      0       0
 :XEN_two_winxp_PAE_gPAE     1      0     1      0       0
 :XEN_256M_guest_PAE_gPAE    1      1     0      0       0
 :XEN_vmx_2vcpu_PAE_gPAE     1      1     0      0       0
ras_ept_vpid                 1      0     0      0       1
 :cpu_online_offline_PAE_    1      0     0      0       1
gtest_ept_vpid              23     23     0      0       0
 :ltp_nightly_PAE_gPAE       1      1     0      0       0
 :boot_up_acpi_PAE_gPAE      1      1     0      0       0
 :reboot_xp_PAE_gPAE         1      1     0      0       0
 :boot_up_acpi_xp_PAE_gPA    1      1     0      0       0
 :boot_fc9_PAE_gPAE          1      1     0      0       0
 :boot_up_vista_PAE_gPAE     1      1     0      0       0
 :boot_up_acpi_win2k3_PAE    1      1     0      0       0
 :boot_smp_win7_ent_PAE_g    1      1     0      0       0
 :boot_smp_acpi_win2k3_PA    1      1     0      0       0
 :boot_smp_acpi_xp_PAE_gP    1      1     0      0       0
 :boot_smp_win7_ent_debug    1      1     0      0       0
 :boot_smp_vista_PAE_gPAE    1      1     0      0       0
 :boot_up_noacpi_win2k3_P    1      1     0      0       0
 :boot_nevada_PAE_gPAE       1      1     0      0       0
 :boot_solaris10u5_PAE_gP    1      1     0      0       0
 :boot_indiana_PAE_gPAE      1      1     0      0       0
 :boot_rhel5u1_PAE_gPAE      1      1     0      0       0
 :boot_base_kernel_PAE_gP    1      1     0      0       0
 :boot_up_win2008_PAE_gPA    1      1     0      0       0
 :boot_up_noacpi_xp_PAE_g    1      1     0      0       0
 :boot_smp_win2008_PAE_gP    1      1     0      0       0
 :reboot_fc6_PAE_gPAE        1      1     0      0       0
 :kb_nightly_PAE_gPAE        1      1     0      0       0
acpi_ept_vpid                5      3     2      0       0
 :monitor_p_status_PAE_gP    1      1     0      0       0
 :hvm_s3_smp_sr_PAE_gPAE     1      0     1      0       0
 :Dom0_S3_PAE_gPAE           1      1     0      0       0
 :monitor_c_status_PAE_gP    1      1     0      0       0
 :hvm_s3_smp_PAE_gPAE        1      0     1      0       0
sriov_ept_vpid               2      2     0      0       0
 :serial_vfs_smp_PAE_gPAE    1      1     0      0       0
 :one_vf_smp_PAE_gPAE        1      1     0      0       0
====================================================================
Total                       54     49     4      0       1

Service OS : Red Hat Enterprise Linux Server release 5.1 (Tikanga)
Hardware   : Stoakley

PAE Summary Test Report of Last Session
====================================================================
                          Total  Pass  Fail  NoResult  Crash
====================================================================
vtd_ept_vpid                13     12     1      0       0
control_panel_ept_vpid      12      9     3      0       0
ras_ept_vpid                 1      0     0      0       1
gtest_ept_vpid              23     23     0      0       0
acpi_ept_vpid                3      3     0      0       0
====================================================================
vtd_ept_vpid                13     12     1      0       0
 :two_dev_scp_nomsi_PAE_g    1      1     0      0       0
 :lm_pci_up_nomsi_PAE_gPA    1      1     0      0       0
 :one_pcie_smp_PAE_gPAE      1      1     0      0       0
 :two_dev_up_PAE_gPAE        1      1     0      0       0
 :lm_pcie_smp_PAE_gPAE       1      1     0      0       0
 :two_dev_smp_nomsi_PAE_g    1      1     0      0       0
 :two_dev_smp_PAE_gPAE       1      0     1      0       0
 :one_pcie_smp_nomsi_PAE_    1      1     0      0       0
 :hp_pci_up_PAE_gPAE         1      1     0      0       0
 :two_dev_up_nomsi_PAE_gP    1      1     0      0       0
 :two_dev_scp_PAE_gPAE       1      1     0      0       0
 :lm_pci_smp_nomsi_PAE_gP    1      1     0      0       0
 :lm_pcie_up_PAE_gPAE        1      1     0      0       0
control_panel_ept_vpid      12      9     3      0       0
 :XEN_4G_guest_PAE_gPAE      1      0     1      0       0
 :XEN_linux_win_PAE_gPAE     1      1     0      0       0
 :XEN_SR_SMP_PAE_gPAE        1      1     0      0       0
 :XEN_LM_SMP_PAE_gPAE        1      1     0      0       0
 :XEN_SR_Continuity_PAE_g    1      1     0      0       0
 :XEN_vmx_vcpu_pin_PAE_gP    1      1     0      0       0
 :XEN_LM_Continuity_PAE_g    1      1     0      0       0
 :XEN_256M_guest_PAE_gPAE    1      1     0      0       0
 :XEN_1500M_guest_PAE_gPA    1      0     1      0       0
 :XEN_256M_xenu_PAE_gPAE     1      0     1      0       0
 :XEN_two_winxp_PAE_gPAE     1      1     0      0       0
 :XEN_vmx_2vcpu_PAE_gPAE     1      1     0      0       0
ras_ept_vpid                 1      0     0      0       1
 :cpu_online_offline_PAE_    1      0     0      0       1
gtest_ept_vpid              23     23     0      0       0
 :ltp_nightly_PAE_gPAE       1      1     0      0       0
 :boot_up_acpi_PAE_gPAE      1      1     0      0       0
 :reboot_xp_PAE_gPAE         1      1     0      0       0
 :boot_up_acpi_xp_PAE_gPA    1      1     0      0       0
 :boot_up_vista_PAE_gPAE     1      1     0      0       0
 :boot_fc9_PAE_gPAE          1      1     0      0       0
 :boot_smp_win7_ent_PAE_g    1      1     0      0       0
 :boot_up_acpi_win2k3_PAE    1      1     0      0       0
 :boot_smp_acpi_win2k3_PA    1      1     0      0       0
 :boot_smp_acpi_xp_PAE_gP    1      1     0      0       0
 :boot_smp_win7_ent_debug    1      1     0      0       0
 :boot_smp_vista_PAE_gPAE    1      1     0      0       0
 :boot_up_noacpi_win2k3_P    1      1     0      0       0
 :boot_nevada_PAE_gPAE       1      1     0      0       0
 :boot_rhel5u1_PAE_gPAE      1      1     0      0       0
 :boot_indiana_PAE_gPAE      1      1     0      0       0
 :boot_solaris10u5_PAE_gP    1      1     0      0       0
 :boot_base_kernel_PAE_gP    1      1     0      0       0
 :boot_up_win2008_PAE_gPA    1      1     0      0       0
 :boot_up_noacpi_xp_PAE_g    1      1     0      0       0
 :boot_smp_win2008_PAE_gP    1      1     0      0       0
 :reboot_fc6_PAE_gPAE        1      1     0      0       0
 :kb_nightly_PAE_gPAE        1      1     0      0       0
acpi_ept_vpid                3      3     0      0       0
 :Dom0_S3_PAE_gPAE           1      1     0      0       0
 :hvm_s3_smp_sr_PAE_gPAE     1      1     0      0       0
 :hvm_s3_smp_PAE_gPAE        1      1     0      0       0
====================================================================
Total                       52     47     4      0       1

Best Regards,
Jiajun
Keir Fraser
2010-May-24 16:53 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On 24/05/2010 16:51, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:

> 1. xen hypervisor hang when create guest on 32e platform
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
> 2. CPU panic when running cpu offline
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616

Please attach the backtrace. Also, some indication of how easily this bug
triggers (is it on every cpu offline on your system, or do you have to cycle
the test a while?).

> 3. xenu guest can't boot up
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1618

This is probably fixed at xen-unstable tip.

 -- Keir
Xu, Jiajun
2010-May-25 08:17 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
> On 24/05/2010 16:51, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:
>
>> 1. xen hypervisor hang when create guest on 32e platform
>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617

The bug occurs every time I create the guest. I have attached the serial
output to the bugzilla.

>> 2. CPU panic when running cpu offline
>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616

Xen panics every time I offline a CPU. The log is also attached to the
bugzilla.

> Please attach the backtrace. Also, some indication of how easily this
> bug triggers (is it on every cpu offline on your system, or do you
> have to cycle the test a while?).
>
>> 3. xenu guest can't boot up
>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1618
>
> This is probably fixed at xen-unstable tip.

Thanks a lot. We will verify the issue with the tip.

Best Regards,
Jiajun
Dulloor
2010-May-25 08:27 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On Tue, May 25, 2010 at 4:17 AM, Xu, Jiajun <jiajun.xu@intel.com> wrote:
>> On 24/05/2010 16:51, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:
>>
>>> 1. xen hypervisor hang when create guest on 32e platform
>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>
> The bug occurs every time I create the guest. I have attached the serial
> output to the bugzilla.

I see the same hang, but on a 64-bit platform. Can you verify whether 21433
is the culprit? That is the case for me.

>
>>> 2. CPU panic when running cpu offline
>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
>
> Xen panics every time I offline a CPU. The log is also attached to the
> bugzilla.
>
>> Please attach the backtrace. Also, some indication of how easily this
>> bug triggers (is it on every cpu offline on your system, or do you
>> have to cycle the test a while?).
>>
>>> 3. xenu guest can't boot up
>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1618
>>
>> This is probably fixed at xen-unstable tip.
>
> Thanks a lot. We will verify the issue with the tip.
>
> Best Regards,
> Jiajun

-dulloor
Xu, Jiajun
2010-May-25 08:29 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
> On Tue, May 25, 2010 at 4:17 AM, Xu, Jiajun <jiajun.xu@intel.com> wrote:
>>> On 24/05/2010 16:51, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:
>>>
>>>> 1. xen hypervisor hang when create guest on 32e platform
>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>>
>> The bug occurs every time I create the guest. I have attached the serial
>> output to the bugzilla.
>
> I see the same hang, but on a 64-bit platform. Can you verify whether 21433
> is the culprit? That is the case for me.

Yes, we also see it on a 64-bit host. :)
Thanks, Dulloor. We will have a check with 21433.

Best Regards,
Jiajun
Keir Fraser
2010-May-25 09:13 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On 25/05/2010 09:17, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:

>>> 1. xen hypervisor hang when create guest on 32e platform
>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>
> The bug occurs every time I create the guest. I have attached the serial
> output to the bugzilla.

I haven't been able to reproduce this.

>>> 2. CPU panic when running cpu offline
>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
>
> Xen panics every time I offline a CPU. The log is also attached to the
> bugzilla.

Nor this. I even installed 32-bit Xen to match your environment more
closely.

 -- Keir
Keir Fraser
2010-May-25 09:15 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On 25/05/2010 10:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>>>> 1. xen hypervisor hang when create guest on 32e platform
>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>>
>> The bug occurs every time I create the guest. I have attached the serial
>> output to the bugzilla.
>
> I haven't been able to reproduce this.
>
>>>> 2. CPU panic when running cpu offline
>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
>>
>> Xen panics every time I offline a CPU. The log is also attached to the
>> bugzilla.
>
> Nor this. I even installed 32-bit Xen to match your environment more
> closely.

I'm running xen-unstable:21447, by the way. I ran 64-bit Xen for testing (1)
above, and both 64-bit and 32-bit Xen for testing (2).

 K.
Xu, Jiajun
2010-May-26 13:48 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
> On Tue, May 25, 2010 at 4:17 AM, Xu, Jiajun <jiajun.xu@intel.com> wrote:
>>> On 24/05/2010 16:51, "Xu, Jiajun" <jiajun.xu@intel.com> wrote:
>>>
>>>> 1. xen hypervisor hang when create guest on 32e platform
>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>>
>> The bug occurs every time I create the guest. I have attached the serial
>> output to the bugzilla.
>
> I see the same hang, but on a 64-bit platform. Can you verify whether 21433
> is the culprit? That is the case for me.

Hi Keir, Dulloor,

We checked, and it is indeed changeset 21433 that causes the issue on our
platform.

Best Regards,
Jiajun
Jiang, Yunhong
2010-Jun-01 07:43 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
For issue 2, "CPU panic when running cpu offline", it appears to come from
the periodic_timer.

When a CPU is pulled down, cpu_disable_scheduler() migrates the singleshot
timer, but the periodic_timer is not migrated. After the vcpu is scheduled
on another pCPU later, and then scheduled out from that new pCPU, the
stop_timer(&prev->periodic_timer) will access the per-CPU structure, which
still points to the offlined CPU's per-CPU area, and that causes the
trouble. This should be caused by the per_cpu changes.

I tried migrating the periodic_timer as well in cpu_disable_scheduler(), and
that seems to work (commenting the migration back out in
cpu_disable_scheduler, as in the patch below, makes the printk trigger). It
seems that on your side the timer is always triggered before the
schedule-out?

--jyh

diff -r 96917cf25bf3 xen/common/schedule.c
--- a/xen/common/schedule.c     Fri May 28 10:54:07 2010 +0100
+++ b/xen/common/schedule.c     Tue Jun 01 15:35:21 2010 +0800
@@ -487,6 +487,15 @@ int cpu_disable_scheduler(unsigned int c
             migrate_timer(&v->singleshot_timer, cpu_mig);
         }
 
+/*
+        if ( v->periodic_timer.cpu == cpu )
+        {
+            int cpu_mig = first_cpu(c->cpu_valid);
+            if ( cpu_mig == cpu )
+                cpu_mig = next_cpu(cpu_mig, c->cpu_valid);
+            migrate_timer(&v->periodic_timer, cpu_mig);
+        }
+*/
         if ( v->processor == cpu )
         {
             set_bit(_VPF_migrating, &v->pause_flags);
@@ -505,7 +514,10 @@ int cpu_disable_scheduler(unsigned int c
              * all locks.
              */
             if ( v->processor == cpu )
+            {
+                printk("we hit the EAGAIN here\n");
                 ret = -EAGAIN;
+            }
         }
     }
     return ret;
@@ -1005,6 +1017,11 @@ static void schedule(void)
 
     perfc_incr(sched_ctx);
 
+    if (prev->periodic_timer.cpu != smp_processor_id() && !cpu_online(prev->periodic_timer.cpu))
+    {
+        printk("I'm now at cpu %x, timer's cpu is %x\n", smp_processor_id(), prev->periodic_timer.cpu);
+    }
+
     stop_timer(&prev->periodic_timer);
 
     /* Ensure that the domain has an up-to-date time base. */

--jyh

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: Tuesday, May 25, 2010 5:15 PM
> To: Xu, Jiajun; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
> #a3e7c7...
>
> On 25/05/2010 10:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>>>>> 1. xen hypervisor hang when create guest on 32e platform
>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>>>
>>> The bug occurs every time I create the guest. I have attached the serial
>>> output to the bugzilla.
>>
>> I haven't been able to reproduce this.
>>
>>>>> 2. CPU panic when running cpu offline
>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
>>>
>>> Xen panics every time I offline a CPU. The log is also attached to the
>>> bugzilla.
>>
>> Nor this. I even installed 32-bit Xen to match your environment more
>> closely.
>
> I'm running xen-unstable:21447, by the way. I ran 64-bit Xen for testing (1)
> above, and both 64-bit and 32-bit Xen for testing (2).
>
>  K.
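[For readers following the thread: the idea in the commented-out hunk above,
extracted into a standalone sketch. This is illustrative only and not the fix
that was ultimately applied upstream; it uses only the helpers and fields
that already appear in the patch, and the function name and signature are
hypothetical.]

/*
 * Sketch only: when 'cpu' is being disabled, repoint any of the vcpu's
 * timers that still name it at a surviving CPU from the valid mask,
 * whether or not they are currently active.  The periodic timer can be
 * inactive here yet still have ->cpu referencing the dying CPU.
 */
static void evacuate_vcpu_timers(struct vcpu *v, cpumask_t valid,
                                 unsigned int cpu)
{
    int cpu_mig = first_cpu(valid);

    if ( cpu_mig == cpu )
        cpu_mig = next_cpu(cpu_mig, valid);

    if ( v->singleshot_timer.cpu == cpu )
        migrate_timer(&v->singleshot_timer, cpu_mig);

    if ( v->periodic_timer.cpu == cpu )
        migrate_timer(&v->periodic_timer, cpu_mig);   /* inactive timers too */
}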
Keir Fraser
2010-Jun-01 09:30 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On 01/06/2010 08:43, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> For issue 2, "CPU panic when running cpu offline", it appears to come from
> the periodic_timer.
>
> When a CPU is pulled down, cpu_disable_scheduler() migrates the singleshot
> timer, but the periodic_timer is not migrated. After the vcpu is scheduled
> on another pCPU later, and then scheduled out from that new pCPU, the
> stop_timer(&prev->periodic_timer) will access the per-CPU structure, which
> still points to the offlined CPU's per-CPU area, and that causes the
> trouble. This should be caused by the per_cpu changes.

Which xen-unstable changeset are you testing? All timers should be
automatically migrated off a dead CPU and onto CPU0 by changeset 21424. Is
that not working okay for you?

 -- Keir

> I tried migrating the periodic_timer as well in cpu_disable_scheduler(),
> and that seems to work (commenting the migration back out in
> cpu_disable_scheduler, as in the patch below, makes the printk trigger). It
> seems that on your side the timer is always triggered before the
> schedule-out?
>
> --jyh
>
> diff -r 96917cf25bf3 xen/common/schedule.c
> --- a/xen/common/schedule.c     Fri May 28 10:54:07 2010 +0100
> +++ b/xen/common/schedule.c     Tue Jun 01 15:35:21 2010 +0800
> @@ -487,6 +487,15 @@ int cpu_disable_scheduler(unsigned int c
>              migrate_timer(&v->singleshot_timer, cpu_mig);
>          }
>
> +/*
> +        if ( v->periodic_timer.cpu == cpu )
> +        {
> +            int cpu_mig = first_cpu(c->cpu_valid);
> +            if ( cpu_mig == cpu )
> +                cpu_mig = next_cpu(cpu_mig, c->cpu_valid);
> +            migrate_timer(&v->periodic_timer, cpu_mig);
> +        }
> +*/
>          if ( v->processor == cpu )
>          {
>              set_bit(_VPF_migrating, &v->pause_flags);
> @@ -505,7 +514,10 @@ int cpu_disable_scheduler(unsigned int c
>               * all locks.
>               */
>              if ( v->processor == cpu )
> +            {
> +                printk("we hit the EAGAIN here\n");
>                  ret = -EAGAIN;
> +            }
>          }
>      }
>      return ret;
> @@ -1005,6 +1017,11 @@ static void schedule(void)
>
>      perfc_incr(sched_ctx);
>
> +    if (prev->periodic_timer.cpu != smp_processor_id() && !cpu_online(prev->periodic_timer.cpu))
> +    {
> +        printk("I'm now at cpu %x, timer's cpu is %x\n", smp_processor_id(), prev->periodic_timer.cpu);
> +    }
> +
>      stop_timer(&prev->periodic_timer);
>
>      /* Ensure that the domain has an up-to-date time base. */
>
> --jyh
>
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com
>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
>> Sent: Tuesday, May 25, 2010 5:15 PM
>> To: Xu, Jiajun; xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
>> #a3e7c7...
>>
>> On 25/05/2010 10:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>>
>>>>>> 1. xen hypervisor hang when create guest on 32e platform
>>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>>>>
>>>> The bug occurs every time I create the guest. I have attached the serial
>>>> output to the bugzilla.
>>>
>>> I haven't been able to reproduce this.
>>>
>>>>>> 2. CPU panic when running cpu offline
>>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
>>>>
>>>> Xen panics every time I offline a CPU. The log is also attached to the
>>>> bugzilla.
>>>
>>> Nor this. I even installed 32-bit Xen to match your environment more
>>> closely.
>>
>> I'm running xen-unstable:21447, by the way. I ran 64-bit Xen for testing
>> (1) above, and both 64-bit and 32-bit Xen for testing (2).
>>
>>  K.
Jiang, Yunhong
2010-Jun-02 07:28 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, June 01, 2010 5:31 PM
> To: Jiang, Yunhong; Xu, Jiajun; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
> #a3e7c7...
>
> On 01/06/2010 08:43, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>> For issue 2, "CPU panic when running cpu offline", it appears to come from
>> the periodic_timer.
>>
>> When a CPU is pulled down, cpu_disable_scheduler() migrates the singleshot
>> timer, but the periodic_timer is not migrated. After the vcpu is scheduled
>> on another pCPU later, and then scheduled out from that new pCPU, the
>> stop_timer(&prev->periodic_timer) will access the per-CPU structure, which
>> still points to the offlined CPU's per-CPU area, and that causes the
>> trouble. This should be caused by the per_cpu changes.
>
> Which xen-unstable changeset are you testing? All timers should be
> automatically migrated off a dead CPU and onto CPU0 by changeset 21424. Is
> that not working okay for you?

We are testing on 21492.

After more investigation, the root cause is that the periodic_timer is
stopped before take_cpu_down (in schedule()), so it is not covered by 21424.
When v->periodic_period == 0, the next vcpu's periodic timer is not updated
by schedule(); thus, in a later scheduling round, it causes trouble for
stop_timer().

With the following small patch it works, but I'm not sure whether this is a
good solution.

--jyh

diff -r 96917cf25bf3 xen/common/schedule.c
--- a/xen/common/schedule.c     Fri May 28 10:54:07 2010 +0100
+++ b/xen/common/schedule.c     Wed Jun 02 15:18:56 2010 +0800
@@ -893,7 +893,10 @@ static void vcpu_periodic_timer_work(str
     ASSERT(!active_timer(&v->periodic_timer));
 
     if ( v->periodic_period == 0 )
+    {
+        v->periodic_timer.cpu = smp_processor_id();
         return;
+    }
 
     periodic_next_event = v->periodic_last_event + v->periodic_period;

>
>  -- Keir
>
>> I tried migrating the periodic_timer as well in cpu_disable_scheduler(),
>> and that seems to work (commenting the migration back out in
>> cpu_disable_scheduler, as in the patch below, makes the printk trigger).
>> It seems that on your side the timer is always triggered before the
>> schedule-out?
>>
>> --jyh
>>
>> diff -r 96917cf25bf3 xen/common/schedule.c
>> --- a/xen/common/schedule.c     Fri May 28 10:54:07 2010 +0100
>> +++ b/xen/common/schedule.c     Tue Jun 01 15:35:21 2010 +0800
>> @@ -487,6 +487,15 @@ int cpu_disable_scheduler(unsigned int c
>>              migrate_timer(&v->singleshot_timer, cpu_mig);
>>          }
>>
>> +/*
>> +        if ( v->periodic_timer.cpu == cpu )
>> +        {
>> +            int cpu_mig = first_cpu(c->cpu_valid);
>> +            if ( cpu_mig == cpu )
>> +                cpu_mig = next_cpu(cpu_mig, c->cpu_valid);
>> +            migrate_timer(&v->periodic_timer, cpu_mig);
>> +        }
>> +*/
>>          if ( v->processor == cpu )
>>          {
>>              set_bit(_VPF_migrating, &v->pause_flags);
>> @@ -505,7 +514,10 @@ int cpu_disable_scheduler(unsigned int c
>>               * all locks.
>>               */
>>              if ( v->processor == cpu )
>> +            {
>> +                printk("we hit the EAGAIN here\n");
>>                  ret = -EAGAIN;
>> +            }
>>          }
>>      }
>>      return ret;
>> @@ -1005,6 +1017,11 @@ static void schedule(void)
>>
>>      perfc_incr(sched_ctx);
>>
>> +    if (prev->periodic_timer.cpu != smp_processor_id() && !cpu_online(prev->periodic_timer.cpu))
>> +    {
>> +        printk("I'm now at cpu %x, timer's cpu is %x\n", smp_processor_id(), prev->periodic_timer.cpu);
>> +    }
>> +
>>      stop_timer(&prev->periodic_timer);
>>
>>      /* Ensure that the domain has an up-to-date time base. */
>>
>> --jyh
>>
>>> -----Original Message-----
>>> From: xen-devel-bounces@lists.xensource.com
>>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
>>> Sent: Tuesday, May 25, 2010 5:15 PM
>>> To: Xu, Jiajun; xen-devel@lists.xensource.com
>>> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
>>> #a3e7c7...
>>>
>>> On 25/05/2010 10:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>>>
>>>>>>> 1. xen hypervisor hang when create guest on 32e platform
>>>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1617
>>>>>
>>>>> The bug occurs every time I create the guest. I have attached the
>>>>> serial output to the bugzilla.
>>>>
>>>> I haven't been able to reproduce this.
>>>>
>>>>>>> 2. CPU panic when running cpu offline
>>>>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1616
>>>>>
>>>>> Xen panics every time I offline a CPU. The log is also attached to the
>>>>> bugzilla.
>>>>
>>>> Nor this. I even installed 32-bit Xen to match your environment more
>>>> closely.
>>>
>>> I'm running xen-unstable:21447, by the way. I ran 64-bit Xen for testing
>>> (1) above, and both 64-bit and 32-bit Xen for testing (2).
>>>
>>>  K.
Keir Fraser
2010-Jun-02 08:01 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On 02/06/2010 08:28, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Which xen-unstable changeset are you testing? All timers should be
>> automatically migrated off a dead CPU and onto CPU0 by changeset 21424. Is
>> that not working okay for you?
>
> We are testing on 21492.
>
> After more investigation, the root cause is that the periodic_timer is
> stopped before take_cpu_down (in schedule()), so it is not covered by
> 21424. When v->periodic_period == 0, the next vcpu's periodic timer is not
> updated by schedule(); thus, in a later scheduling round, it causes trouble
> for stop_timer().
>
> With the following small patch it works, but I'm not sure whether this is a
> good solution.

I forgot about inactive timers in c/s 21424. Hm, I will fix this in the
timer subsystem and get back to you.

 -- Keir
Jiang, Yunhong
2010-Jun-02 08:49 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
> I forgot about inactive timers in c/s 21424. Hm, I will fix this in the
> timer subsystem and get back to you.
>
>  -- Keir

Thanks!

--jyh
Jiang, Yunhong
2010-Jun-02 09:24 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
BTW, I get the following failure after looping cpu o*l about 95 times.

(XEN) Xen call trace:
(XEN)    [<ffff82c48014b3e9>] clear_page_sse2+0x9/0x30
(XEN)    [<ffff82c4801b9922>] vmx_cpu_up_prepare+0x43/0x88
(XEN)    [<ffff82c4801a13fa>] cpu_callback+0x4a/0x94
(XEN)    [<ffff82c480112d95>] notifier_call_chain+0x68/0x84
(XEN)    [<ffff82c480100e5b>] cpu_up+0x7b/0x12f
(XEN)    [<ffff82c480173b7d>] arch_do_sysctl+0x770/0x833
(XEN)    [<ffff82c480121672>] do_sysctl+0x992/0x9ec
(XEN)    [<ffff82c4801fa3cf>] syscall_enter+0xef/0x149
(XEN)
(XEN) Pagetable walk from ffff83022fe1d000:
(XEN)  L4[0x106] = 00000000cfc8d027 5555555555555555
(XEN)  L3[0x008] = 00000000cfef9063 5555555555555555
(XEN)  L2[0x17f] = 000000022ff2a063 5555555555555555
(XEN)  L1[0x01d] = 000000022fe1d262 5555555555555555

I really can't imagine how this can happen, considering that
vmx_alloc_vmcs() is so straightforward. My test machine is really magic.

Another fault is as follows:

(XEN) Xen call trace:
(XEN)    [<ffff82c480173459>] memcpy+0x11/0x1e
(XEN)    [<ffff82c4801722bf>] cpu_smpboot_callback+0x207/0x235
(XEN)    [<ffff82c480112d95>] notifier_call_chain+0x68/0x84
(XEN)    [<ffff82c480100e5b>] cpu_up+0x7b/0x12f
(XEN)    [<ffff82c480173c1d>] arch_do_sysctl+0x770/0x833
(XEN)    [<ffff82c480121712>] do_sysctl+0x992/0x9ec
(XEN)    [<ffff82c4801fa46f>] syscall_enter+0xef/0x149
(XEN)
(XEN) Pagetable walk from ffff830228ce5000:
(XEN)  L4[0x106] = 00000000cfc8d027 5555555555555555
(XEN)  L3[0x008] = 00000000cfef9063 5555555555555555
(XEN)  L2[0x146] = 000000022fea3063 5555555555555555
(XEN)  L1[0x0e5] = 0000000228ce5262 000000000001fd49
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff830228ce5000
(XEN) ****************************************
(XEN)

--jyh

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, June 02, 2010 4:01 PM
> To: Jiang, Yunhong; Xu, Jiajun; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
> #a3e7c7...
>
> On 02/06/2010 08:28, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>>> Which xen-unstable changeset are you testing? All timers should be
>>> automatically migrated off a dead CPU and onto CPU0 by changeset 21424.
>>> Is that not working okay for you?
>>
>> We are testing on 21492.
>>
>> After more investigation, the root cause is that the periodic_timer is
>> stopped before take_cpu_down (in schedule()), so it is not covered by
>> 21424. When v->periodic_period == 0, the next vcpu's periodic timer is not
>> updated by schedule(); thus, in a later scheduling round, it causes
>> trouble for stop_timer().
>>
>> With the following small patch it works, but I'm not sure whether this is
>> a good solution.
>
> I forgot about inactive timers in c/s 21424. Hm, I will fix this in the
> timer subsystem and get back to you.
>
>  -- Keir
Keir Fraser
2010-Jun-02 09:41 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On 02/06/2010 10:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> (XEN) Pagetable walk from ffff83022fe1d000:
> (XEN)  L4[0x106] = 00000000cfc8d027 5555555555555555
> (XEN)  L3[0x008] = 00000000cfef9063 5555555555555555
> (XEN)  L2[0x17f] = 000000022ff2a063 5555555555555555
> (XEN)  L1[0x01d] = 000000022fe1d262 5555555555555555
>
> I really can't imagine how this can happen, considering that
> vmx_alloc_vmcs() is so straightforward. My test machine is really magic.

Not at all. The free-memory pool was getting spiked with guarded (mapped
not-present) pages. The later unlucky allocator is the one that then
crashes.

I've just fixed this as xen-unstable:21504. The bug was a silly typo.

 Thanks,
 Keir
Jiang, Yunhong
2010-Jun-02 10:23 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
But in alloc_xenheap_pages(), we do unguard the page again. Is that useful?

--jyh

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, June 02, 2010 5:41 PM
> To: Jiang, Yunhong; Xu, Jiajun; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
> #a3e7c7...
>
> On 02/06/2010 10:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>> (XEN) Pagetable walk from ffff83022fe1d000:
>> (XEN)  L4[0x106] = 00000000cfc8d027 5555555555555555
>> (XEN)  L3[0x008] = 00000000cfef9063 5555555555555555
>> (XEN)  L2[0x17f] = 000000022ff2a063 5555555555555555
>> (XEN)  L1[0x01d] = 000000022fe1d262 5555555555555555
>>
>> I really can't imagine how this can happen, considering that
>> vmx_alloc_vmcs() is so straightforward. My test machine is really magic.
>
> Not at all. The free-memory pool was getting spiked with guarded (mapped
> not-present) pages. The later unlucky allocator is the one that then
> crashes.
>
> I've just fixed this as xen-unstable:21504. The bug was a silly typo.
>
>  Thanks,
>  Keir
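[For context: the pattern being discussed is guard-on-free / unguard-on-alloc.
A xenheap page sitting in the free pool is mapped not-present so that any
stray use faults immediately, and the mapping is restored when the page is
handed out again. The sketch below is illustrative only, not the actual Xen
allocator; it assumes memguard_guard_range()/memguard_unguard_range()
semantics, and take_free_xenheap_page()/return_free_xenheap_page() are
hypothetical stand-ins for the real free-pool bookkeeping. Whether this
unguard path is compiled in at all depends on the architecture, which the
reply below addresses.]

/* Illustrative sketch only, not the Xen source. */
void *sketch_alloc_xenheap_page(void)
{
    void *p = take_free_xenheap_page();        /* hypothetical helper */

    if ( p != NULL )
        memguard_unguard_range(p, PAGE_SIZE);  /* make the mapping present again */

    return p;
}

void sketch_free_xenheap_page(void *p)
{
    memguard_guard_range(p, PAGE_SIZE);        /* map not-present while free */
    return_free_xenheap_page(p);               /* hypothetical helper */
}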
Keir Fraser
2010-Jun-02 12:14 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
On 02/06/2010 09:01, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> With the following small patch it works, but I'm not sure whether this is
>> a good solution.
>
> I forgot about inactive timers in c/s 21424. Hm, I will fix this in the
> timer subsystem and get back to you.

Fixed by xen-unstable:21508.

 K.
Keir Fraser
2010-Jun-02 12:17 UTC
Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
That version of alloc_xenheap_pages is not built for x86_64.

 K.

On 02/06/2010 11:23, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> But in alloc_xenheap_pages(), we do unguard the page again. Is that useful?
>
> --jyh
>
>> -----Original Message-----
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Wednesday, June 02, 2010 5:41 PM
>> To: Jiang, Yunhong; Xu, Jiajun; xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
>> #a3e7c7...
>>
>> On 02/06/2010 10:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>
>>> (XEN) Pagetable walk from ffff83022fe1d000:
>>> (XEN)  L4[0x106] = 00000000cfc8d027 5555555555555555
>>> (XEN)  L3[0x008] = 00000000cfef9063 5555555555555555
>>> (XEN)  L2[0x17f] = 000000022ff2a063 5555555555555555
>>> (XEN)  L1[0x01d] = 000000022fe1d262 5555555555555555
>>>
>>> I really can't imagine how this can happen, considering that
>>> vmx_alloc_vmcs() is so straightforward. My test machine is really magic.
>>
>> Not at all. The free-memory pool was getting spiked with guarded (mapped
>> not-present) pages. The later unlucky allocator is the one that then
>> crashes.
>>
>> I've just fixed this as xen-unstable:21504. The bug was a silly typo.
>>
>>  Thanks,
>>  Keir
Jiang, Yunhong
2010-Jun-02 13:33 UTC
RE: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0: #a3e7c7...
Oops, I didn't notice this. Thanks for the patches; I will test them
tomorrow.

--jyh

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, June 02, 2010 8:17 PM
> To: Jiang, Yunhong; Xu, Jiajun; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
> #a3e7c7...
>
> That version of alloc_xenheap_pages is not built for x86_64.
>
>  K.
>
> On 02/06/2010 11:23, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>> But in alloc_xenheap_pages(), we do unguard the page again. Is that
>> useful?
>>
>> --jyh
>>
>>> -----Original Message-----
>>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>> Sent: Wednesday, June 02, 2010 5:41 PM
>>> To: Jiang, Yunhong; Xu, Jiajun; xen-devel@lists.xensource.com
>>> Subject: Re: [Xen-devel] Biweekly VMX status report. Xen: #21438 & Xen0:
>>> #a3e7c7...
>>>
>>> On 02/06/2010 10:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>>
>>>> (XEN) Pagetable walk from ffff83022fe1d000:
>>>> (XEN)  L4[0x106] = 00000000cfc8d027 5555555555555555
>>>> (XEN)  L3[0x008] = 00000000cfef9063 5555555555555555
>>>> (XEN)  L2[0x17f] = 000000022ff2a063 5555555555555555
>>>> (XEN)  L1[0x01d] = 000000022fe1d262 5555555555555555
>>>>
>>>> I really can't imagine how this can happen, considering that
>>>> vmx_alloc_vmcs() is so straightforward. My test machine is really magic.
>>>
>>> Not at all. The free-memory pool was getting spiked with guarded (mapped
>>> not-present) pages. The later unlucky allocator is the one that then
>>> crashes.
>>>
>>> I've just fixed this as xen-unstable:21504. The bug was a silly typo.
>>>
>>>  Thanks,
>>>  Keir