Hi maintainers,

We are hitting a slow UEK2 boot issue on our OVM product (OVM 3.0.3 and OVM 3.1.1).

The system is an Exalogic node with 24 cores + 100 GB memory (2 sockets, 6 cores
per socket, 2 HT threads per core). After booting this node with all cores enabled,
we boot a PVHVM guest with 12 (or 24) vCPUs + 90 GB + a PCI passed-through device,
and it takes 30+ minutes to boot.
If we remove the passthrough device from vm.cfg, boot takes about 2 minutes.
If we use a small memory size (e.g. 10 GB + 24 vCPUs), boot takes about 3 minutes.
So big memory + a passthrough device is the worst case.

If we boot this node with HT disabled in the BIOS, only 12 cores are available.
OVM on the same node, with the same 12 vCPUs + 90 GB config, boots in 1.5 minutes!

After some debugging, we found it is the kernel MTRR init that causes this delay:

mtrr_aps_init()
 \-> set_mtrr()
     \-> mtrr_work_handler()

The kernel spins in mtrr_work_handler, but we don't know what is going on inside
the hypervisor, or why big memory + a passthrough device is the worst case.
Is this already fixed in xen upstream?
Any comments are welcome; I'll upload whatever data you need.

thanks
zduan
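For orientation, the rendezvous behind that call chain can be modeled in user
space roughly as follows. This is a sketch with pthreads and C11 atomics, not
the kernel code; the count/gate names only approximate those in
arch/x86/kernel/cpu/mtrr/main.c of that era. Every CPU checks in and then spins
until the initiating CPU opens a gate, so a single slow participant stalls all
of them.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NCPUS 24

static atomic_int count = NCPUS;   /* CPUs still checking in */
static atomic_int gate;            /* opened by the initiating CPU */

static void *mtrr_work_handler(void *arg)
{
    long cpu = (long)arg;

    atomic_fetch_sub(&count, 1);   /* check in */
    while (!atomic_load(&gate))    /* spin until the gate opens */
        ;                          /* cpu_relax() in the kernel */

    /* The real handler calls mtrr_if->set_all() here, which is where
     * the guest's vCPUs were observed to stall in this report. */
    printf("cpu %ld: set_all()\n", cpu);
    return NULL;
}

int main(void)
{
    pthread_t t[NCPUS];

    for (long i = 0; i < NCPUS; i++)
        pthread_create(&t[i], NULL, mtrr_work_handler, (void *)i);

    while (atomic_load(&count))    /* initiator waits for all CPUs */
        ;
    atomic_store(&gate, 1);        /* open the gate */

    for (int i = 0; i < NCPUS; i++)
        pthread_join(t[i], NULL);
    return 0;
}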
>>> On 07.08.12 at 09:22, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> After some debugging, we found it is the kernel MTRR init that causes this delay:
>
> mtrr_aps_init()
>  \-> set_mtrr()
>      \-> mtrr_work_handler()
>
> The kernel spins in mtrr_work_handler, but we don't know what is going on inside
> the hypervisor, or why big memory + a passthrough device is the worst case.
> Is this already fixed in xen upstream?

First of all, it would have been useful to indicate the kernel version, since
mtrr_work_handler() disappeared after 3.0. It is obviously worth checking whether
that change by itself already addresses your problem.

Next, if you already spotted where the spinning occurs, you should also be able to
tell what's going on on the other side, i.e. why the event being waited for doesn't
occur for this long a time. Since there are a number of open-coded spin loops here,
knowing exactly which one each CPU is sitting in (and which ones might not be in
any) is the fundamental information needed.

From what you're telling us so far, I'd rather suspect a kernel problem than a
hypervisor one here.

Jan
On Tue, Aug 07, 2012 at 03:22:50PM +0800, zhenzhong.duan wrote:
> Hi maintainers,
>
> We are hitting a slow UEK2 boot issue on our OVM product (OVM 3.0.3 and OVM 3.1.1).
>
> The system is an Exalogic node with 24 cores + 100 GB memory (2 sockets, 6 cores
> per socket, 2 HT threads per core). After booting this node with all cores enabled,
> we boot a PVHVM guest with 12 (or 24) vCPUs + 90 GB + a PCI passed-through device,
> and it takes 30+ minutes to boot.
> If we remove the passthrough device from vm.cfg, boot takes about 2 minutes.
> If we use a small memory size (e.g. 10 GB + 24 vCPUs), boot takes about 3 minutes.
> So big memory + a passthrough device is the worst case.
>
> If we boot this node with HT disabled in the BIOS, only 12 cores are available.
> OVM on the same node, with the same 12 vCPUs + 90 GB config, boots in 1.5 minutes!
>
> After some debugging, we found it is the kernel MTRR init that causes this delay:
>
> mtrr_aps_init()
>  \-> set_mtrr()
>      \-> mtrr_work_handler()
>
> The kernel spins in mtrr_work_handler, but we don't know what is going on inside
> the hypervisor, or why big memory + a passthrough device is the worst case.
> Is this already fixed in xen upstream?
> Any comments are welcome; I'll upload whatever data you need.

What happens if you run with an upstream version of the kernel, say v3.4.7?
Do you see the same issue?

> thanks
> zduan
On 2012-08-08 00:26, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 07, 2012 at 03:22:50PM +0800, zhenzhong.duan wrote:
>> Hi maintainers,
>>
>> We are hitting a slow UEK2 boot issue on our OVM product (OVM 3.0.3 and OVM 3.1.1).
>> [...]
>> The kernel spins in mtrr_work_handler, but we don't know what is going on inside
>> the hypervisor, or why big memory + a passthrough device is the worst case.
>> Is this already fixed in xen upstream?
>> Any comments are welcome; I'll upload whatever data you need.
>
> What happens if you run with an upstream version of the kernel, say v3.4.7?

Hi Konrad, Jan,
I tried 3.5.0-2.fc17.x86_64 and 3.6.0-rc1.

*3.5.0-2.fc17.x86_64 took ~30 mins.*
Below is an excerpt of the fc17 dmesg:

#22[ 0.002999] installing Xen timer for CPU 22
#23[ 0.002999] installing Xen timer for CPU 23
[ 1.844896] Brought up 24 CPUs
[ 1.844898] Total of 24 processors activated (140449.34 BogoMIPS).
*blocks for 30 mins here*
[ 1.899794] devtmpfs: initialized
[ 1.905956] atomic64 test passed for x86-64 platform with CX8 and with SSE

*3.6.0-rc1 took more than 2 hours.*
dmesg excerpt:

cpu 22 spinlock event irq 218
[ 1.884775] #22[ 0.001999] installing Xen timer for CPU 22
cpu 23 spinlock event irq 225
[ 1.932764] #23[ 0.001999] installing Xen timer for CPU 23
[ 1.977734] Brought up 24 CPUs
[ 1.978706] smpboot: Total of 24 processors activated (140449.34 BogoMIPS)
*blocks for more than 2 hours here*
[ 1.988859] devtmpfs: initialized
[ 2.021785] dummy:
[ 2.023706] NET: Registered protocol family 16
[ 2.026735] ACPI: bus type pci registered
[ 2.028002] PCI: Using configuration type 1 for base access

I also sent a patch to lkml that works around this issue, but I don't know the
reason for the blocking on the Xen side.
link: https://lkml.org/lkml/2012/8/7/50

regards
zduan
On 2012-08-07 16:37, Jan Beulich wrote:
>>>> On 07.08.12 at 09:22, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
>> After some debugging, we found it is the kernel MTRR init that causes this delay:
>>
>> mtrr_aps_init()
>>  \-> set_mtrr()
>>      \-> mtrr_work_handler()
>>
>> The kernel spins in mtrr_work_handler, but we don't know what is going on inside
>> the hypervisor, or why big memory + a passthrough device is the worst case.
>> Is this already fixed in xen upstream?
> First of all, it would have been useful to indicate the kernel version, since
> mtrr_work_handler() disappeared after 3.0. It is obviously worth checking whether
> that change by itself already addresses your problem.

No luck; I tried upstream kernel 3.6.0-rc1 and it seems worse. It took 2 hours
to boot up.

> Next, if you already spotted where the spinning occurs, you should also be able to
> tell what's going on on the other side, i.e. why the event being waited for doesn't
> occur for this long a time. Since there are a number of open-coded spin loops here,
> knowing exactly which one each CPU is sitting in (and which ones might not be in
> any) is the fundamental information needed.
>
> From what you're telling us so far, I'd rather suspect a kernel problem than a
> hypervisor one here.

From my findings, most of the vCPUs spin on set_atomicity_lock.
Some spin in stop_machine after finishing their job.
Only one vCPU is calling generic_set_all.
I'm not sure whether the vCPU calling generic_set_all lacks priority and is
perhaps preempted frequently by other vCPUs and dom0, which would waste a lot
of time.

> Jan
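Concretely, set_atomicity_lock turns the per-CPU MTRR programming into a
strictly serial queue, so whatever makes one CPU's pass slow is paid once per
CPU, back to back. A toy model follows (a sketch only; the usleep() stands in
for the expensive cache-disable step identified later in this thread):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NCPUS 24

/* Stand-in for the kernel's set_atomicity_lock. */
static pthread_mutex_t set_atomicity_lock = PTHREAD_MUTEX_INITIALIZER;

static void *cpu_pass(void *arg)
{
    long cpu = (long)arg;

    pthread_mutex_lock(&set_atomicity_lock);
    usleep(100 * 1000);   /* stand-in for prepare_set()/post_set() */
    printf("cpu %ld finished its MTRR pass\n", cpu);
    pthread_mutex_unlock(&set_atomicity_lock);
    return NULL;
}

int main(void)
{
    pthread_t t[NCPUS];

    for (long i = 0; i < NCPUS; i++)
        pthread_create(&t[i], NULL, cpu_pass, (void *)i);
    for (int i = 0; i < NCPUS; i++)
        pthread_join(t[i], NULL);
    /* Total runtime is NCPUS * (cost of one pass): serialization
     * multiplies the per-CPU cost, it doesn't amortize it. */
    return 0;
}

With 24 vCPUs, even a few seconds per pass already adds up to minutes; the EPT
discussion later in the thread explains why a single pass can cost far more
than that.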
>>> On 08.08.12 at 11:23, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> I also sent a patch to lkml that works around this issue, but I don't know the
> reason for the blocking on the Xen side.
> link: https://lkml.org/lkml/2012/8/7/50

Without understanding the reason for this, I agree with hpa that blindly changing
the kernel to address this is not really a good idea.

Jan
>>> On 08.08.12 at 11:48, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> On 2012-08-07 16:37, Jan Beulich wrote:
>> First of all, it would have been useful to indicate the kernel version, since
>> mtrr_work_handler() disappeared after 3.0. It is obviously worth checking
>> whether that change by itself already addresses your problem.
> No luck; I tried upstream kernel 3.6.0-rc1 and it seems worse. It took 2 hours
> to boot up.

That's quite a big step from 3.0.x. And in another response you point out that
3.6 is way worse than 3.5 was. So maybe going back to 3.1 or 3.2 might be a
better idea if debugging the issue doesn't get you anywhere.

Jan
>>> On 08.08.12 at 11:48, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> From my findings, most of the vCPUs spin on set_atomicity_lock.

Then you need to determine what the current owner of the lock is doing.

> Some spin in stop_machine after finishing their job.

And here you'd need to find out what they're waiting for, and what those CPUs
are doing.

> Only one vCPU is calling generic_set_all.
> I'm not sure whether the vCPU calling generic_set_all lacks priority and is
> perhaps preempted frequently by other vCPUs and dom0, which would waste a lot
> of time.

There's not that much being done in generic_set_all(), so the code should finish
reasonably quickly. Are you perhaps running more vCPU-s in the guest than there
are pCPU-s for them to run on? Does your hardware support Pause-Loop-Exiting (or
the AMD equivalent; I don't recall their term right now)?

Jan
On 2012-08-08 23:01, Jan Beulich wrote:
>> From my findings, most of the vCPUs spin on set_atomicity_lock.
> Then you need to determine what the current owner of the lock is doing.

I added printk.time=1 to the kernel cmdline, but dmesg doesn't show much:

[ 1.978706] smpboot: Total of 24 processors activated (140449.34 BogoMIPS)
(blocks ~30 mins)
[ 1.988859] devtmpfs: initialized

>> Some spin in stop_machine after finishing their job.
> And here you'd need to find out what they're waiting for, and what those CPUs
> are doing.

They are waiting for the vCPU calling generic_set_all and for those spinning on
set_atomicity_lock. In fact, everything is waiting on generic_set_all.

>> Only one vCPU is calling generic_set_all. [...]
> There's not that much being done in generic_set_all(), so the code should
> finish reasonably quickly. Are you perhaps running more vCPU-s in the guest
> than there are pCPU-s for them to run on?

The system is an Exalogic node with 24 cores + 100 GB memory (2 sockets, 6 cores
per socket, 2 HT threads per core). We boot a PVHVM guest with 12 (or 24) vCPUs
+ 90 GB + a PCI passed-through device.

> Does your hardware support Pause-Loop-Exiting (or the AMD equivalent; I don't
> recall their term right now)?

I have no access to the serial line; could I get that info with a command?
/proc/cpuinfo shows:

cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

> Jan
>>> On 09.08.12 at 11:42, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> They are waiting for the vCPU calling generic_set_all and for those spinning on
> set_atomicity_lock. In fact, everything is waiting on generic_set_all.

I think we're moving in circles - what is the vCPU currently in generic_set_all()
then doing?

> The system is an Exalogic node with 24 cores + 100 GB memory (2 sockets, 6
> cores per socket, 2 HT threads per core). We boot a PVHVM guest with 12 (or 24)
> vCPUs + 90 GB + a PCI passed-through device.

So you're indeed over-committing the system. How many vCPU-s does your Dom0 have?
Are there any other VMs? Is there any vCPU pinning in effect?

>> Does your hardware support Pause-Loop-Exiting (or the AMD equivalent; I don't
>> recall their term right now)?
> I have no access to the serial line; could I get that info with a command?

"xl dmesg", run early enough (i.e. before the log buffer wraps).

Jan
>>> On 10.08.12 at 06:40, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> On 2012-08-09 18:35, Jan Beulich wrote:
>> I think we're moving in circles - what is the vCPU currently in
>> generic_set_all() then doing?
> I added some debug prints: generic_set_all -> prepare_set -> write_cr0 takes a
> long time; everything else is quick. set_atomicity_lock serializes this process
> across CPUs, making it worse.
> One iteration:
> MTRR: CPU 2
> prepare_set: before read_cr0
> prepare_set: before write_cr0    ------ *blocks here*

Yeah, that CR0 write disables the caches, and that's pretty expensive on EPT
(not sure why NPT doesn't use/need the same hook) when the guest has any active
MMIO regions: vmx_set_uc_mode(), when HAP is enabled, calls
ept_change_entry_emt_with_range(), which is a walk through the entire guest page
tables (i.e. it scales with guest size or, to be precise, with the highest
populated GFN).

Going back to your original mail, I wonder however why this gets done at all.
You said it got there via

mtrr_aps_init()
 \-> set_mtrr()
     \-> mtrr_work_handler()

yet this isn't done unconditionally - see the comment before the check of
mtrr_aps_delayed_init. Can you find out where the obviously necessary call(s) to
set_mtrr_aps_delayed_init() come(s) from?

> prepare_set: before wbinvd
> prepare_set: before read_cr4
> prepare_set: before write_cr4
> prepare_set: before __flush_tlb
> prepare_set: before rdmsr
> prepare_set: before wrmsr
> generic_set_all: before set_mtrr_state
> generic_set_all: before pat_init
> post_set: before wbinvd
> post_set: before wrmsr
> post_set: before write_cr0
> post_set: before write_cr4
>
>> So you're indeed over-committing the system. How many vCPU-s does your Dom0
>> have? Are there any other VMs? Is there any vCPU pinning in effect?
> dom0 boots with 24 vCPUs (same result with dom0_max_vcpus=4). No other VM
> besides dom0. All 24 vCPUs spin per the xentop output. Below is a xentop clip.

Yes, this way you do overcommit - 24 guest vCPU-s spinning, plus anything Dom0
may need to do.

>> "xl dmesg", run early enough (i.e. before the log buffer wraps).
> Below is the xl dmesg result for your reference. thanks
> ...
> (XEN) VMX: Supported advanced features:
> (XEN)  - APIC MMIO access virtualisation
> (XEN)  - APIC TPR shadow
> (XEN)  - Extended Page Tables (EPT)
> (XEN)  - Virtual-Processor Identifiers (VPID)
> (XEN)  - Virtual NMI
> (XEN)  - MSR direct-access bitmap
> (XEN)  - Unrestricted Guest

I'm sorry, I had expected this to be printed here, but it isn't. Hence I can't
tell for sure whether PLE is implemented there, but given how long it has been
available it ought to be when "Unrestricted Guest" is there (which iirc was
introduced much later).

Jan
Cc'ing Satish, who first found this issue.

On 2012-08-10 22:22, Jan Beulich wrote:
>> I added some debug prints: generic_set_all -> prepare_set -> write_cr0 takes
>> a long time; everything else is quick. set_atomicity_lock serializes this
>> process across CPUs, making it worse.
>
> Yeah, that CR0 write disables the caches, and that's pretty expensive on EPT
> (not sure why NPT doesn't use/need the same hook) when the guest has any
> active MMIO regions: vmx_set_uc_mode(), when HAP is enabled, calls
> ept_change_entry_emt_with_range(), which is a walk through the entire guest
> page tables (i.e. it scales with guest size or, to be precise, with the
> highest populated GFN).
>
> Going back to your original mail, I wonder however why this gets done at all.
> [...] Can you find out where the obviously necessary call(s) to
> set_mtrr_aps_delayed_init() come(s) from?

At boot, set_mtrr_aps_delayed_init is called by native_smp_prepare_cpus.
mtrr_aps_delayed_init is always set to true for Intel processors in upstream
code.

>>>> Does your hardware support Pause-Loop-Exiting (or the AMD equivalent; I
>>>> don't recall their term right now)?
> I'm sorry, I had expected this to be printed here, but it isn't. Hence I can't
> tell for sure whether PLE is implemented there, but given how long it has been
> available it ought to be when "Unrestricted Guest" is there (which iirc was
> introduced much later).

From the VMCS dump, it looks like PAUSE exiting is 0 and PLE is 1:

(XEN) *** Control State ***
(XEN) PinBased=0000003f CPUBased=b6a065fe SecondaryExec=000004eb

zduan
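As a sanity check on that reading, here is a small decoder for the two bits in
question (bit positions per the Intel SDM: bit 30 of the primary processor-based
controls is PAUSE exiting, bit 10 of the secondary controls is PAUSE-loop
exiting; the values are the ones from the dump above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t cpu_based = 0xb6a065fe;  /* CPUBased from the VMCS dump */
    uint32_t secondary = 0x000004eb;  /* SecondaryExec from the dump */

    /* Intel SDM: primary controls bit 30 = PAUSE exiting,
     * secondary controls bit 10 = PAUSE-loop exiting (PLE). */
    printf("PAUSE exiting: %u\n", (cpu_based >> 30) & 1);  /* prints 0 */
    printf("PLE          : %u\n", (secondary >> 10) & 1);  /* prints 1 */
    return 0;
}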
At 15:22 +0100 on 10 Aug (1344612120), Jan Beulich wrote:
> Yeah, that CR0 write disables the caches, and that's pretty expensive on EPT
> (not sure why NPT doesn't use/need the same hook) when the guest has any
> active MMIO regions: vmx_set_uc_mode(), when HAP is enabled, calls
> ept_change_entry_emt_with_range(), which is a walk through the entire guest
> page tables (i.e. it scales with guest size or, to be precise, with the
> highest populated GFN).

:( That's not so great. It can definitely be done more efficiently than with
that for() loop, and I wonder whether there isn't some better way involving
flipping a global flag somewhere. If no EPT maintainers have commented on this
by Thursday, I'll look into it then.

Tim.
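To get a feel for the size of that for() loop in this report's configuration,
a back-of-the-envelope calculation follows. The factor of two walks per vCPU is
an assumption based on Jan's description of each CPU entering and then leaving
UC mode, not a measured number:

#include <stdio.h>

int main(void)
{
    unsigned long long guest_bytes = 90ULL << 30;  /* 90 GB guest */
    unsigned long long gfns = guest_bytes >> 12;   /* 4 KB frames */
    unsigned vcpus = 24;

    /* Assumption: each vCPU's MTRR pass toggles CR0.CD on and then off,
     * so the hypervisor re-walks the p2m about twice per vCPU. */
    unsigned long long visits = gfns * 2ULL * vcpus;

    printf("p2m entries per walk : %llu\n", gfns);    /* 23,592,960 */
    printf("total entry visits   : %llu\n", visits);  /* ~1.13e9    */
    return 0;
}

If each entry visit costs on the order of a microsecond, a billion serialized
visits is roughly 20 minutes - the same order of magnitude as the 30+ minute
boots reported here.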
>>> On 13.08.12 at 09:58, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> On 2012-08-10 22:22, Jan Beulich wrote:
>> Going back to your original mail, I wonder however why this gets done at all.
>> [...] Can you find out where the obviously necessary call(s) to
>> set_mtrr_aps_delayed_init() come(s) from?
> At boot, set_mtrr_aps_delayed_init is called by native_smp_prepare_cpus.
> mtrr_aps_delayed_init is always set to true for Intel processors in upstream
> code.

Indeed, and that (in one form or another) has been done virtually forever in
Linux. I wonder why the problem wasn't noticed (or looked into, if it was
noticed) so far.

As it's going to be rather difficult to convince the Linux folks to change their
code (plus this wouldn't help with existing kernels anyway), we'll need to find
a way to improve this in the hypervisor.

One seemingly orthogonal thing would presumably help quite a bit on the guest
side nevertheless - para-virtualized spinlocks. I have no idea why this is only
being done when running PV, but not for PVHVM. Konrad, Stefano?

Jan
On Mon, 13 Aug 2012, Jan Beulich wrote:
> One seemingly orthogonal thing would presumably help quite a bit on the guest
> side nevertheless - para-virtualized spinlocks. I have no idea why this is
> only being done when running PV, but not for PVHVM. Konrad, Stefano?

I tried to use PV spinlocks on PV on HVM guests, but I found that:

    commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
    Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Date:   Tue Sep 6 17:41:47 2011 +0100

        xen: disable PV spinlocks on HVM

        PV spinlocks cannot possibly work with the current code because they are
        enabled after pvops patching has already been done, and because PV
        spinlocks use a different data structure than native spinlocks so we
        cannot switch between them dynamically. A spinlock that has been taken
        once by the native code (__ticket_spin_lock) cannot be taken by
        __xen_spin_lock even after it has been released.

        Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com>
        Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
        Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

At that time Jeremy was finishing off his PV ticket locks series, which has the
nice side effect of making it much easier to implement PV-on-HVM spinlocks, so
I decided to wait and just append the following patch to his series:

http://marc.info/?l=xen-devel&m=131846828430409&w=2

which clearly never went upstream.
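The failure mode that commit message describes can be shown in miniature. The
layouts below are illustrative only (the real kernel types differ): a ticket
lock taken and released once by native code leaves a nonzero lock word, which a
byte-style PV lock sharing the same storage would read as held forever.

#include <stdio.h>
#include <stdint.h>

/* Native ticket lock: two counters packed into one lock word. */
struct ticket { uint8_t head, tail; };

/* The old Xen PV byte lock reinterprets the same storage:
 * "held" iff the first byte is nonzero. */
union lockword { struct ticket t; uint8_t byte_lock; };

int main(void)
{
    union lockword l = { { 0, 0 } };

    l.t.tail++;   /* native __ticket_spin_lock: take a ticket */
    l.t.head++;   /* native unlock: advance the head          */

    /* Free in ticket terms (head == tail), but the byte-lock view
     * now reads 1 == "held" and would spin on this lock forever. */
    printf("ticket view free: %d, byte-lock view: %u\n",
           l.t.head == l.t.tail, l.byte_lock);
    return 0;
}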
On 2012-08-13 19:08, Stefano Stabellini wrote:
> On Mon, 13 Aug 2012, Jan Beulich wrote:
>
> I tried to use PV spinlocks on PV on HVM guests, but I found that:
> [commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23, "xen: disable PV spinlocks
> on HVM"]
>
> At that time Jeremy was finishing off his PV ticket locks series, which has
> the nice side effect of making it much easier to implement PV-on-HVM
> spinlocks, so I decided to wait and just append the following patch to his
> series:
>
> http://marc.info/?l=xen-devel&m=131846828430409&w=2
>
> which clearly never went upstream.

Hi Stefano,
Is there a schedule for those patches to be merged upstream?

zduan
On 2012-08-13 17:29, Jan Beulich wrote:
>>>> On 13.08.12 at 09:58, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
>> At boot, set_mtrr_aps_delayed_init is called by native_smp_prepare_cpus.
>> mtrr_aps_delayed_init is always set to true for Intel processors in upstream
>> code.
> Indeed, and that (in one form or another) has been done virtually forever in
> Linux. I wonder why the problem wasn't noticed (or looked into, if it was
> noticed) so far.
>
> As it's going to be rather difficult to convince the Linux folks to change
> their code (plus this wouldn't help with existing kernels anyway), we'll need
> to find a way to improve this in the hypervisor.

Hi Jan, Tim,
Is this issue improvable from the Xen side?

thanks
zduan
On Wed, 29 Aug 2012, zhenzhong.duan wrote:
> On 2012-08-13 19:08, Stefano Stabellini wrote:
>> At that time Jeremy was finishing off his PV ticket locks series, which has
>> the nice side effect of making it much easier to implement PV-on-HVM
>> spinlocks, so I decided to wait and just append the following patch to his
>> series:
>>
>> http://marc.info/?l=xen-devel&m=131846828430409&w=2
>>
>> which clearly never went upstream.
> Hi Stefano,
> Is there a schedule for those patches to be merged upstream?

They are currently being handled by the KVM guys:

https://lkml.org/lkml/2012/5/2/119
At 13:36 +0800 on 29 Aug (1346247391), zhenzhong.duan wrote:
> On 2012-08-13 17:29, Jan Beulich wrote:
>> As it's going to be rather difficult to convince the Linux folks to change
>> their code (plus this wouldn't help with existing kernels anyway), we'll
>> need to find a way to improve this in the hypervisor.
> Hi Jan, Tim,
> Is this issue improvable from the Xen side?

Probably; we're looking into the best way to address it.

Tim.
>>> On 29.08.12 at 07:36, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> On 2012-08-13 17:29, Jan Beulich wrote:
>> As it's going to be rather difficult to convince the Linux folks to change
>> their code (plus this wouldn't help with existing kernels anyway), we'll
>> need to find a way to improve this in the hypervisor.
> Is this issue improvable from the Xen side?

Yes, we're investigating options.

Jan
On 2012-08-30 17:03, Tim Deegan wrote:
> At 13:36 +0800 on 29 Aug (1346247391), zhenzhong.duan wrote:
>> Hi Jan, Tim,
>> Is this issue improvable from the Xen side?
> Probably; we're looking into the best way to address it.

Hi Jan, Tim,
Is there any patch for us to test? We are looking forward to your fix. Our
customer is unhappy with a boot time of more than 30 minutes and the long wait.

Regards
zduan
>>> On 19.09.12 at 04:39, "zhenzhong.duan" <zhenzhong.duan@oracle.com> wrote:
> On 2012-08-30 17:03, Tim Deegan wrote:
>> Probably; we're looking into the best way to address it.
> Is there any patch for us to test?

No, sorry.

Jan
On Thu, Aug 30, 2012 at 10:03:12AM +0100, Tim Deegan wrote:
> At 13:36 +0800 on 29 Aug (1346247391), zhenzhong.duan wrote:
>> On 2012-08-13 17:29, Jan Beulich wrote:
>>> As it's going to be rather difficult to convince the Linux folks to change
>>> their code (plus this wouldn't help with existing kernels anyway), we'll
>>> need to find a way to improve this in the hypervisor.
>> Hi Jan, Tim,
>> Is this issue improvable from the Xen side?
>
> Probably; we're looking into the best way to address it.
>
> Tim.

Ping? Was there any progress on this? Thanks.
On Mon, Apr 29, 2013 at 6:55 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Aug 30, 2012 at 10:03:12AM +0100, Tim Deegan wrote:
>> At 13:36 +0800 on 29 Aug (1346247391), zhenzhong.duan wrote:
>>> Is this issue improvable from the Xen side?
>>
>> Probably; we're looking into the best way to address it.
>>
>> Tim.
>
> Ping? Was there any progress on this? Thanks.

Does this need to be added to our tracking list?

 -George