MaoXiaoyun
2011-May-18 16:19 UTC
[Xen-devel] Too many VCPUs cause high domU CPU utilization
Hi,

I have a host with 16 physical CPUs. Dom0 has 4 VCPUs.

When I start only domU-A (Windows 2003 x86-64 R2) with 16 VCPUs, it starts quickly and everything is fine.

But if I first start domU-B with 2 VCPUs, domU-C with 4 VCPUs, and domU-D with 8 VCPUs, and then start domU-A again (34 VCPUs in total, including dom0's 4), it takes a very long time for domU-A to start, and during boot its CPU utilization is around 800% in xm top. After it boots it responds very slowly over VNC, and the CPU utilization stays high. Right after I destroy the other 3 domUs, domU-A's CPU drops back to normal.

It might be related to CPU scheduling; BTW, my Xen is 4.0.1.

Any comments?
Konrad Rzeszutek Wilk
2011-May-18 16:39 UTC
Re: [Xen-devel] Too many VCPUs cause high domU CPU utilization
On Thu, May 19, 2011 at 12:19:11AM +0800, MaoXiaoyun wrote:
> Hi,
>
> I have a host with 16 physical CPUs. Dom0 has 4 VCPUs.
>
> When I start only domU-A (Windows 2003 x86-64 R2) with 16 VCPUs, it starts quickly and everything is fine.
>
> But if I first start domU-B with 2 VCPUs, domU-C with 4 VCPUs, and domU-D with 8 VCPUs, and then start domU-A again
> (34 VCPUs in total, including dom0's 4), it takes a very long time for domU-A to start, and during boot its CPU
> utilization is around 800% in xm top. After it boots it responds very slowly over VNC, and the CPU utilization stays high.
> Right after I destroy the other 3 domUs, domU-A's CPU drops back to normal.

You might be missing some patches for the Dom0 irq round-robin code. Is your Dom0 2.6.32 pvops?
Look for these patches in your git log:

  xen: events: Remove redundant clear of l2i at end of round-robin loop
  xen: events: Make round-robin scan fairer by snapshotting each l2 word once only
  xen: events: Clean up round-robin evtchn scan.
  xen: events: Make last processed event channel a per-cpu variable.
  xen: events: Process event channels notifications in round-robin order.

> It might be related to CPU scheduling; BTW, my Xen is 4.0.1.
>
> Any comments?
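For reference, one way to check a local dom0 kernel tree for those commits is to grep its history for the subject lines above. The loop below is only a sketch; it assumes the tree you actually boot from is checked out in the current directory.

    #!/bin/sh
    # Check a local dom0 kernel tree for the event-channel round-robin patches.
    for subject in \
        "xen: events: Remove redundant clear of l2i at end of round-robin loop" \
        "xen: events: Make round-robin scan fairer by snapshotting each l2 word once only" \
        "xen: events: Clean up round-robin evtchn scan." \
        "xen: events: Make last processed event channel a per-cpu variable." \
        "xen: events: Process event channels notifications in round-robin order."
    do
        # --fixed-strings avoids regex surprises from the punctuation in the subjects
        if git log --oneline --fixed-strings --grep="$subject" | grep -q .; then
            echo "present: $subject"
        else
            echo "MISSING: $subject"
        fi
    done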
MaoXiaoyun
2011-May-19 02:35 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
My kernel version:
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ae333e97552c81ab10395ad1ffc6d6daaadb144a

I think those patches are included. BTW, irqbalance is disabled in my dom0.

Thanks.

> Date: Wed, 18 May 2011 12:39:50 -0400
> From: konrad.wilk@oracle.com
> To: tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com; george.dunlap@eu.citrix.com
> Subject: Re: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>
> You might be missing some patches for the Dom0 irq round-robin code. Is your Dom0 2.6.32 pvops?
> Look for these patches in your git log:
>   xen: events: Remove redundant clear of l2i at end of round-robin loop
>   xen: events: Make round-robin scan fairer by snapshotting each l2 word once only
>   xen: events: Clean up round-robin evtchn scan.
>   xen: events: Make last processed event channel a per-cpu variable.
>   xen: events: Process event channels notifications in round-robin order.
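A quick way to double-check that irqbalance really is out of the picture in dom0 (just a sketch; the service name and SysV-style init tooling are assumptions about the distro in use):

    # Confirm no irqbalance process is running in dom0 (no output means none found).
    ps -C irqbalance -o pid=,args=
    # On SysV-init distros of that era, also check that it is not enabled at boot.
    chkconfig --list irqbalance 2>/dev/null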
Tian, Kevin
2011-May-19 03:39 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>From: MaoXiaoyun
>Sent: Thursday, May 19, 2011 12:19 AM
>
>Hi,
>
>I have a host with 16 physical CPUs. Dom0 has 4 VCPUs.
>
>When I start only domU-A (Windows 2003 x86-64 R2) with 16 VCPUs, it starts quickly and everything is fine.

Does the same thing happen if you launch B/C/D after A?

Thanks
Kevin

>But if I first start domU-B with 2 VCPUs, domU-C with 4 VCPUs, and domU-D with 8 VCPUs, and then start domU-A again
>(34 VCPUs in total, including dom0's 4), it takes a very long time for domU-A to start, and during boot its CPU
>utilization is around 800% in xm top. After it boots it responds very slowly over VNC, and the CPU utilization stays high.
>Right after I destroy the other 3 domUs, domU-A's CPU drops back to normal.
>
>It might be related to CPU scheduling; BTW, my Xen is 4.0.1.
>
>Any comments?
George Dunlap
2011-May-19 16:21 UTC
[Xen-devel] Re: Too many VCPUs cause high domU CPU utilization
This happens only during boot, is that right? It sounds like maybe
Linux is trying to use some kind of barrier synchronization and failing
because it's having a hard time getting all 16 vcpus to run at once. If
that's the case, I think it's pretty much a Linux kernel issue; not sure
what the hypervisor can do about it.

 -George

On Wed, 2011-05-18 at 17:19 +0100, MaoXiaoyun wrote:
> Hi,
>
> I have a host with 16 physical CPUs. Dom0 has 4 VCPUs.
>
> When I start only domU-A (Windows 2003 x86-64 R2) with 16 VCPUs, it starts quickly and everything is fine.
>
> But if I first start domU-B with 2 VCPUs, domU-C with 4 VCPUs, and domU-D with 8 VCPUs, and then start domU-A again
> (34 VCPUs in total, including dom0's 4), it takes a very long time for domU-A to start, and during boot its CPU
> utilization is around 800% in xm top. After it boots it responds very slowly over VNC, and the CPU utilization stays high.
> Right after I destroy the other 3 domUs, domU-A's CPU drops back to normal.
>
> It might be related to CPU scheduling; BTW, my Xen is 4.0.1.
>
> Any comments?
MaoXiaoyun
2011-May-19 16:23 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
> From: kevin.tian@intel.com
> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
> CC: george.dunlap@eu.citrix.com
> Date: Thu, 19 May 2011 11:39:14 +0800
> Subject: RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>
> Does the same thing happen if you launch B/C/D after A?

From the test results, not really: all domains' CPU utilization stays low.
It looks like this only happens while domain A is booting, and it results in a very long boot time.

One thing to correct: today, even after I destroyed B/C/D, domain A kept consuming 800% CPU for quite a long time, up to the moment I am writing this mail.

Another strange thing is that all VCPUs of domU-A seem to run only on the even-numbered physical CPUs (0, 2, 4, ...), which explains why the CPU utilization is 800%. But I don't understand whether this is by design.

Below is the detailed scheduler log from the serial console. Thanks.

(XEN) Scheduler: SMP Credit Scheduler (credit)
(XEN) info:
(XEN)   ncpus              = 16
(XEN)   master             = 0
(XEN)   credit             = 4800
(XEN)   credit balance     = 2324
(XEN)   weight             = 512
(XEN)   runq_sort          = 726199
(XEN)   default-weight     = 256
(XEN)   msecs per tick     = 10ms
(XEN)   credits per msec   = 10
(XEN)   ticks per tslice   = 3
(XEN)   ticks per acct     = 3
(XEN)   migration delay    = 0us
(XEN) idlers: 00000000,00000000,00000000,0000aaaa
(XEN) active vcpus:
(XEN)     1: [0.0] pri=0 flags=0 cpu=0 credit=50 [w=256]
(XEN)     2: [14.8] pri=-1 flags=0 cpu=8 credit=-32 [w=256]
(XEN)     3: [14.5] pri=-1 flags=0 cpu=14 credit=-188 [w=256]
(XEN)     4: [14.10] pri=-1 flags=0 cpu=6 credit=-187 [w=256]
(XEN)     5: [14.14] pri=-1 flags=0 cpu=8 credit=0 [w=256]
(XEN)     6: [14.11] pri=-1 flags=0 cpu=6 credit=0 [w=256]
(XEN)     7: [14.0] pri=-1 flags=0 cpu=4 credit=0 [w=256]
(XEN)     8: [14.15] pri=-1 flags=0 cpu=0 credit=297 [w=256]
(XEN)     9: [14.9] pri=-1 flags=0 cpu=8 credit=300 [w=256]
(XEN)    10: [14.2] pri=-1 flags=0 cpu=6 credit=300 [w=256]
(XEN)    11: [14.6] pri=-1 flags=0 cpu=4 credit=134 [w=256]
(XEN)    12: [14.3] pri=-1 flags=0 cpu=14 credit=288 [w=256]
(XEN)    13: [14.12] pri=-1 flags=0 cpu=12 credit=-83 [w=256]
(XEN)    14: [14.7] pri=-1 flags=0 cpu=2 credit=65 [w=256]
(XEN)    15: [14.4] pri=-1 flags=0 cpu=10 credit=-145 [w=256]
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=0x00002C6DB4F0EF4B
(XEN) CPU[00] sort=726199, sibling=00000000,00000000,00000000,00000101, core=00000000,00000000,00000000,00005555
(XEN)   run: [14.13] pri=-1 flags=0 cpu=0 credit=-2 [w=256]
(XEN)     1: [14.15] pri=-1 flags=0 cpu=0 credit=297 [w=256]
(XEN)     2: [14.1] pri=-1 flags=0 cpu=0 credit=-1 [w=256]
(XEN)     3: [32767.0] pri=-64 flags=0 cpu=0
(XEN) CPU[01] sort=726199, sibling=00000000,00000000,00000000,00000202, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.1] pri=-64 flags=0 cpu=1
(XEN) CPU[02] sort=726199, sibling=00000000,00000000,00000000,00000404, core=00000000,00000000,00000000,00005555
(XEN)   run: [0.2] pri=0 flags=0 cpu=2 credit=-89 [w=256]
(XEN)     1: [14.7] pri=-1 flags=0 cpu=2 credit=65 [w=256]
(XEN)     2: [32767.2] pri=-64 flags=0 cpu=2
(XEN) CPU[03] sort=726199, sibling=00000000,00000000,00000000,00000808, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.3] pri=-64 flags=0 cpu=3
(XEN) CPU[04] sort=726199, sibling=00000000,00000000,00000000,00001010, core=00000000,00000000,00000000,00005555
(XEN)   run: [14.6] pri=-1 flags=0 cpu=4 credit=134 [w=256]
(XEN)     1: [14.0] pri=-1 flags=0 cpu=4 credit=0 [w=256]
(XEN)     2: [32767.4] pri=-64 flags=0 cpu=4
(XEN) CPU[05] sort=726199, sibling=00000000,00000000,00000000,00002020, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.5] pri=-64 flags=0 cpu=5
(XEN) CPU[06] sort=726199, sibling=00000000,00000000,00000000,00004040, core=00000000,00000000,00000000,00005555
(XEN)   run: [14.10] pri=-1 flags=0 cpu=6 credit=-187 [w=256]
(XEN)     1: [14.2] pri=-1 flags=0 cpu=6 credit=300 [w=256]
(XEN)     2: [14.11] pri=-1 flags=0 cpu=6 credit=0 [w=256]
(XEN)     3: [32767.6] pri=-64 flags=0 cpu=6
(XEN) CPU[07] sort=726199, sibling=00000000,00000000,00000000,00008080, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.7] pri=-64 flags=0 cpu=7
(XEN) CPU[08] sort=726199, sibling=00000000,00000000,00000000,00000101, core=00000000,00000000,00000000,00005555
(XEN)   run: [14.8] pri=-1 flags=0 cpu=8 credit=-32 [w=256]
(XEN)     1: [14.14] pri=-1 flags=0 cpu=8 credit=0 [w=256]
(XEN)     2: [14.9] pri=-1 flags=0 cpu=8 credit=300 [w=256]
(XEN)     3: [32767.8] pri=-64 flags=0 cpu=8
(XEN) CPU[09] sort=726199, sibling=00000000,00000000,00000000,00000202, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.9] pri=-64 flags=0 cpu=9
(XEN) CPU[10] sort=726199, sibling=00000000,00000000,00000000,00000404, core=00000000,00000000,00000000,00005555
(XEN)   run: [14.4] pri=-1 flags=0 cpu=10 credit=-145 [w=256]
(XEN)     1: [32767.10] pri=-64 flags=0 cpu=10
(XEN) CPU[11] sort=726199, sibling=00000000,00000000,00000000,00000808, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.11] pri=-64 flags=0 cpu=11
(XEN) CPU[12] sort=726199, sibling=00000000,00000000,00000000,00001010, core=00000000,00000000,00000000,00005555
(XEN)   run: [14.12] pri=-1 flags=0 cpu=12 credit=-83 [w=256]
(XEN)     1: [32767.12] pri=-64 flags=0 cpu=12
(XEN) CPU[13] sort=726199, sibling=00000000,00000000,00000000,00002020, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.13] pri=-64 flags=0 cpu=13
(XEN) CPU[14] sort=726199, sibling=00000000,00000000,00000000,00004040, core=00000000,00000000,00000000,00005555
(XEN)   run: [14.5] pri=-1 flags=0 cpu=14 credit=-188 [w=256]
(XEN)     1: [14.3] pri=-1 flags=0 cpu=14 credit=288 [w=256]
(XEN)     2: [32767.14] pri=-64 flags=0 cpu=14
(XEN) CPU[15] sort=726199, sibling=00000000,00000000,00000000,00008080, core=00000000,00000000,00000000,0000aaaa
(XEN)   run: [32767.15] pri=-64 flags=0 cpu=15

> Thanks
> Kevin
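The same placement can also be checked from dom0 without the serial console; a minimal sketch, assuming the guest is domain ID 14 as in the dump above:

    # List where each of domain 14's VCPUs currently runs and its CPU affinity.
    # With the placement shown above, the CPU column should contain only even numbers,
    # and CPU Affinity should read "any cpu" unless cpus= was set in the guest config.
    xm vcpu-list 14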
MaoXiaoyun
2011-May-19 16:29 UTC
[Xen-devel] RE: Too many VCPUs cause high domU CPU utilization
> Subject: Re: Too many VCPUs cause high domU CPU utilization
> From: george.dunlap@citrix.com
> To: tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com; George.Dunlap@eu.citrix.com
> Date: Thu, 19 May 2011 17:21:12 +0100
>
> This happens only during boot, is that right? It sounds like maybe
> Linux is trying to use some kind of barrier synchronization and failing
> because it's having a hard time getting all 16 vcpus to run at once. If
> that's the case, I think it's pretty much a Linux kernel issue; not sure
> what the hypervisor can do about it.
>
> -George

From the test results, it is.

Are there any existing tools I could use for further analysis? Thanks.
Tian, Kevin
2011-May-19 23:29 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
>Sent: Friday, May 20, 2011 12:24 AM
>>
>> Does the same thing happen if you launch B/C/D after A?
>>
>From the test results, not really: all domains' CPU utilization stays low.
>It looks like this only happens while domain A is booting, and it results in a very long boot time.

One possible reason is lock contention at boot time. If the lock holder is preempted
unexpectedly and the lock waiter continues to consume cycles, boot progress can be slow.

Which kernel version are you using? Does a different kernel version expose the same problem?

>One thing to correct: today, even after I destroyed B/C/D, domain A kept consuming 800% CPU for quite a long time,
>up to the moment I am writing this mail.

So this phenomenon is intermittent? In your earlier mail A went back to normal after you
destroyed B/C/D, and this time the slowness continues.

>Another strange thing is that all VCPUs of domU-A seem to run only on the even-numbered physical CPUs (0, 2, 4, ...),
>which explains why the CPU utilization is 800%. But I don't understand whether this is by design.

Are there any hard limitations you added in the configuration file? Or do you observe this
weird affinity all the time, or only occasionally? It looks strange to me that the scheduler
would consistently keep such an affinity if you don't assign it explicitly.

You may want to run xenoprof to sample domain A and see its hot spots, or use xentrace to
trace VM-exit events and approach it from another angle.

Thanks
Kevin
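A rough illustration of the xentrace route follows; the event mask, file paths and timings below are assumptions, not something specified in this thread.

    # Capture HVM-class trace records (VM exits etc.) for ~30 seconds while domain A boots.
    # 0x0008f000 is the TRC_HVM event class mask.
    xentrace -e 0x0008f000 /tmp/domA-hvm.trace &
    TRACE_PID=$!
    sleep 30
    kill $TRACE_PID

    # Pretty-print the binary trace; the location of the formats file varies between installs.
    xentrace_format /usr/share/xen/xentrace/formats < /tmp/domA-hvm.trace > /tmp/domA-hvm.txt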
MaoXiaoyun
2011-May-20 15:51 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
> From: kevin.tian@intel.com
> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
> CC: george.dunlap@eu.citrix.com
> Date: Fri, 20 May 2011 07:29:55 +0800
> Subject: RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>
> One possible reason is lock contention at boot time. If the lock holder is preempted
> unexpectedly and the lock waiter continues to consume cycles, boot progress can be slow.
>
> Which kernel version are you using? Does a different kernel version expose the same problem?

Earlier it was reported that a VM with a large number of VCPUs showed high CPU utilization and a very slow response; the kernel version then was 2.6.31.13. I am still investigating that. But the issue I report in this mail was found on 2.6.32.36.

> So this phenomenon is intermittent? In your earlier mail A went back to normal after you
> destroyed B/C/D, and this time the slowness continues.
>
> Are there any hard limitations you added in the configuration file? Or do you observe this
> weird affinity all the time, or only occasionally? It looks strange to me that the scheduler
> would consistently keep such an affinity if you don't assign it explicitly.

In fact I have two machines for the test, M1 and M2; so far the issue only shows on M1. I don't think I have any special configuration. After comparing the "xm info" output of M1 and M2, I found that M1 has two nr_nodes while M2 has only one. This may explain why all of domU-A's VCPUs end up on even-numbered physical CPUs, and it might also be what causes the slow guest boot. What do you think? Thanks.
=================M1 output====================
release                : 2.6.32.36xen
version                : #1 SMP Fri May 20 15:27:10 CST 2011
machine                : x86_64
nr_cpus                : 16
nr_nodes               : 2
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 2400
hw_caps                : bfebfbff:28100800:00000000:00001b40:009ce3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 24567
free_memory            : 11898
node_to_cpu            : node0:0,2,4,6,8,10,12,14
                         node1:1,3,5,7,9,11,13,15
node_to_memory         : node0:3997
                         node1:7901
node_to_dma32_mem      : node0:2995
                         node1:0
max_node_id            : 1
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : msi=1 iommu=off x2apic=off console=com1,vga com1=115200,8n1 noreboot dom0_mem=5630M dom0_max_vcpus=4 dom0_vcpus_pin cpuidle=0 cpufreq=none no-xsave

=================M2 output====================
release                : 2.6.32.36xen
version                : #1 SMP Fri May 20 15:27:10 CST 2011
machine                : x86_64
nr_cpus                : 16
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 2400
hw_caps                : bfebfbff:28100800:00000000:00001b40:009ce3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 24567
free_memory            : 11898
node_to_cpu            : node0:0-15
node_to_memory         : node0:11898
node_to_dma32_mem      : node0:2995
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : msi=1 iommu=off x2apic=off console=com1,vga com1=115200,8n1 noreboot dom0_mem=max:5630M dom0_max_vcpus=4 dom0_vcpus_pin cpuidle=0 cpufreq=none no-xsave

> You may want to run xenoprof to sample domain A and see its hot spots, or use xentrace to
> trace VM-exit events and approach it from another angle.
>
> Thanks
> Kevin
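For a quick side-by-side of the two hosts, the NUMA-related fields can be pulled straight out of "xm info" on each machine (just a convenience sketch, not from the original mail):

    # Run on M1 and M2 and compare: a host with nr_nodes > 1 exposes NUMA topology to the Xen scheduler.
    xm info | grep -E 'nr_cpus|nr_nodes|cores_per_socket|threads_per_core|node_to_cpu|node_to_memory'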
MaoXiaoyun
2011-May-21 03:43 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
Although I still haven't figured out why the VCPUs fall only on even (or only on odd) PCPUs, if I explicitly set "VCPU=[4~15]" in the HVM configuration, the VM will use all PCPUs from 4 to 15.

I may also have found the reason why the guest boots so slowly.

I think the reason is that the number of guest VCPUs is greater than the number of physical CPUs the guest can run on. In my test the host has 16 PCPUs and dom0 takes 4, so each guest has only 12 physical CPUs available.

So, if the guest has 16 VCPUs and only 12 PCPUs are available, then under heavy load two or more VCPUs will be queued on one physical CPU, and if one VCPU is waiting for a response from the other VCPUs (such as an IPI message), the waiting time will be much longer.

In particular, at guest run time, if a process inside the guest runs 16 threads, each VCPU may own one thread; on the physical machine those VCPUs are still queued on the PCPUs, and if the process contains busy-waiting code (such as a spinlock), it will drive the guest's CPU utilization up. If the busy-waiting code does not run that frequently, we might see the CPU utilization jump very high and then drop low now and then.

Could that be the explanation? Many thanks.
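One way to watch for the spike-and-drop pattern described above is xentop in batch mode; the column position and the guest name "domU-A" are assumptions about this particular setup.

    # Sample the guest's CPU utilization every 5 seconds for 10 minutes.
    # In this era's xentop batch output the 4th column is CPU(%); verify against the header line.
    xentop -b -d 5 -i 120 | awk '$1 == "domU-A" { print $1, $4 }'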
Tian, Kevin
2011-May-22 02:05 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
>Sent: Friday, May 20, 2011 11:52 PM
>
>In fact I have two machines for the test, M1 and M2; so far the issue only shows on M1.
>I don't think I have any special configuration. After comparing the "xm info" output of M1 and M2, I found
>that M1 has two nr_nodes while M2 has only one.
>This may explain why all of domU-A's VCPUs end up on even-numbered physical CPUs,
>and it might also be what causes the slow guest boot.
>What do you think?

Do you have NUMA enabled on M1? That may be the reason for the affinity. Could you post the CPU topology of M1, which may prove this point?

Thanks
Kevin
Tian, Kevin
2011-May-22 02:13 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
>Sent: Saturday, May 21, 2011 11:44 AM
>
>Although I still haven't figured out why the VCPUs fall only on even (or only on odd) PCPUs, if I explicitly set
>"VCPU=[4~15]" in the HVM configuration, the VM will use all PCPUs from 4 to 15.

This may indicate that NUMA is enabled on your M1, and thus the Xen scheduler tries to use
local memory to avoid remote access latency; that's why your domain A is affined to a fixed set of cpus.

>I may also have found the reason why the guest boots so slowly.
>
>I think the reason is that the number of guest VCPUs is greater than the number of physical CPUs the guest can run on.
>In my test the host has 16 PCPUs and dom0 takes 4, so each guest has only 12 physical CPUs available.

The scheduler in the hypervisor is designed to multiplex multiple vcpus on a single cpu,
so even when dom0 has 4 vcpus it doesn't mean that only the remaining 12 pcpus are
available for use.

>So, if the guest has 16 VCPUs and only 12 PCPUs are available, then under heavy load two or more VCPUs will be queued
>on one physical CPU, and if one VCPU is waiting for a response from the other VCPUs (such as an IPI message), the
>waiting time will be much longer.
>
>In particular, at guest run time, if a process inside the guest runs 16 threads, each VCPU may own one thread; on the
>physical machine those VCPUs are still queued on the PCPUs, and if the process contains busy-waiting code (such as a
>spinlock), it will drive the guest's CPU utilization up. If the busy-waiting code does not run that frequently, we
>might see the CPU utilization jump very high and then drop low now and then.
>
>Could that be the explanation?

It's possible. As I replied in an earlier thread, lock contention at boot time may slow down
the process slightly or heavily. Remember that the purpose of virtualization is to
consolidate multiple VMs on a single platform to maximize resource utilization. Some
use cases can have an N:1 (where N can be 8) consolidation ratio, and others may have a
smaller ratio. There are many factors in how a given environment scales up, and you need to
capture enough trace information to find the bottleneck. Some bottlenecks may be hard to
tackle and will finally shape your business best practice, while some may be fixed simply
by a proper configuration change. So it's really too early to say whether your setup is
feasible or not. You need to dive into it in more detail. :-)

Thanks
Kevin
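To make the multiplexing point concrete (a sketch, not advice given in this thread): dom0's share of CPU time under the credit scheduler is governed by its weight and cap rather than by how many pcpus are left over for the guests.

    # Show dom0's current credit-scheduler parameters (the default weight is 256, cap 0 = no limit).
    xm sched-credit -d Domain-0
    # Give dom0 twice the default share on a contended host, as an alternative to reserving pcpus for it.
    xm sched-credit -d Domain-0 -w 512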
MaoXiaoyun
2011-May-22 07:45 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
> From: kevin.tian@intel.com
> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
> CC: george.dunlap@eu.citrix.com
> Date: Sun, 22 May 2011 10:13:59 +0800
> Subject: RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
>
> This may indicate that NUMA is enabled on your M1, and thus the Xen scheduler tries to use
> local memory to avoid remote access latency; that's why your domain A is affined to a fixed set of cpus.

That's it. I saw NUMA enabled for M1 in the BIOS configuration, while on M2 it is disabled.

> The scheduler in the hypervisor is designed to multiplex multiple vcpus on a single cpu,
> so even when dom0 has 4 vcpus it doesn't mean that only the remaining 12 pcpus are
> available for use.

Oh, I should have explained that currently we pin dom0's VCPUs to the first PCPUs (dom0_max_vcpus=4 dom0_vcpus_pin on the grub command line), and for guests we set "CPU=[4-15]" in the HVM config, so the PCPUs for both dom0 and the guest VMs are limited. We set this to ensure dom0 gets better performance.

> It's possible. As I replied in an earlier thread, lock contention at boot time may slow down
> the process slightly or heavily. Remember that the purpose of virtualization is to
> consolidate multiple VMs on a single platform to maximize resource utilization. Some
> use cases can have an N:1 (where N can be 8) consolidation ratio, and others may have a
> smaller ratio. There are many factors in how a given environment scales up, and you need to
> capture enough trace information to find the bottleneck. Some bottlenecks may be hard to
> tackle and will finally shape your business best practice, while some may be fixed simply
> by a proper configuration change. So it's really too early to say whether your setup is
> feasible or not. You need to dive into it in more detail. :-)

I agree.

I did some extra tests. In "xm top" there is a column "VBD_RD", which is the number of read I/Os; I counted the time from VM start to the first read I/O:

1) Guest VCPUs = 16, cpu="4-15" in the HVM config: first I/O shows up 85 seconds after VM start.
2) Guest VCPUs = 12, cpu="4-15" in the HVM config: first I/O shows up 20 seconds after VM start.
3) Guest VCPUs = 8, cpu="4-7" in the HVM config: first I/O shows up 90 seconds after VM start.
4) Guest VCPUs = 8, cpu="4-15" in the HVM config: first I/O shows up 20 seconds after VM start.
5) Guest VCPUs = 16, cpu="0-15" in the HVM config: first I/O shows up 23 seconds after VM start.

Previously I mentioned that we give the first 4 physical CPUs *only* to dom0. It looks like, for a guest with many VCPUs, say 16, I should not limit its PCPUs to 12, but give all available PCPUs to it (just like test 5).

> Thanks
> Kevin
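For reference, the test-5 setup would look roughly like the excerpt below in the guest's xm config file; the file name is an assumption, and the standard xm option name is "cpus".

    # Scheduling-related lines of the guest config for test 5:
    #
    #   vcpus = 16
    #   cpus  = "0-15"    # let all 16 VCPUs float over all 16 physical CPUs
    #
    # Recreate the guest, then sample xentop once a second and compare the VBD_RD column
    # against the header line to time the first read I/O.
    xm create /etc/xen/domU-A.cfg
    xentop -b -d 1 -i 300 | grep -E 'NAME|domU-A'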
Tian, Kevin
2011-May-24 11:26 UTC
RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization
That's a nice catch. :)

Thanks
Kevin

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
Sent: Sunday, May 22, 2011 3:45 PM
To: Tian, Kevin; xen devel
Cc: george.dunlap@eu.citrix.com
Subject: RE: [Xen-devel] Too many VCPUs cause high domU CPU utilization

> I did some extra tests. In "xm top" there is a column "VBD_RD", which is the number of read I/Os; I counted the time
> from VM start to the first read I/O:
>
> 1) Guest VCPUs = 16, cpu="4-15" in the HVM config: first I/O shows up 85 seconds after VM start.
> 2) Guest VCPUs = 12, cpu="4-15" in the HVM config: first I/O shows up 20 seconds after VM start.
> 3) Guest VCPUs = 8, cpu="4-7" in the HVM config: first I/O shows up 90 seconds after VM start.
> 4) Guest VCPUs = 8, cpu="4-15" in the HVM config: first I/O shows up 20 seconds after VM start.
> 5) Guest VCPUs = 16, cpu="0-15" in the HVM config: first I/O shows up 23 seconds after VM start.
>
> Previously I mentioned that we give the first 4 physical CPUs *only* to dom0. It looks like, for a guest with many
> VCPUs, say 16, I should not limit its PCPUs to 12, but give all available PCPUs to it (just like test 5).