I'm looking into getting Xenoprof to run in an HVM domain, since we will
eventually need a profiler for HVM domains to track down areas of poor
performance. (HVMs have poor performance? :) ) Being relatively new to
OProfile, Xenoprof, and Xen internals, I would appreciate any pointers,
tips, and comments on how to approach the implementation. I see three
basic areas of work.

1. Implement hypercalls in HVM domains. This has been done by Steve
Ofsthun of Virtual Iron, who contributed his patches to the xen-devel
list recently. (Thanks, Steve.)

2. Implement the shared buffer that conveys profile events from the
hypervisor to the domain. From my initial crawl through the Xenoprof
code (see, for example, linux-2.6-xen-sparse/arch/i386/oprofile/xenoprof.c)
it appears that its setup of the shared buffer via hypercalls and page
table updates should work in an HVM domain. Correct me if I'm wrong.

3. Implement an interrupt mechanism for the hypervisor to signal the
domain that it has more data in the shared buffer. Xenoprof currently
sets up an event channel for this. In my initial hack of the code I
discovered that the event channel used by Xenoprof conflicts with the
8259 support in the HVM kernel. Since I use the serial interface to the
HVM domain, I am hesitant to remove the 8259 support from my HVM kernel.
It appears that I need to either get an event channel to work through
qemu, preserving the 8259 functionality, or change Xenoprof in the
hypervisor to instruct qemu to issue an interrupt to the domain and
change Xenoprof in the domain to run off an interrupt instead of an
event channel. Or maybe I don't know what I'm talking about and need to
be enlightened by those who know better what the issues are.

Any advice is appreciated.

Thanks,
Steve D.
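For reference, the shared buffer in point 2 is essentially a per-VCPU
producer/consumer ring that Xen fills and the domain drains. Below is a
minimal sketch of the consumer side; the struct and function names are
illustrative only (the real layout lives in xen/include/public/xenoprof.h,
so treat every name here as an assumption):

#include <stdint.h>

/* Simplified model of a per-VCPU Xenoprof sample buffer. */
struct sample {
    uint64_t eip;    /* interrupted instruction pointer */
    uint8_t  mode;   /* user / kernel / xen */
    uint8_t  event;  /* which counter overflowed */
};

struct sample_ring {
    uint32_t head;            /* consumer index, advanced by the domain */
    uint32_t tail;            /* producer index, advanced by Xen */
    uint32_t size;            /* number of slots */
    struct sample samples[];  /* sample slots written by Xen */
};

extern void process_sample(struct sample *s);  /* hypothetical consumer */

/* Drain pending samples; called from the notification handler.  Real
 * code needs memory barriers between the index and data accesses. */
static void drain_samples(struct sample_ring *ring)
{
    while (ring->head != ring->tail) {
        process_sample(&ring->samples[ring->head]);
        ring->head = (ring->head + 1) % ring->size;
    }
}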
Hi Steve,

As we know, xenoprof 2.0 has been checked in to xen-unstable for a few
days now. I'm writing a patch to add passive domain support (which
existed in the UP-guest days) for SMP guests. Using passive domains and
my improvement to oprofile (which I sent out before), we needn't change
guest code and can tune an HVM guest in a somewhat raw way -- PC samples
from the hypervisor and the guest kernel can be mapped to functions, but
all samples from applications are only there as a whole. It's enough for
tuning HVM performance for now.

You are proposing to add active domain support for HVM, right? The
apparent advantage is being able to tune HVM Linux applications in the
future. It should be doable, like we've enabled vbd/vnif in HVM. And it
needs more effort in HVM. Yes, hypercalls, the shared buffer, and a
notification mechanism are all needed. For the last one, one simple way
is: 1) in the HVM guest kernel, register for an unused IRQ line; 2) in
the hypervisor, inject that interrupt to the HVM guest when needed. Or
you can use a pseudo PCI device in qemu to hold one IRQ number.

Thanks,
-Xiaowei
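A minimal sketch of Xiaowei's suggestion 1) -- an HVM guest kernel
module claiming an unused IRQ line -- using the 2.6.16-era
request_irq() signature to match the kernels in this thread. The IRQ
number and handler body are assumptions; Xen (or qemu) would assert this
line into the guest's virtual PIC whenever the sample buffer has data:

#include <linux/module.h>
#include <linux/interrupt.h>

#define XENOPROF_HVM_IRQ 11  /* assumption: a line no emulated device uses */

static irqreturn_t xenoprof_hvm_isr(int irq, void *dev_id,
                                    struct pt_regs *regs)
{
    /* drain the shared sample buffer here */
    return IRQ_HANDLED;
}

static int __init xenoprof_hvm_init(void)
{
    /* SA_INTERRUPT was the 2.6.16 flag for a fast interrupt handler */
    return request_irq(XENOPROF_HVM_IRQ, xenoprof_hvm_isr,
                       SA_INTERRUPT, "xenoprof-hvm", NULL);
}

static void __exit xenoprof_hvm_exit(void)
{
    free_irq(XENOPROF_HVM_IRQ, NULL);
}

module_init(xenoprof_hvm_init);
module_exit(xenoprof_hvm_exit);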
"Yang, Xiaowei" <xiaowei.yang@intel.com> wrote on 04/24/2006 02:02:41 AM:> Hi Steve, > As we know, xenoprof 2.0 is checked in xen-unstable for days.Yes. That''s great. Makes building Xenoprof support much easier.> Now > I''m writing a patch to add passive domain support (which exists in > UP guest age) for smp guest.Are you saying there is support for passive domains in Xenoprof? If so, I don''t see it in xen-unstable. (I just pulled changeset 9734.) It''s mentioned in the docs, and some of it appeared in the 0.8.2 patch to the oprofile utilities, but I don''t see any support for passive domains in the Xen hypervisor, the Linux sparse tree, nor the oprofile-0.9.1-xen.patch to the oprofile utilities. From what I can see, any events that are not going to an active domain are dropped. Or am I missing something?> Using passive domain and my improvement > in oprofile (I sent out before), we needn''t change guest code and > can tune hvm guest in a somewhat raw way -- PC samples from hv and > guest kernel can be mapped to functions but all samples from > application are only there as a whole. It''s enough for tuning hvm > performace for now.If handling passive domains includes resolving symbols in the kernel running in the HVM guest, then that would be sufficient for now. I am looking into performance issues with HVM domains -- networking, disk, memory. Much of that will be kernel code.> You are proposing to add active domain support for hvm, right?Yes. For a complete profile we would want to include user space applications.> The > apparent advantage is to tune hvm linux applications in future. It > should be doable, like we''ve enabled vbd/vnif in hvm. And it needs > more effort in hvm. > Yes, Hypervcall, shared buffer and notification mechanism are all > needed. For the last one, one of simple way is: 1) in hvm guest > kernel, register for an unused IRQ line 2) in hv, inject that > interrupt to hvm when needed. Or you can use a psydo PCI device in > qemu to hold one irq number.Thanks for the tips. Steve D. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>> Are you saying there is support for passive domains in Xenoprof? If
>> so, I don't see it in xen-unstable. (I just pulled changeset 9734.)

No, support for passive domains is not available yet. Xiaowei is working
on getting passive domain support with the capability of decoding Xen
and kernel samples into function names (but not user-level samples).
Xiaowei could provide more details on when he expects this to be
available for others to use.

Renato
> No, support for passive domains is not available yet. Xiaowei is
> working on getting passive domain support with the capability of
> decoding Xen and kernel samples into function names (but not
> user-level samples). Xiaowei could provide more details on when he
> expects this to be available for others to use.

Hi Renato,

For now passive domain support is enabled, and I've used it and my
enhancement to oprofile to profile a VMX domain.

Here is an example of how to use it:

opcontrol --start-daemon --passive-domains=2
  --event=GLOBAL_POWER_EVENTS:10000000:1:1:1
  --vmlinux=/boot/vmlinux-syms-2.6.16-xen0
  --xen=/boot/xen-syms-3.0-unstable
  --passive-images=/boot/vmlinux-2.6.9

Two more options are added:
--passive-domains gives a list of passive domain IDs.
--passive-images is for mapping samples to passive domain kernel
functions.

Let me show you what it can do. Below is a result example collected in
dom0 while the VMX domain is running sysbench --test=thread:

#opreport
samples|      %|
------------------
   1322 62.6837 pvmlinux2-syms
    698 33.0963 papps2-syms
     54  2.5605 pxen2-syms
     18  0.8535 vmlinux-syms-2.6.16-xen0
      8  0.3793 libc-2.3.4.so
      7  0.3319 xen-syms-3.0-unstable
      1  0.0474 bash
      1  0.0474 oprofiled

Three entries here deserve more notice:
- The pvmlinux?-syms entry means samples for the passive domain kernel,
  where ? stands for the domain ID (it's 2 here).
- The papps?-syms entry means samples of passive domain applications.
- The pxen?-syms entry means samples of Xen taken while current points
  to passive domain ?.

This is the function-level mapping, which clearly reflects what the VMX
domain is doing:

#opreport -l
samples  %       app name        symbol name
698      33.0963 papps2-syms     (no symbols)
399      18.9189 pvmlinux2-syms  sched_clock
330      15.6472 pvmlinux2-syms  sysenter_past_esp
263      12.4704 pvmlinux2-syms  schedule
131       6.2115 pvmlinux2-syms  sysenter_entry
 58       2.7501 pvmlinux2-syms  enqueue_task
 56       2.6553 pvmlinux2-syms  dequeue_task
 49       2.3234 pvmlinux2-syms  sys_sched_yield
 16       0.7587 pvmlinux2-syms  this_rq_lock
 11       0.5216 pxen2-syms      vmx_asm_vmexit_handler
 10       0.4742 pvmlinux2-syms  mark_offset_tsc
  6       0.2845 pvmlinux2-syms  mask_and_ack_8259A
  5       0.2371 pxen2-syms      vmx_vmexit_handler
  4       0.1897 pxen2-syms      __vmwrite
  3       0.1422 pxen2-syms      hvm_io_assist
  3       0.1422 pxen2-syms      vmx_intr_assist
  3       0.1422 pxen2-syms      vmx_io_instruction
...

So it's mainly usable, but there is still an issue with resource
cleanup. After it's fixed, I'll send the patch out.

Thanks,
-xiaowei
Great, Xiaowei! This is good news. I look forward to your patches.

Thanks,
Renato

>> -----Original Message-----
>> From: Yang, Xiaowei [mailto:xiaowei.yang@intel.com]
>> Sent: Wednesday, April 26, 2006 4:15 AM
>> To: Santos, Jose Renato G; xen-devel@lists.xensource.com
>> Subject: RE: Xenoprof in an HVM domain
>>
>> ... <stuff deleted> ...
On Friday 21 April 2006 15:11, Steve Dobbelstein wrote:
> I'm looking into getting Xenoprof to run in an HVM domain, since we
> will eventually need a profiler for HVM domains to track down areas of
> poor performance.
> ... <stuff deleted> ...

I guess my question on this approach is whether or not it would still
apply to an unmodified guest. Were you thinking of putting all of this
interface into a loadable kernel module for Linux (e.g. into the
oprofile module)?

In some sense, I would like it better if I could just run an unmodified
oprofile in an HVM domain (after all, it's supposed to be fully
virtualized, right?). But I could live with a situation where I run a
modified oprofile under an unmodified HVM guest, with the exception that
that guest allows one to load an HVM-capable version of the oprofile
driver. [Unless the modified oprofile and module also work in a native
environment, keeping track of these differences could be a headache for
production environments that are sometimes run virtual and sometimes run
native, though.]

This assumes, of course, that one can figure out how to virtualize the
performance counters, and then the hypervisor would have to sort out
conflicts, etc., between domains that wanted to use the performance
counters (perhaps these would be a resource the hypervisor could
dynamically allocate to a domain by, for example, some kind of "xm"
command). Or, I suppose, xenoprof could be extended to allow the HVM
domain to be a "controlling" domain in the xenoprof sense.

Comments, flames, discussion, etc.?

--
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                       512-507-7807 (c)
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com
>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Ray Bryant
>> Sent: Thursday, May 25, 2006 7:21 AM
>> To: xen-devel@lists.xensource.com
>> Cc: Steve Dobbelstein; Eranian, Stephane
>> Subject: Re: [Xen-devel] Xenoprof in an HVM domain

... <stuff deleted> ...

>> This assumes, of course, that one can figure out how to virtualize
>> the performance counters, and then the hypervisor would have to sort
>> out conflicts, etc., between domains that wanted to use the
>> performance counters (perhaps these would be a resource the
>> hypervisor could dynamically allocate to a domain by, for example,
>> some kind of "xm" command).

I don't think this is how performance counters should be virtualized.
Virtualizing performance counters should save/restore the values of the
active perf counters on every VCPU/domain context switch. There should
be no need for such an "xm" command.

Performance counter virtualization is not currently supported in Xen,
although it would be nice to have it. With counter virtualization, guest
domains would be able to profile themselves with unmodified oprofile.
This would be useful to enable users to profile their applications on
Xen guests in the same way they are used to doing on vanilla Linux.

The current model supported by Xenoprof is system-wide profiling, where
counters are used to profile the collection of domains and Xen together.
This is useful for Xen developers to optimize Xen and para-virtualized
kernels running on Xen.

Ideally we would like to have support for both system-wide profiling
(for Xen developers) and independent guest profiling with perf counter
virtualization (for Xen users). Adding perf counter virtualization is on
our to-do list. If anybody is interested in working on this, please let
me know. We would appreciate any help we could get.

Thanks,
Renato
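To make "save/restore on every VCPU/domain context switch" concrete,
here is a rough sketch for a P6-style PMU with two counters. The
per-VCPU structure is an assumption; rdmsrl/wrmsrl are the usual MSR
accessor macros, and the MSR constants are the standard P6 ones:

#include <stdint.h>

#define MSR_P6_EVNTSEL0 0x186
#define MSR_P6_PERFCTR0 0x0c1
#define NUM_COUNTERS    2

struct vpmu_state {  /* hypothetical per-VCPU PMU state */
    uint64_t evntsel[NUM_COUNTERS];
    uint64_t counter[NUM_COUNTERS];
};

static void vpmu_save(struct vpmu_state *v)
{
    int i;
    for (i = 0; i < NUM_COUNTERS; i++) {
        rdmsrl(MSR_P6_EVNTSEL0 + i, v->evntsel[i]);
        rdmsrl(MSR_P6_PERFCTR0 + i, v->counter[i]);
        wrmsrl(MSR_P6_EVNTSEL0 + i, 0);  /* stop counting while others run */
    }
}

static void vpmu_restore(struct vpmu_state *v)
{
    int i;
    for (i = 0; i < NUM_COUNTERS; i++) {
        wrmsrl(MSR_P6_PERFCTR0 + i, v->counter[i]);
        wrmsrl(MSR_P6_EVNTSEL0 + i, v->evntsel[i]);  /* re-enable last */
    }
}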
On Thursday 25 May 2006 11:44, Santos, Jose Renato G wrote:
> I don't think this is how performance counters should be virtualized.
> Virtualizing performance counters should save/restore the values of
> the active perf counters on every VCPU/domain context switch. There
> should be no need for such an "xm" command.

Agreed, but as a practical matter, reading and writing PMC registers
like that may be too slow to actually do on every context switch.
Reading and writing PMCs can sometimes be quite slow, depending on the
implementation.

It's the same kind of argument that leads to lazy save and restore of
the FPU registers. If it is done on every context switch it is simply
too slow. Of course, hypervisor context switches >>might<< occur less
frequently than process context switches in a native OS, but thus far
I've not seen evidence of this. :-)

At any rate, if complete virtualization of PMCs is too slow (data
required), then one could treat them as a system resource and allocate
them out to the domains as required. That was all I was suggesting.

> Performance counter virtualization is not currently supported in Xen,
> although it would be nice to have it. With counter virtualization,
> guest domains would be able to profile themselves with unmodified
> oprofile. This would be useful to enable users to profile their
> applications on Xen guests in the same way they are used to doing on
> vanilla Linux.

My point, exactly.

> The current model supported by Xenoprof is system-wide profiling,
> where counters are used to profile the collection of domains and Xen
> together. This is useful for Xen developers to optimize Xen and
> para-virtualized kernels running on Xen.

Yes. And it is very helpful in that regard. Don't get me wrong. In
essence I'm really asking how xenoprof would/could/should evolve to
better support profiling of HVM domains.

> Ideally we would like to have support for both system-wide profiling
> (for Xen developers) and independent guest profiling with perf counter
> virtualization (for Xen users). Adding perf counter virtualization is
> on our to-do list. If anybody is interested in working on this, please
> let me know. We would appreciate any help we could get.

I'll put it on my to-do list. :-)

In the meantime, off to get passive domain support working on my latest
xenbits-unstable tree.

> Thanks,
> Renato

Thank you,
--
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                       512-507-7807 (c)
If you partition the perf counters and do not share them, you still need
to enable/disable them on context switch. Otherwise you would be
counting events that happen when other domains are running. Saving and
restoring counters should probably not be much more expensive than
disabling/enabling them.

Even if this is not true, we could still do lazy save/restore similar to
what is done with the FPU registers. Thus we would only need to save and
restore the counters when needed, and only the counters being used. If
you use counters in only one domain, the overhead with full perf counter
virtualization would be equivalent to your approach, with the advantage
of transparency to the guest (no need to ask for resources, etc.). If
you use perf counters in multiple domains you may have the additional
overhead of saving/restoring them, but I think that is more than
compensated for by a more powerful abstraction.

I think full virtualization should be the first option. Only if the
overhead proves to be very painful should we consider an alternative.
Not the other way around... Of course, gathering some data on the
overhead of saving/restoring counters would help clarify this.

Renato

>> -----Original Message-----
>> From: Ray Bryant [mailto:raybry@mpdtxmail.amd.com]
>> Sent: Thursday, May 25, 2006 11:39 AM
>> To: Santos, Jose Renato G
>> Cc: xen-devel@lists.xensource.com; Steve Dobbelstein; Eranian,
>> Stephane
>> Subject: Re: [Xen-devel] Xenoprof in an HVM domain
>>
>> ... <stuff deleted> ...
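A sketch of the lazy variant under discussion: track which VCPU
currently owns the physical counters, cheaply disable counting when the
owner is descheduled, and pay the full MSR save/restore only when
ownership actually changes. Every name here is hypothetical;
vpmu_save()/vpmu_restore() are as in the earlier sketch, and
vpmu_stop()/vpmu_start() would touch only the enable bits:

/* Stand-in for the relevant parts of Xen's real struct vcpu. */
struct vcpu {
    int uses_pmu;
    struct vpmu_state vpmu;
};

static struct vcpu *pmu_owner;  /* VCPU whose state is live in the MSRs */

static void pmu_context_switch(struct vcpu *prev, struct vcpu *next)
{
    if (prev == pmu_owner)
        vpmu_stop(&prev->vpmu);   /* cheap: clear the enable bits so other
                                   * domains' events are never counted */
    if (!next->uses_pmu)
        return;                   /* nothing to restore */

    if (pmu_owner != next) {
        if (pmu_owner != NULL)
            vpmu_save(&pmu_owner->vpmu);  /* full save: rare path */
        vpmu_restore(&next->vpmu);
        pmu_owner = next;
    } else {
        vpmu_start(&next->vpmu);  /* same owner: just re-enable */
    }
}

If counters are in use in only one domain, the owner never changes and
the per-switch cost degenerates to toggling the enable bits, which
matches the equivalence argument above.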
On Thursday 25 May 2006 16:44, Santos, Jose Renato G wrote:
> If you partition the perf counters and do not share them, you still
> need to enable/disable them on context switch. Otherwise you would be
> counting events that happen when other domains are running. Saving and
> restoring counters should probably not be much more expensive than
> disabling/enabling them.

Good point. I don't know what the trade-off is there.

One other issue that worries me here is counter interference -- e.g.
suppose we are measuring cache misses. Then if another domain has been
scheduled in the meantime, it can wash the measured domain's data out of
the cache and cause a lot of unexpected misses in the measured domain.
Offhand, I don't know of a good way to fix this other than to pin the
measured domain to its CPUs for the duration of the measurement
experiment. Of course, there's still the pesky issue of interrupt
service routines running in the hypervisor and causing cache damage
there as well, or of dom0 getting involved to do some I/O for the
measured domain.

> Even if this is not true, we could still do lazy save/restore similar
> to what is done with the FPU registers. Thus we would only need to
> save and restore the counters when needed, and only the counters being
> used. If you use counters in only one domain, the overhead with full
> perf counter virtualization would be equivalent to your approach, with
> the advantage of transparency to the guest (no need to ask for
> resources, etc.). If you use perf counters in multiple domains you may
> have the additional overhead of saving/restoring them, but I think
> that is more than compensated for by a more powerful abstraction.

Yes.

> I think full virtualization should be the first option. Only if the
> overhead proves to be very painful should we consider an alternative.
> Not the other way around...

I agree.

> Of course, gathering some data on the overhead of saving/restoring
> counters would help clarify this.

Sounds like some work TBD here. If there were only more time in the day.

Best Regards,
--
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                       512-507-7807 (c)
>> -----Original Message-----
>> From: Ray Bryant [mailto:raybry@mpdtxmail.amd.com]
>> Sent: Thursday, May 25, 2006 3:02 PM
>> To: xen-devel@lists.xensource.com
>> Cc: Santos, Jose Renato G; Eranian, Stephane; Steve Dobbelstein
>> Subject: Re: [Xen-devel] Xenoprof in an HVM domain
>>
>> One other issue that worries me here is counter interference -- e.g.
>> suppose we are measuring cache misses. Then if another domain has
>> been scheduled in the meantime, it can wash the measured domain's
>> data out of the cache and cause a lot of unexpected misses in the
>> measured domain. Offhand, I don't know of a good way to fix this
>> other than to pin the measured domain to its CPUs for the duration of
>> the measurement experiment. Of course, there's still the pesky issue
>> of interrupt service routines running in the hypervisor and causing
>> cache damage there as well, or of dom0 getting involved to do some
>> I/O for the measured domain.

These are all fair concerns. However, I wouldn't consider this a perf
counter issue, but a virtualization issue. Counters are just counting
what is happening. This is really an issue of sharing hardware resources
(cache, TLB, buses, etc.) across virtual machines. Whenever resources
are shared there is potential for interference. The interference will be
there regardless of whether you measure it or not. Perf counters can be
used to measure it and help you define resource policies, such as, for
example, dedicating a CPU to a domain, as you mention. If your
application is suffering from interference from other domains, you would
like to know that when you are profiling.

Renato
Hello,

On Thu, May 25, 2006 at 09:44:15AM -0700, Santos, Jose Renato G wrote:
> I don't think this is how performance counters should be virtualized.
> Virtualizing performance counters should save/restore the values of
> the active perf counters on every VCPU/domain context switch. There
> should be no need for such an "xm" command.
>
> Performance counter virtualization is not currently supported in Xen,
> although it would be nice to have it. With counter virtualization,
> guest domains would be able to profile themselves with unmodified
> oprofile. This would be useful to enable users to profile their
> applications on Xen guests in the same way they are used to doing on
> vanilla Linux.

I think we need to clearly identify and prioritize the needs.

The first thing to do is to ensure that guest OSes using the PMU when
running native can continue to do so when running virtualized. That
holds true for both para-virtualized and fully virtualized (Pacifica/VT)
guests. This is the highest priority because some OSes do rely on
performance counters. Without such support, they cannot provide the same
kernel-level API to their applications. In other words, certain
applications will fail.

The second need is what XenOprofile is addressing, which is how to get a
"global view" of what is going on in the guests and in the VMM. To me
this is a lower-priority need because the system can function without
it. Yet I recognize it is important for tuning the VMM.

Those two distinct needs are not specific to Xen; in fact, they are
exact replicas of what you need to provide in a native OS. The perfmon2
subsystem does this. The global view is equivalent to "system-wide"
monitoring, and the per-guest virtualized PMU is equivalent to the
per-thread mode.

To support per-guest monitoring, the PMU must be virtualized. The
counters must be saved/restored on domain switch. A similar operation is
done on thread switch in the Linux kernel for perfmon2. In general,
performance counters are quite expensive to read, ranging from 35 cycles
on Itanium 2 to thousands of cycles on some IA-32 processors. As
indicated by Ray and Renato, you can be smart about that. In perfmon2 we
do lazy save/restore of performance counters. This has worked fairly
well. I would expect domain switches to happen less frequently than
thread switches, anyway. Furthermore, many measurements use only a
limited number of PMU registers.

Another important point is that I do not think that per-guest
measurements should include VMM-level execution, unlike a system-wide
measurement. That is true for both para-virtualized and fully
virtualized (VT/Pacifica) guests. This is important for sampling. I am
not sure tools would know what to do with samples they cannot attribute
to code they know about. Furthermore, the goal of virtualization is to
HIDE from guest applications the fact that they run virtualized. Why
would we make an exception for monitoring tools? Note that this implies
that the VMM must turn monitoring off/on upon entry/exit.

For system-wide monitoring, you do need visibility into the VMM. Yet
monitoring is driven from a guest domain, most likely domain0. On
counter overflow, the VMM receives the PMU interrupt and the
corresponding interrupted IP (IIP). That information must somehow be
conveyed to the monitoring tool. It is not possible to simply pass the
interrupt to domain0 (the controlling domain for the monitoring
session). To solve this problem, XenOprofile uses an in-VMM buffer where
the "samples" are first saved. Then there needs to be a communication
channel with the controlling domain to send a notification when the
buffer becomes full. There needs to be one such buffer per virtual CPU.
Those buffers only need to be visible to domain0. The whole mechanism
should NOT require any special code in the guest domains, except for
domain0. That way it would work with para-virtualized and fully
virtualized guests, be they Linux, Windows, or something else.

In XenOprofile, I understand the buffer is shared via remapping. I think
the interface to set up/control the buffer needs to be more generic. For
instance, certain measurements may need to record in the buffer more
than just the IIP; they may need to also save certain counter values.
The controlling domain needs some interface to express what needs to be
recorded in each sample. Furthermore, it also needs to know how to
resume after an overflow, i.e., what sampling period to reload into the
overflowed counter. All this information must be passed to the VMM
because there is no intervention from the controlling domain until the
buffer fills up. Once again, this is not something new. We have the
equivalent mechanism in perfmon2, simply because we support an in-kernel
sampling buffer.

The next step is to see how the PMU can be shared between system-wide
usage and per-guest usage. On some PMU models this may not be possible
due to hardware limitations, i.e., the counters may not be fully
independent. This gets into a new level of complexity which has to be
managed by the VMM. Basically, this requires a VMM PMU register
allocator per virtual CPU. This also implies that consumers cannot
expect to systematically have access to the full PMU each time they ask
for it. Note that it may be acceptable for the time being to say that
system-wide and per-guest monitoring are mutually exclusive.

Hope this helps.

> The current model supported by Xenoprof is system-wide profiling,
> where counters are used to profile the collection of domains and Xen
> together. This is useful for Xen developers to optimize Xen and
> para-virtualized kernels running on Xen.
>
> Ideally we would like to have support for both system-wide profiling
> (for Xen developers) and independent guest profiling with perf counter
> virtualization (for Xen users). Adding perf counter virtualization is
> on our to-do list. If anybody is interested in working on this, please
> let me know. We would appreciate any help we could get.

--
-Stephane
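One way to picture the more generic buffer interface being asked for: a
per-counter descriptor, registered by the controlling domain, telling
the VMM what to record in each sample and what period to reload on
overflow. This is purely illustrative -- no such structure exists in the
tree:

#include <stdint.h>

#define PMU_RECORD_IIP  (1 << 0)  /* interrupted instruction pointer */
#define PMU_RECORD_TSC  (1 << 1)  /* timestamp of the sample */
#define PMU_RECORD_CTRS (1 << 2)  /* values of the other counters */

/* Hypothetical per-counter sampling descriptor. */
struct pmu_sampling_spec {
    uint32_t counter;        /* which counter this spec applies to */
    uint64_t reload_period;  /* reloaded after each overflow, so sampling
                              * resumes with no intervention from the
                              * controlling domain until the buffer fills */
    uint32_t record_flags;   /* OR of the PMU_RECORD_* flags above */
};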
Stephane,

Thanks for your comments. Please see my comments embedded in the text
below.

>> Another important point is that I do not think that per-guest
>> measurements should include VMM-level execution, unlike a system-wide
>> measurement. That is true for both para-virtualized and fully
>> virtualized (VT/Pacifica) guests. This is important for sampling. I
>> am not sure tools would know what to do with samples they cannot
>> attribute to code they know about. Furthermore, the goal of
>> virtualization is to HIDE from guest applications the fact that they
>> run virtualized. Why would we make an exception for monitoring tools?
>> Note that this implies that the VMM must turn monitoring off/on upon
>> entry/exit.

I am worried that this would cause a high performance hit. Also, I am
not sure that this is the right thing to do. After all, when Xen is
running on behalf of the domain, it is using CPU cycles, causing cache
misses, etc. (and all this on behalf of the domain). Also note that if
counters are virtualized and are programmed to generate a "virtual
interrupt" on overflow, the virtual interrupt handler in the guest is
the one which should sample the PC, not Xen. Therefore the domain will
not see any sample associated with Xen code. For overflows inside Xen,
it will look to the guest as if the overflow happened on the instruction
that caused the entry into the hypervisor. I particularly think we
should avoid stopping/starting counters on entry/exit, but I am open to
being convinced otherwise. I am curious about the opinion of others on
this issue, especially the core Xen team.

>> For system-wide monitoring, you do need visibility into the VMM. Yet
>> monitoring is driven from a guest domain, most likely domain0. On
>> counter overflow, the VMM receives the PMU interrupt and the
>> corresponding interrupted IP (IIP). That information must somehow be
>> conveyed to the monitoring tool. It is not possible to simply pass
>> the interrupt to domain0 (the controlling domain for the monitoring
>> session). To solve this problem, XenOprofile uses an in-VMM buffer
>> where the "samples" are first saved. Then there needs to be a
>> communication channel with the controlling domain to send a
>> notification when the buffer becomes full. There needs to be one such
>> buffer per virtual CPU. Those buffers only need to be visible to
>> domain0. The whole mechanism should NOT require any special code in
>> the guest domains, except for domain0.

That is not quite correct. Indeed, you do not need to run any code in
the guest if you are interested only in kernel and Xen samples. This is
the passive domain mode supported by Xenoprof (not yet available in the
public tree, but it will be there soon, thanks to Xiaowei Yang). Dom0
collects and interprets samples for passive domains, which do not need
to run any profiling code. However, if you want to interpret user-level
samples, you need to interpret these samples in the domain that hosts
the code, since that is the only place where the current memory mapping
necessary to map a particular PC to a user-level symbol is known.
Xenoprof enables you to get complete profiling information (including
user-level code in all domains). For that you need to run oprofile on
all domains for which you want user-level profiling. This is the active
domain mode in Xenoprof. By choosing between passive and active domain
modes, the user can trade off the complexity of monitoring against the
detail of the measurements.

>> In XenOprofile, I understand the buffer is shared via remapping. I
>> think the interface to set up/control the buffer needs to be more
>> generic. For instance, certain measurements may need to record in the
>> buffer more than just the IIP; they may need to also save certain
>> counter values. The controlling domain needs some interface to
>> express what needs to be recorded in each sample. Furthermore, it
>> also needs to know how to resume after an overflow, i.e., what
>> sampling period to reload into the overflowed counter. All this
>> information must be passed to the VMM because there is no
>> intervention from the controlling domain until the buffer fills up.
>> Once again, this is not something new. We have the equivalent
>> mechanism in perfmon2, simply because we support an in-kernel
>> sampling buffer.

I think we first need to capture a list of use-case scenarios before we
define how the xenoprof interface should be extended. We do not want to
create a complex interface if it ends up never being used. If you know
specific applications/scenarios that need to be supported, this could
help us identify the limitations of the current interface and determine
how it should be extended.

>> The next step is to see how the PMU can be shared between system-wide
>> usage and per-guest usage. On some PMU models this may not be
>> possible due to hardware limitations, i.e., the counters may not be
>> fully independent. This gets into a new level of complexity which has
>> to be managed by the VMM. Basically, this requires a VMM PMU register
>> allocator per virtual CPU. This also implies that consumers cannot
>> expect to systematically have access to the full PMU each time they
>> ask for it. Note that it may be acceptable for the time being to say
>> that system-wide and per-guest monitoring are mutually exclusive.
>>
>> Hope this helps.
>>
>> --
>> -Stephane
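Renato's overflow-delivery argument in sketch form: with virtualized
counters, the VMM rearms the counter and queues a virtual interrupt, and
the guest's handler samples its own PC -- so an overflow taken inside
Xen is charged to the guest instruction that trapped into the
hypervisor. All names below are assumptions (reload_period is an assumed
extension of the vpmu_state sketched earlier):

/* Hypothetical VMM-side handling of a physical PMU overflow while a
 * counter-virtualizing guest is (logically) running. */
static void pmu_overflow_handler(void)
{
    struct vcpu *v = current;  /* VCPU on whose behalf the CPU is running */

    if (v->uses_pmu) {
        /* Rearm so counting continues across the injection. */
        wrmsrl(MSR_P6_PERFCTR0, v->vpmu.reload_period);
        /* Deliver like any emulated device interrupt; the guest handler
         * reads its own (virtual) PC and never sees a Xen address. */
        inject_virtual_pmu_irq(v);
    }
}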