Santos, Jose Renato G (Jose Renato Santos)
2005-Apr-23  02:27 UTC
RE: [Xen-devel] Proposal for Xen support of performance monitoring and debug hardware
William,

Please see my comments embedded in the text below.

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of William Cohen
> Sent: Friday, April 22, 2005 2:03 PM
> To: Ian Pratt
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Proposal for Xen support of performance monitoring and debug hardware
>
> Ian Pratt wrote:
> >
> >>I have been working on a proposal to add Xen support for performance monitoring and debugging hardware. The goal of this would be to enable OProfile, perfmon, and perfctr to work on Xen. The proposal is still pretty preliminary, but I would like comments on the current version.
> >
> > William, have you seen the patches from Jose Renato Santos for multi-VM OProfile support? We're planning on getting these checked in to the Xen repo, after a little reworking.
> >
> > It's somewhat orthogonal to your MSR protection scheme, but you should be aware of it.
>
> Rik van Riel pointed me at Santos's patch for OProfile support. There are some differences between the two approaches. The Xen OProfile support by HP pretty much just supports OProfile and was designed to get some information about what was going on in the Xen hypervisor. It doesn't provide access to the other performance monitoring (or debugging) hardware.

I agree. It would be useful to give domains low-level access to the MSRs, to support a larger set of tools.

> > I can certainly see some merit in having fine-grained access control over MSRs, though for the case of perf counter registers I wonder whether we'd be better off with some higher-level interface?
>
> I was aiming for minimal, low-level support, following the existing Xen approach of not coding too much knowledge about the system into Xen: make the MSR registers visible and make sure that a guest OS cannot clobber other guest OSs. The guest OS decides how to use the performance monitoring hardware. The hypervisor needs a list of which registers are in which class, but it doesn't need to know the details of what the registers do.
>
> There are significant variations in the precise events, and in the constraints on the combinations of events allowed, in many of the performance monitoring systems. OProfile has files for each architecture to map events to the counter setup. There are a lot of variations in the events available on a processor; OProfile doesn't hide those differences. The University of Tennessee, Knoxville's PAPI has an abstraction that hides some of these differences with generic events, e.g. a cache miss event. ppc64 (AIX) and ia64 (perfmon) have libraries that do the complicated constraint testing to determine whether events can be counted at the same time. However, these mapping operations are handled in user space, not in the kernel.
>
> I am not sure that this should be pushed into the hypervisor. I suspect that someone will complain that the high-level interface doesn't handle some particular instrumentation mode of the performance monitoring hardware. Adding it to Xen would require rebuilding Xen and the guest OS and rebooting the entire machine. The low-level interface makes the guest OS responsible, so only the guest OS would need to be rebuilt and rebooted.

I agree with your point. Providing low-level access to the MSRs seems the right approach if you want to support other tools besides OProfile. I also agree that providing a high-level abstraction of performance events across different architectures in the hypervisor would be too complex.

> > What other MSRs do you anticipate your scheme being used to provide restricted access to for selected VMs?
>
> The sampling used by OProfile would naturally be something high on the list of things to use. It would also be nice to be able to do the stopwatch counting provided by perfctr and perfmon.
>
> PPC64, IA64 and Pentium 4 have precise event sampling. I would like to be able to access those through the hypervisor.
>
> -Will

I agree with Ian's comments in his reply to this same email. While Xenoprof is useful for system-wide profiling, I can see that it would be useful to virtualize the MSRs and give domains individual hardware performance monitoring capabilities. We were also thinking along these lines and planning to extend Xenoprof with MSR virtualization.

I did not understand how your global scope for MSR access would work. It seems you were planning to provide system-wide profiling with it (please clarify if this is not the case). If I understood it correctly, I see the following problems with this approach (from an OProfile point of view):

1) It would not be possible to profile hypervisor code, since interrupts caused by hardware counter overflow would be handled by the domain. By the time the domain starts executing, the information about which Xen code was running at the time of the MSR overflow is lost. In Xenoprof we handle the MSR interrupts inside the hypervisor and save the PC value at that time, which enables profiling of hypervisor code. An additional complication is the use of normal IRQs instead of NMIs. This would prevent profiling of some parts of the kernel (including interrupt handlers).

2) It seems you plan to have interrupts that occur in other domains delivered to the owner of the MSR. A potential problem with this approach is that it could cause additional domain context switches (to schedule the owner domain to handle the interrupt), which could change your profiling results. In addition, it is not clear how the interrupt handler would get information about the PC sample at the time of the MSR overflow. Even if it were possible to receive this information from the hypervisor, we would still need a way to map this PC value to the right process and associated binary file running on the other domain, which seems difficult.

I think both system-wide profiling and single-domain (virtualized) profiling are important, and it would be nice to have both. As Ian mentioned, we cannot have both at the same time, at least not for the same MSR. However, it would be possible to have some registers virtualized and others used for system-wide profiling at the same time. It would be nice to have a unified framework that provides both functionalities and a way to select between them.

Renato

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
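Editor's sketch: the low-level scheme debated in this message has the hypervisor keep a list of which registers are in which class and make sure one guest cannot clobber another, without interpreting what the registers do. The C below is a minimal illustration of that bookkeeping, assuming each exposed MSR is owned by at most one domain. The class names, the table, and the functions `msr_grant`, `msr_grant_class`, `msr_release` and `msr_write_allowed` are hypothetical and do not come from the proposal or from Xen. The MSR numbers are P6-style counter, event-select and debug-control registers used only as sample entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register classes: the hypervisor groups MSRs by class
   but never interprets the values guests write to them. */
enum msr_class { MSR_CLASS_PERFCTR, MSR_CLASS_EVTSEL, MSR_CLASS_DEBUG };

#define NO_OWNER (-1)

struct msr_entry {
    uint32_t index;     /* MSR number */
    enum msr_class cls; /* class assigned by the hypervisor */
    int owner;          /* domain granted this register, or NO_OWNER */
};

/* Sample table: P6-style counter, event-select and debug-control MSRs. */
static struct msr_entry msr_table[] = {
    { 0x0C1, MSR_CLASS_PERFCTR, NO_OWNER },
    { 0x0C2, MSR_CLASS_PERFCTR, NO_OWNER },
    { 0x186, MSR_CLASS_EVTSEL, NO_OWNER },
    { 0x187, MSR_CLASS_EVTSEL, NO_OWNER },
    { 0x1D9, MSR_CLASS_DEBUG, NO_OWNER },
};
#define MSR_TABLE_LEN (sizeof msr_table / sizeof msr_table[0])

static struct msr_entry *msr_lookup(uint32_t index)
{
    for (size_t i = 0; i < MSR_TABLE_LEN; i++)
        if (msr_table[i].index == index)
            return &msr_table[i];
    return NULL; /* not a register the hypervisor exposes */
}

/* Grant one register to a domain if it is exposed and currently free. */
static bool msr_grant(int domid, uint32_t index)
{
    assert(domid >= 0);
    struct msr_entry *e = msr_lookup(index);
    if (e == NULL || e->owner != NO_OWNER)
        return false;
    e->owner = domid;
    return true;
}

/* Grant every free register of one class; returns how many were granted. */
static int msr_grant_class(int domid, enum msr_class cls)
{
    assert(domid >= 0);
    int granted = 0;
    for (size_t i = 0; i < MSR_TABLE_LEN; i++) {
        if (msr_table[i].cls == cls && msr_table[i].owner == NO_OWNER) {
            msr_table[i].owner = domid;
            granted++;
        }
    }
    return granted;
}

/* Return a register to the free pool; only its owner may release it. */
static bool msr_release(int domid, uint32_t index)
{
    assert(domid >= 0);
    struct msr_entry *e = msr_lookup(index);
    if (e == NULL || e->owner != domid)
        return false;
    e->owner = NO_OWNER;
    return true;
}

/* Check made on a trapped guest MSR write: only the owner may write,
   so one guest cannot clobber another guest's counters. */
static bool msr_write_allowed(int domid, uint32_t index)
{
    const struct msr_entry *e = msr_lookup(index);
    return domid >= 0 && e != NULL && e->owner == domid;
}
```

Because ownership is tracked per register, this layout could also express Renato's suggestion of giving some counters to one user while others stay with another. Real PMUs add cross-register constraints (Pentium 4 and ppc64 are mentioned in this thread) that this sketch ignores.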
William Cohen
2005-Apr-25  15:17 UTC
Re: [Xen-devel] Proposal for Xen support of performance monitoring and debug hardware
Santos,

Thanks for the comments. I will take a closer look at the Xen OProfile support and see what I can incorporate into the proposal.

Santos, Jose Renato G (Jose Renato Santos) wrote:
>
> William,
>
> Please see my comments embedded in the text below.

[...]

> I agree with Ian's comments in his reply to this same email. While Xenoprof is useful for system-wide profiling, I can see that it would be useful to virtualize the MSRs and give domains individual hardware performance monitoring capabilities. We were also thinking along these lines and planning to extend Xenoprof with MSR virtualization.
>
> I did not understand how your global scope for MSR access would work. It seems you were planning to provide system-wide profiling with it (please clarify if this is not the case). If I understood it correctly, I see the following problems with this approach (from an OProfile point of view):

The system-wide profiling was a relatively new addition to the document, and it does need more thought on how all the pieces work. I was thinking that the xen_msr_allocate function would provide some information on how to route the performance monitoring hardware: selecting GLOBAL scope for domain 0 reserves the performance monitoring hardware for domain 0, and the xen_msr_irq_hander then sets the performance monitoring IRQ so that all perf IRQs are routed to the domain that reserved the perf hardware.

> 1) It would not be possible to profile hypervisor code, since interrupts caused by hardware counter overflow would be handled by the domain. By the time the domain starts executing, the information about which Xen code was running at the time of the MSR overflow is lost. In Xenoprof we handle the MSR interrupts inside the hypervisor and save the PC value at that time, which enables profiling of hypervisor code. An additional complication is the use of normal IRQs instead of NMIs. This would prevent profiling of some parts of the kernel (including interrupt handlers).

Shouldn't it be possible for the hypervisor to send the needed address information to the IRQ handler in the domain? From the address it should be possible to determine that it is a sample from the hypervisor. The overhead of moving things from the hypervisor to the domain might be undesirable.

I have some reservations about using NMIs in this case. With OProfile it is quite possible to kill the machine by setting a sampling interval smaller than the overhead incurred by the interrupt service routine. Allowing NMIs would give a domain a way to crash the entire machine. NMIs do allow better coverage of code, though.

> 2) It seems you plan to have interrupts that occur in other domains delivered to the owner of the MSR. A potential problem with this approach is that it could cause additional domain context switches (to schedule the owner domain to handle the interrupt), which could change your profiling results. In addition, it is not clear how the interrupt handler would get information about the PC sample at the time of the MSR overflow. Even if it were possible to receive this information from the hypervisor, we would still need a way to map this PC value to the right process and associated binary file running on the other domain, which seems difficult.

PC values are pretty transient; memory maps go away. Mapping the PC values to something reasonable is still an issue; there is a FIXME in the document for this. OProfile has some help in the kernel to convert the raw PC value to a dcookie and file offset. This help is not available outside the domain.

> I think both system-wide profiling and single-domain (virtualized) profiling are important, and it would be nice to have both. As Ian mentioned, we cannot have both at the same time, at least not for the same MSR. However, it would be possible to have some registers virtualized and others used for system-wide profiling at the same time. It would be nice to have a unified framework that provides both functionalities and a way to select between them.
>
> Renato

Slicing and dicing the performance monitoring hardware may be possible, but it is a complicated operation. There are lots of constraints on which combinations are allowed and which are not. Combinations like inter-domain and intra-domain sampling would be difficult because the interrupt would be the same. The allocation software would need a picture of all the domain allocations. There are also lots of constraints on which registers can be used for what on Pentium 4 and ppc64. For the time being, the proposal will address both global and virtual modes but will not allow concurrent use of the two.

-Will
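Editor's sketch: Will's closing decision (global and virtual modes are both supported, but never concurrently) and the routing he describes for GLOBAL scope can be modeled as a small state machine. This is a hedged illustration under assumed names. `pmu_reserve`, `pmu_release`, `pmu_route_overflow`, `sample_in_hypervisor` and the scope enum are hypothetical. They only loosely mirror the roles of `xen_msr_allocate` and `xen_msr_irq_hander` described above, whose real signatures are not shown in this thread. The hypervisor address range is passed in as parameters rather than hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PMU scopes: GLOBAL = one domain profiles the whole machine,
   VIRTUAL = each participating domain profiles only itself. */
enum pmu_scope { SCOPE_NONE, SCOPE_GLOBAL, SCOPE_VIRTUAL };

#define NO_DOMAIN (-1)

struct pmu_state {
    enum pmu_scope scope; /* mode currently in force for the whole PMU */
    int global_owner;     /* domain holding the GLOBAL reservation */
    int virtual_users;    /* how many VIRTUAL reservations are active */
};

static void pmu_init(struct pmu_state *s)
{
    s->scope = SCOPE_NONE;
    s->global_owner = NO_DOMAIN;
    s->virtual_users = 0;
}

/* Reserve the PMU in the requested scope. A request for the other scope
   is refused while any reservation is active: no concurrent mixing. */
static bool pmu_reserve(struct pmu_state *s, int domid, enum pmu_scope want)
{
    assert(want == SCOPE_GLOBAL || want == SCOPE_VIRTUAL);
    if (s->scope != SCOPE_NONE && s->scope != want)
        return false;
    if (want == SCOPE_GLOBAL) {
        if (s->global_owner != NO_DOMAIN)
            return false; /* only one domain may own GLOBAL scope */
        s->global_owner = domid;
    } else {
        s->virtual_users++;
    }
    s->scope = want;
    return true;
}

/* Drop a reservation. VIRTUAL users are only counted here; a fuller
   version would track which domains hold them. */
static void pmu_release(struct pmu_state *s, int domid)
{
    if (s->scope == SCOPE_GLOBAL && s->global_owner == domid) {
        s->global_owner = NO_DOMAIN;
        s->scope = SCOPE_NONE;
    } else if (s->scope == SCOPE_VIRTUAL && s->virtual_users > 0) {
        if (--s->virtual_users == 0)
            s->scope = SCOPE_NONE;
    }
}

/* Pick the domain that should receive a counter-overflow interrupt. */
static int pmu_route_overflow(const struct pmu_state *s, int running_domid)
{
    switch (s->scope) {
    case SCOPE_GLOBAL:
        return s->global_owner; /* every sample goes to the reserving domain */
    case SCOPE_VIRTUAL:
        return running_domid;   /* the domain whose counters just overflowed */
    default:
        return NO_DOMAIN;       /* PMU not reserved: no recipient */
    }
}

/* Classify a sampled PC, given the (assumed) hypervisor address range. */
static bool sample_in_hypervisor(uint64_t pc, uint64_t hv_start, uint64_t hv_end)
{
    return pc >= hv_start && pc < hv_end;
}
```

In GLOBAL scope every overflow goes to the reserving domain, so the hypervisor would have to forward the sampled PC along with it; `sample_in_hypervisor` shows the address check Will suggests for recognizing hypervisor samples. Mapping a guest PC to a process and binary, the open FIXME, is not attempted here.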