The following patches add support for AMD lightweight profiling (LWP).

Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU code to handle lazy and unlazy FPU state differently. Lazy FPU state (such as SSE and YMM) is handled when #NM is triggered. Unlazy state, such as LWP, is saved and restored on every vcpu context switch. To simplify the code, we also add a mask option to the xsave/xrstor functions.

Thanks,
-Wei

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2011-Apr-14 21:09 UTC
Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
On 14/04/2011 21:37, "Wei Huang" <wei.huang2@amd.com> wrote:

> The following patches support AMD lightweight profiling.
>
> Because LWP isn't tracked by CR0.TS bit, we clean up the FPU code to
> handle lazy and unlazy FPU states differently. Lazy FPU state (such as
> SSE, YMM) is handled when #NM is triggered. Unlazy state, such as LWP,
> is saved and restored on each vcpu context switch. To simplify the code,
> we also add a mask option to xsave/xrstor function.

How much cost is added to context switch paths in the (overwhelmingly
likely) case that LWP is not being used by the guest? Is this adding a whole
lot of unconditional overhead for a feature that no one uses?

 -- Keir
Wei Huang
2011-Apr-14 22:57 UTC
Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
Hi Keir,

I ran a quick test to calculate the overhead of __fpu_unlazy_save() and __fpu_unlazy_restore(), which are used to save/restore LWP state. Here are the results:

(1) tsc_total: total time used for context_switch() in x86/domain.c
(2) tsc_unlazy: total time used for __fpu_unlazy_save() + __fpu_unlazy_restore()

One example:
(XEN) tsc_unlazy=0x00000000008ae174
(XEN) tsc_total=0x00000001028b4907

So the overhead is about 0.2% of the total time used by context_switch(). Of course, this is just one example. I would say the overhead ratio would be <1% in most cases.

Thanks,
-Wei
Dan Magenheimer
2011-Apr-15 20:16 UTC
RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
Wait... a context switch takes over 4 billion cycles? Not likely!

And please check your division. I get the same answer from "dc" only when I use lowercase hex numbers and dc complains about unimplemented chars; otherwise I get 0.033%... also unlikely.
Huang2, Wei
2011-Apr-15 20:23 UTC
RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
Hi Dan,

This isn't the cycle count of a single switch. It is the total cycle count, accumulated over a period. I dumped the numbers at an arbitrary point while a guest was running.

Thanks,
-Wei