thr3ads.net - Xen devel - [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description [Apr 2011]

Wei Huang

2011-Apr-14 20:37 UTC

[Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description

The following patches support AMD lightweight profiling.

Because LWP isn't tracked by CR0.TS bit, we clean up the FPU code to 
handle lazy and unlazy FPU states differently. Lazy FPU state (such as 
SSE, YMM) is handled when #NM is triggered. Unlazy state, such as LWP, 
is saved and restored on each vcpu context switch. To simplify the code, 
we also add a mask option to xsave/xrstor function.

Thanks,
-Wei
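
[Editor's illustration] The lazy/unlazy split described above can be sketched as a toy model in Python. This is not Xen code. The Vcpu and Cpu classes, the XSTATE_* bit values, and every method name below are assumptions invented for this sketch; only the idea (lazy state restored on #NM, unlazy state saved and restored on every switch, and a mask argument to xsave/xrstor) comes from the message.

```python
from dataclasses import dataclass, field

# Hypothetical component bits, chosen only for this sketch (not real XCR0 values).
XSTATE_FP_SSE = 1 << 0
XSTATE_YMM = 1 << 1
XSTATE_LWP = 1 << 2

LAZY_MASK = XSTATE_FP_SSE | XSTATE_YMM  # guarded by CR0.TS, restored on #NM
UNLAZY_MASK = XSTATE_LWP                # not guarded by CR0.TS


@dataclass
class Vcpu:
    name: str
    saved: dict = field(default_factory=dict)  # component bit -> saved value
    fpu_dirtied: bool = False  # True once lazy state was loaded for this vcpu


class Cpu:
    def __init__(self):
        self.regs = {XSTATE_FP_SSE: 0, XSTATE_YMM: 0, XSTATE_LWP: 0}
        self.ts = True  # models CR0.TS: the next FPU use raises #NM
        self.current = None

    def xsave(self, vcpu, mask):
        # Save only the components selected by mask.
        for bit, value in self.regs.items():
            if bit & mask:
                vcpu.saved[bit] = value

    def xrstor(self, vcpu, mask):
        # Restore only the components selected by mask.
        for bit in self.regs:
            if bit & mask:
                self.regs[bit] = vcpu.saved.get(bit, 0)

    def context_switch(self, nxt):
        prev = self.current
        if prev is not None:
            self.xsave(prev, UNLAZY_MASK)      # unlazy: on every switch
            if prev.fpu_dirtied:
                self.xsave(prev, LAZY_MASK)    # lazy: only if it was used
                prev.fpu_dirtied = False
        self.xrstor(nxt, UNLAZY_MASK)          # unlazy: on every switch
        self.ts = True                         # defer lazy restore to #NM
        self.current = nxt

    def device_not_available(self):
        # Models the #NM handler: load the lazy state on first use.
        self.ts = False
        self.xrstor(self.current, LAZY_MASK)
        self.current.fpu_dirtied = True

    def use_fpu(self, bit, value):
        if self.ts:
            self.device_not_available()
        self.regs[bit] = value


cpu = Cpu()
a, b = Vcpu("a"), Vcpu("b")

cpu.context_switch(a)
cpu.regs[XSTATE_LWP] = 7      # LWP state changes without any #NM
cpu.use_fpu(XSTATE_YMM, 42)   # first FPU use by "a" goes through #NM

cpu.context_switch(b)
cpu.use_fpu(XSTATE_YMM, 99)

cpu.context_switch(a)
lwp_after_switch = cpu.regs[XSTATE_LWP]  # already restored for "a"
ymm_before_nm = cpu.regs[XSTATE_YMM]     # still holds the value left by "b"
cpu.use_fpu(XSTATE_FP_SSE, 1)            # #NM restores the lazy state of "a"
ymm_after_nm = cpu.regs[XSTATE_YMM]

print(lwp_after_switch, ymm_before_nm, ymm_after_nm)
```

In this model the LWP value of vcpu "a" is back on the CPU immediately after the switch, while its YMM value only reappears once the simulated #NM handler runs. That is the behavioural difference the patch series is built around.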



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2011-Apr-14 21:09 UTC

Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description

On 14/04/2011 21:37, "Wei Huang" <wei.huang2@amd.com> wrote:
> The following patches support AMD lightweight profiling.
> 
> Because LWP isn't tracked by CR0.TS bit, we clean up the FPU code to
> handle lazy and unlazy FPU states differently. Lazy FPU state (such as
> SSE, YMM) is handled when #NM is triggered. Unlazy state, such as LWP,
> is saved and restored on each vcpu context switch. To simplify the code,
> we also add a mask option to xsave/xrstor function.
How much cost is added to context switch paths in the (overwhelmingly
likely) case that LWP is not being used by the guest? Is this adding a whole
lot of unconditional overhead for a feature that no one uses?

 -- Keir

Wei Huang

2011-Apr-14 22:57 UTC

Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description

Hi Keir,

I ran a quick test to calculate the overhead of __fpu_unlazy_save() and 
__fpu_unlazy_restore(), which are used to save/restore LWP state. Here 
are the results:

(1) tsc_total: total time used for context_switch() in x86/domain.c
(2) tsc_unlazy: total time used for __fpu_unlazy_save() + 
__fpu_unlazy_restore()

One example:
(XEN) tsc_unlazy=0x00000000008ae174
(XEN) tsc_total=0x00000001028b4907

So the overhead is about 0.2% of total time used by context_switch(). Of 
course, this is just one example. I would say the overhead ratio would 
be <1% for most cases.

Thanks,
-Wei
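
[Editor's illustration] The percentage quoted above can be checked directly from the two hex values in the message. The short Python sketch below does only that arithmetic; the variable names mirror the log labels, and nothing else is assumed.

```python
# Values copied from the (XEN) log lines in the message.
tsc_unlazy = 0x00000000008AE174
tsc_total = 0x00000001028B4907

# Share of the accumulated context_switch() time spent in the unlazy paths.
overhead_percent = 100.0 * tsc_unlazy / tsc_total

print(f"tsc_unlazy = {tsc_unlazy}")             # 9101684
print(f"tsc_total  = {tsc_total}")              # 4337649927
print(f"overhead   = {overhead_percent:.2f}%")  # 0.21%
```

The result, roughly 0.21%, agrees with the "about 0.2%" figure in the message. Note that tsc_total is slightly above 2^32, so it is on the order of four billion cycles.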




Dan Magenheimer

2011-Apr-15 20:16 UTC

RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description

Wait... a context switch takes over 4 billion cycles?
Not likely!

And please check your division.  I get the same
answer from "dc" only when I use lowercase hex
numbers and dc complains about unimplemented chars,
else I get 0.033%... also unlikely.

Huang2, Wei

2011-Apr-15 20:23 UTC

RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description

Hi Dan,

This isn't the cycle count of a single switch. It is the total cycle count
accumulated over a period. I dumped the numbers at an arbitrary point while a
guest was running.

Thanks,
-Wei
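
[Editor's illustration] To show why an accumulated counter can reach billions of cycles even though each switch is cheap, here is a small Python sketch of running totals. The CycleCounters class and the per-switch costs (4000 and 8 cycles) are invented purely for illustration. They are not measurements and do not come from the thread.

```python
class CycleCounters:
    """Running totals, in the spirit of the tsc_total / tsc_unlazy counters."""

    def __init__(self):
        self.tsc_total = 0    # cycles summed over all context switches
        self.tsc_unlazy = 0   # cycles summed over the unlazy save/restore parts
        self.switches = 0

    def record_switch(self, total_cycles, unlazy_cycles):
        # Each switch adds its own cost to the running totals.
        self.tsc_total += total_cycles
        self.tsc_unlazy += unlazy_cycles
        self.switches += 1

    def overhead_percent(self):
        return 100.0 * self.tsc_unlazy / self.tsc_total


counters = CycleCounters()
for _ in range(100_000):
    # Hypothetical per-switch costs, made up for this sketch.
    counters.record_switch(total_cycles=4000, unlazy_cycles=8)

print(counters.switches, counters.tsc_total, counters.tsc_unlazy)
print(f"{counters.overhead_percent():.2f}%")
```

The totals grow with the number of switches, while the ratio between them stays at the per-switch share. A large tsc_total therefore says nothing about the cost of any single switch.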
