thr3ads.net - Xen devel - fpu_taskswitch adjustment proposal [Jun 2012]

If this information is useful, please help other people find it:
Share via:

Jan Beulich

2012-Jun-15 16:03 UTC

fpu_taskswitch adjustment proposal

While pv-ops so far doesn''t care to eliminate the two trap-and-
emulate CR0 accesses from the asm/xor.h save/restore
operations, the legacy x86-64 kernel uses conditional clts()/stts()
for this purpose. While looking into whether to extend this to the
newly added (in 3.5) AVX operations there I realized that this isn''t
fully correct: It doesn''t properly nest inside a kernel_fpu_begin()/
kernel_fpu_end() pair (as it will stts() at the end no matter what
the original state of CR0.TS was).

In order to not introduce completely new hypercalls to overcome
this (fpu_taskswitch isn''t really extensible on its own), I''m
considering to add a new VM assist, altering the fpu_taskswitch
behavior so that it would return an indicator whether any change
to the virtual CR0.TS was done. That way, the kernel can
implement a true save/restore cycle here.

In order to allow the kernel to run on older hypervisors without
extra conditionals (behaving the same way as it does currently,
i.e. with the incorrect nesting), the return value 0 (which the
hypercall currently always returns) would need to be used to
indicate that the bit got actually flipped (such that on an old
hypervisor an updated kernel would always think that
something needs to be restored).

Would that be an acceptable solution? Can someone think of
other ways to deal with this? (And - is pv-ops interested in
eliminating the two CR0 access emulations in what is supposed
to be a fast path?)

Jan

Keir Fraser

2012-Jun-15 17:06 UTC

head link

Re: fpu_taskswitch adjustment proposal

On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com> wrote:
> While pv-ops so far doesn''t care to eliminate the two trap-and-
> emulate CR0 accesses from the asm/xor.h save/restore
> operations, the legacy x86-64 kernel uses conditional clts()/stts()
> for this purpose. While looking into whether to extend this to the
> newly added (in 3.5) AVX operations there I realized that this
isn''t
> fully correct: It doesn''t properly nest inside a
kernel_fpu_begin()/
> kernel_fpu_end() pair (as it will stts() at the end no matter what
> the original state of CR0.TS was).
> 
> In order to not introduce completely new hypercalls to overcome
> this (fpu_taskswitch isn''t really extensible on its own),
I''m
> considering to add a new VM assist, altering the fpu_taskswitch
> behavior so that it would return an indicator whether any change
> to the virtual CR0.TS was done. That way, the kernel can
> implement a true save/restore cycle here.
It should be possible for the guest kernel to track its CR0.TS setting
shouldn''t it? It gets modified only via a few paravirt hooks, and
implicitly
cleared on #NM.

 -- Keir
> In order to allow the kernel to run on older hypervisors without
> extra conditionals (behaving the same way as it does currently,
> i.e. with the incorrect nesting), the return value 0 (which the
> hypercall currently always returns) would need to be used to
> indicate that the bit got actually flipped (such that on an old
> hypervisor an updated kernel would always think that
> something needs to be restored).
> 
> Would that be an acceptable solution? Can someone think of
> other ways to deal with this? (And - is pv-ops interested in
> eliminating the two CR0 access emulations in what is supposed
> to be a fast path?)
> 
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Jan Beulich

2012-Jun-18 07:32 UTC

head link

Re: fpu_taskswitch adjustment proposal

>>> On 15.06.12 at 19:06, Keir Fraser <keir@xen.org> wrote:
> On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com>
wrote:
> 
>> While pv-ops so far doesn''t care to eliminate the two
trap-and-
>> emulate CR0 accesses from the asm/xor.h save/restore
>> operations, the legacy x86-64 kernel uses conditional clts()/stts()
>> for this purpose. While looking into whether to extend this to the
>> newly added (in 3.5) AVX operations there I realized that this
isn''t
>> fully correct: It doesn''t properly nest inside a
kernel_fpu_begin()/
>> kernel_fpu_end() pair (as it will stts() at the end no matter what
>> the original state of CR0.TS was).
>> 
>> In order to not introduce completely new hypercalls to overcome
>> this (fpu_taskswitch isn''t really extensible on its own),
I''m
>> considering to add a new VM assist, altering the fpu_taskswitch
>> behavior so that it would return an indicator whether any change
>> to the virtual CR0.TS was done. That way, the kernel can
>> implement a true save/restore cycle here.
> 
> It should be possible for the guest kernel to track its CR0.TS setting
> shouldn''t it? It gets modified only via a few paravirt hooks, and
implicitly
> cleared on #NM.
Sure, but selling this to the Linux maintainers I would expect to be
harder than fitting the Xen side of things into the current save-
and-restore model the native xor code uses. It would only be strait
forward to implement on the legacy, forward ported trees.

However, with the #NM handler in pv-ops apparently not
leveraging the fact that CR0.TS is already cleared for it on entry,
maybe this could indeed be introduced together. Konrad?

Jan

Keir Fraser

2012-Jun-18 12:12 UTC

head link

Re: fpu_taskswitch adjustment proposal

On 18/06/2012 08:32, "Jan Beulich" <JBeulich@suse.com> wrote:
>> It should be possible for the guest kernel to track its CR0.TS setting
>> shouldn''t it? It gets modified only via a few paravirt hooks,
and implicitly
>> cleared on #NM.
> 
> Sure, but selling this to the Linux maintainers I would expect to be
> harder than fitting the Xen side of things into the current save-
> and-restore model the native xor code uses. It would only be strait
> forward to implement on the legacy, forward ported trees.
Wouldn''t it be hidden entirely behind pv_ops hooks and within
Xen-specific
SSE save/restore code? I suppose you''d need to statically allocate the
per-cpu space for tracking the CR0.TS state... But overall it seems it will
be of little/no concern to other kernel maintainers?

 -- Keir
> However, with the #NM handler in pv-ops apparently not
> leveraging the fact that CR0.TS is already cleared for it on entry,
> maybe this could indeed be introduced together. Konrad?

Jan Beulich

2012-Jun-18 12:45 UTC

head link

Re: fpu_taskswitch adjustment proposal

>>> On 18.06.12 at 14:12, Keir Fraser <keir@xen.org> wrote:
> On 18/06/2012 08:32, "Jan Beulich" <JBeulich@suse.com>
wrote:
> 
>>> It should be possible for the guest kernel to track its CR0.TS
setting
>>> shouldn''t it? It gets modified only via a few paravirt
hooks, and implicitly
>>> cleared on #NM.
>> 
>> Sure, but selling this to the Linux maintainers I would expect to be
>> harder than fitting the Xen side of things into the current save-
>> and-restore model the native xor code uses. It would only be strait
>> forward to implement on the legacy, forward ported trees.
> 
> Wouldn''t it be hidden entirely behind pv_ops hooks and within
Xen-specific
> SSE save/restore code? I suppose you''d need to statically allocate
the
> per-cpu space for tracking the CR0.TS state... But overall it seems it will
> be of little/no concern to other kernel maintainers?
The #NM handler part wouldn''t afaict. Everything else indeed
ought to be restricted to the functions backing paravirt.h''s clts()
and write_cr0().

Jan

Keir Fraser

2012-Jun-18 13:59 UTC

head link

Re: fpu_taskswitch adjustment proposal

On 18/06/2012 13:45, "Jan Beulich" <JBeulich@suse.com> wrote:
>> Wouldn''t it be hidden entirely behind pv_ops hooks and within
Xen-specific
>> SSE save/restore code? I suppose you''d need to statically
allocate the
>> per-cpu space for tracking the CR0.TS state... But overall it seems it
will
>> be of little/no concern to other kernel maintainers?
> 
> The #NM handler part wouldn''t afaict. Everything else indeed
> ought to be restricted to the functions backing paravirt.h''s
clts()
> and write_cr0().
Yes, the #NM handler might need extra treatment. Perhaps we could install
our own #NM wrapper around the kernel''s generic handler, which first
clears
the saved CR0.TS flag. Then on clts() in the generic handler, our paravirt
clts() would see the flag is already clear and do no hypercall.

 -- Keir

Konrad Rzeszutek Wilk

2012-Jun-18 18:24 UTC

head link

Re: fpu_taskswitch adjustment proposal

On Mon, Jun 18, 2012 at 08:32:05AM +0100, Jan Beulich
wrote:> >>> On 15.06.12 at 19:06, Keir Fraser <keir@xen.org> wrote:
> > On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com>
wrote:
> > 
> >> While pv-ops so far doesn''t care to eliminate the two
trap-and-
> >> emulate CR0 accesses from the asm/xor.h save/restore
> >> operations, the legacy x86-64 kernel uses conditional
clts()/stts()
> >> for this purpose. While looking into whether to extend this to the
> >> newly added (in 3.5) AVX operations there I realized that this
isn''t
> >> fully correct: It doesn''t properly nest inside a
kernel_fpu_begin()/
> >> kernel_fpu_end() pair (as it will stts() at the end no matter what
> >> the original state of CR0.TS was).
That sounds like a bug in the generic code then?> >> 
> >> In order to not introduce completely new hypercalls to overcome
> >> this (fpu_taskswitch isn''t really extensible on its own),
I''m
> >> considering to add a new VM assist, altering the fpu_taskswitch
> >> behavior so that it would return an indicator whether any change
> >> to the virtual CR0.TS was done. That way, the kernel can
> >> implement a true save/restore cycle here.
How would that work with the multi-calls? Right now clts is batched
and so is cr0 write.> > 
> > It should be possible for the guest kernel to track its CR0.TS setting
> > shouldn''t it? It gets modified only via a few paravirt hooks,
and implicitly
Hm, the clts() paravirt could take advantage of the per-cpu cr0 to
figure out whether it truly needs to do anything.
> > cleared on #NM.
> Sure, but selling this to the Linux maintainers I would expect to be
> harder than fitting the Xen side of things into the current save-
> and-restore model the native xor code uses. It would only be strait
> forward to implement on the legacy, forward ported trees.
> 
> However, with the #NM handler in pv-ops apparently not
> leveraging the fact that CR0.TS is already cleared for it on entry,
> maybe this could indeed be introduced together. Konrad?
Would this require an extra pvops call from the #NM handler?
> 
> Jan

Keir Fraser

2012-Jun-18 22:35 UTC

head link

Re: fpu_taskswitch adjustment proposal

On 18/06/2012 19:24, "Konrad Rzeszutek Wilk"
<konrad.wilk@oracle.com> wrote:
>>> It should be possible for the guest kernel to track its CR0.TS
setting
>>> shouldn''t it? It gets modified only via a few paravirt
hooks, and implicitly
> 
> Hm, the clts() paravirt could take advantage of the per-cpu cr0 to
> figure out whether it truly needs to do anything.
Exactly.
>>> cleared on #NM.
>> Sure, but selling this to the Linux maintainers I would expect to be
>> harder than fitting the Xen side of things into the current save-
>> and-restore model the native xor code uses. It would only be strait
>> forward to implement on the legacy, forward ported trees.
>> 
>> However, with the #NM handler in pv-ops apparently not
>> leveraging the fact that CR0.TS is already cleared for it on entry,
>> maybe this could indeed be introduced together. Konrad?
> 
> Would this require an extra pvops call from the #NM handler?
Or we wrap the #NM handler (somehow?) and clear the per-cpu cr0.ts software
flag before calling into the generic #NM handler.

 -- Keir

Jan Beulich

2012-Jun-26 16:25 UTC

head link

Re: fpu_taskswitch adjustment proposal

>>> On 15.06.12 at 19:06, Keir Fraser <keir@xen.org> wrote:
> On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com>
wrote:
> 
>> While pv-ops so far doesn''t care to eliminate the two
trap-and-
>> emulate CR0 accesses from the asm/xor.h save/restore
>> operations, the legacy x86-64 kernel uses conditional clts()/stts()
>> for this purpose. While looking into whether to extend this to the
>> newly added (in 3.5) AVX operations there I realized that this
isn''t
>> fully correct: It doesn''t properly nest inside a
kernel_fpu_begin()/
>> kernel_fpu_end() pair (as it will stts() at the end no matter what
>> the original state of CR0.TS was).
>> 
>> In order to not introduce completely new hypercalls to overcome
>> this (fpu_taskswitch isn''t really extensible on its own),
I''m
>> considering to add a new VM assist, altering the fpu_taskswitch
>> behavior so that it would return an indicator whether any change
>> to the virtual CR0.TS was done. That way, the kernel can
>> implement a true save/restore cycle here.
> 
> It should be possible for the guest kernel to track its CR0.TS setting
> shouldn''t it? It gets modified only via a few paravirt hooks, and
implicitly
> cleared on #NM.
While this works fine and is fairly non-intrusive, it''s not really
buying us much: The non-SSE variants of the xor code will still
outperform the SSE one on both 32- and 64-bit x86 (and the
MMX ones on 32-bit). So I now instead wonder why linux-2.6.18''s
include/asm-x86_64/mach-xen/asm/xor.h doesn''t simply
forward to asm-generic/xor.h, or at least doesn''t override the
template selection logic. Jun, do you recall whether this was
perhaps done without any actual measurements when the port
to x86-64 was first done?

Jan

Xen devel - Jun 2012 - fpu_taskswitch adjustment proposal

fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal

Re: fpu_taskswitch adjustment proposal