Keir, please consider backing out c/s 20627. I don't believe all the cases have been properly thought through, and the consequences have an impact on applications and thus on existing customers. As far as I can tell, there is no urgency to get this into Xen 4.0, since existing apps and guest OSes that use rdtscp must check cpuid to see whether the instruction is present on the hardware. But putting a partial solution into 4.0 may cause Xen versioning issues that affect apps for years to come. This is an ABI, not a feature!

> From: Xu, Dongxiao [mailto:dongxiao.xu@intel.com]
> Sent: Friday, December 11, 2009 5:24 PM
> Subject: RE: [Xen-devel][PATCH 02/02] VMX: Add HVM RDTSCP support
>
> Whether a system has rdtscp support is indicated by
> the cpuid. Management tool or system admin should
> use CPUID to determine whether the migration is allowed.
> I think besides RDTSCP, we already have such cases.

This may be true in concept, but existing tools (including the default xm tools) do NOT check for this... I just tested a live migration between a Nehalem (which supports rdtscp) and a Conroe (which does not). The live migration works fine, and the app using rdtscp runs fine on the Nehalem and then crashes when the live migration completes on the Conroe. I *know* of existing code in Oracle that will be broken by this!

List of "open" discussions:

- virtualization of rdtscp on processors that don't support it (PV does, HVM doesn't)
- virtualizing (or not) TSC_AUX
- the Xen 32-bit vs Xen 64-bit inconsistency
- how to communicate pcpu vs vcpu and pnode vs vnode (and whether this has any relevance for TSC_AUX)
- hvm support of the pvrdtscp algorithm
- toolset capability and compatibility

On any of these points, I'm not saying I am right and anyone else is wrong; I'm just saying further discussion is warranted, and getting it wrong in 4.0 has significant risk and consequences if we proceed haphazardly. I am only urging caution and proper design.

Thanks,
Dan
On 14/12/2009 18:02, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> This may be true in concept, but existing tools (including
> the default xm tools) do NOT check for this... I just
> tested a live migration between a Nehalem (which supports
> rdtscp) and a Conroe (which does not). The live migration
> works fine, and the app using rdtscp runs fine on the
> Nehalem and then crashes when the live migration completes
> on the Conroe. I *know* of existing code in Oracle
> that will be broken by this!

This is a general problem for migration between dissimilar processors. The solution is to 'level' the feature sets by masking CPUID flags from the more-featured processor. In this case you would mask out RDTSCP (and perhaps others too). This does need the RDTSCP flag setting/clearing to be moved to xc_cpuid_x86.c, as currently the user cannot override the policy wedged into the hypervisor itself. That's an easy thing to fix.

 -- Keir
Keir Fraser wrote:
> This is a general problem for migration between dissimilar
> processors. The solution is to 'level' the feature sets by masking
> CPUID flags from the more-featured processor. In this case you would
> mask out RDTSCP (and perhaps others too). This does need the RDTSCP
> flag setting/clearing to be moved to xc_cpuid_x86.c, as currently the
> user cannot override the policy wedged into the hypervisor itself.
> That's an easy thing to fix.

I will write a patch for this. Thanks!

 -- Dongxiao
Dan Magenheimer
2009-Dec-15 15:56 UTC
[Xen-devel] RE: Live migration fails due to c/s 20627
Hi Dongxiao --

Why would you disable live migration between two very widely used Intel processors for ALL HVM domains just because some domains use the rdtscp instruction?

Why not just add the code to do rdtscp emulation, which would NOT break live migration?

There are many cases where rdtsc/rdtscp instructions are emulated, so most of the code is already there. You only need to intercept illegal instruction traps, so there is not a significant performance issue. And the code to do the emulation is necessary to implement the pvrdtscp algorithm on hvm anyway (which I think was the reason this whole discussion started).

Dan

> -----Original Message-----
> From: Xu, Dongxiao [mailto:dongxiao.xu@intel.com]
> Sent: Monday, December 14, 2009 9:40 PM
> Subject: RE: Live migration fails due to c/s 20627
>
> I will write a patch for this. Thanks!
>
> -- Dongxiao
Dan Magenheimer wrote:
> Hi Dongxiao --
>
> Why would you disable live migration between two
> very widely used Intel processors for ALL HVM domains
> just because some domains use the rdtscp instruction?

Dan, I won't disable the migration. As Keir said, I will put the cpuid logic in xc_cpuid_x86.c so that the admin can use the configuration file to mask the rdtscp feature through cpuid. This is the common usage model for live migration between two different hosts.

> Why not just add the code to do rdtscp emulation,
> which would NOT break live migration?

Adding rdtscp emulation has the problem that, in Intel VMX, the vmexit control for rdtsc and rdtscp is the same, so if we trap rdtscp for emulation the OS will suffer lots of rdtsc vmexits, which will degrade performance.

> There are many cases where rdtsc/rdtscp instructions
> are emulated and so most of the code is already there.
> You only need to intercept illegal instruction traps,
> so there is not a significant performance issue.
> And the code to do the emulation is necessary
> to implement the pvrdtscp algorithm on hvm anyway

I think in an HVM environment we should respect the native behavior. Moreover, it would be valuable for the guest if it could get node/cpu info that reflects the hardware topology.

Thanks!
Dongxiao
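[For illustration, the kind of domain config entry being described might look like the following. This is only a sketch: it assumes the xm/xend `cpuid` bit-string syntax (32 characters, leftmost character = bit 31) and that RDTSCP is reported in CPUID leaf 0x80000001, EDX bit 27, and it only takes effect once the RDTSCP flag handling has been moved out of the hypervisor into xc_cpuid_x86.c as discussed above. Check the toolstack documentation before relying on the exact form.]

    # Hide RDTSCP from the guest by forcing CPUID leaf 0x80000001, EDX bit 27 to 0.
    # (32-character string; leftmost char is bit 31, so the '0' in the 5th
    # position clears bit 27. Sketch only.)
    cpuid = [ '0x80000001:edx=xxxx0xxxxxxxxxxxxxxxxxxxxxxxxxxx' ]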
Dan Magenheimer
2009-Dec-15 16:31 UTC
[Xen-devel] RE: Live migration fails due to c/s 20627
> Adding rdtscp emulation has the problem that, in Intel VMX, the
> vmexit control for rdtsc and rdtscp is the same, so if we trap
> rdtscp for emulation the OS will suffer lots of rdtsc vmexits,
> which will degrade performance.

OK, I see... you probably are not aware of all the recent work in Xen in this area. Machines that do not have rdtscp support are highly likely to be emulating rdtsc anyway. This is true for both PV and HVM domains. Please check docs/misc/tscmode.txt in a xen tree from the last few days.

But in any case, I am not suggesting any change in RDTSC_EXITING... that code is fine. I am suggesting adding code similar to the emulate_invalid_rdtscp support for PV in c/s 20504, to catch the illegal instruction traps on machines where the instruction is illegal.

Dan
Dan Magenheimer
2009-Dec-15 17:08 UTC
[Xen-devel] RE: Live migration fails due to c/s 20627
Oops, forgot to reply to this part of your message.

>> There are many cases where rdtsc/rdtscp instructions
>> are emulated and so most of the code is already there.
>> You only need to intercept illegal instruction traps,
>> so there is not a significant performance issue.
>> And the code to do the emulation is necessary
>> to implement the pvrdtscp algorithm on hvm anyway
>
> I think in an HVM environment we should respect the native
> behavior.

I'm not sure why you think that. There are many, many native silicon features that are not exposed directly to the guest OS. The whole point of virtualization is to present an abstract hardware interface so that multiple VMs can be supported on a single machine and VMs can be moved between underlying hardware implementations. Breaking that flexibility for a rarely utilized instruction such as rdtscp seems like a very bad idea.

> Moreover, it would be valuable for the
> guest if it could get node/cpu info that reflects
> the hardware topology.

As Jeremy has pointed out, the guest OS already has other mechanisms to provide this information, and as Jun has pointed out, the non-rdtscp mechanism (lsl on Linux) may even be faster. Windows does not even provide TSC_AUX, so it definitely has other ways to obtain node/cpu info. And, as I've said before, the node/cpu info provided by Linux in TSC_AUX is wrong anyway (except in very constrained environments, such as where the admin has pinned vcpus to pcpus).
Jeremy Fitzhardinge
2009-Dec-15 17:12 UTC
[Xen-devel] Re: Live migration fails due to c/s 20627
On 12/15/09 08:10, Xu, Dongxiao wrote:
>> Why not just add the code to do rdtscp emulation,
>> which would NOT break live migration?
>
> Adding rdtscp emulation has the problem that, in Intel VMX, the
> vmexit control for rdtsc and rdtscp is the same, so if we trap
> rdtscp for emulation the OS will suffer lots of rdtsc vmexits,
> which will degrade performance.

I don't see why that's relevant. In the case where you've migrated the domain, if the CPU has rdtsc but not rdtscp, won't the rdtscp vmexit with an illegal instruction trap? In that case you can emulate rdtscp while still having direct execution of rdtsc.

Of course, having a wide difference between rdtscp and rdtsc performance may cause its own set of problems.

    J
Jeremy Fitzhardinge wrote:
> I don't see why that's relevant. In the case where you've migrated
> the domain, if the CPU has rdtsc but not rdtscp, won't the rdtscp
> vmexit with an illegal instruction trap? In that case you can
> emulate rdtscp while still having direct execution of rdtsc.

If the CPU has rdtsc but no rdtscp, then the VM-execution control bit in the VMCS won't be turned on. Therefore, if the rdtscp instruction runs, it will raise an invalid-opcode fault directly, with no VMEXIT.

> Of course, having a wide difference between rdtscp and rdtsc
> performance may cause its own set of problems.

Best Regards,
 -- Dongxiao
Dan Magenheimer wrote:
> I'm not sure why you think that. There are many, many
> native silicon features that are not exposed directly
> to the guest OS. The whole point of virtualization
> is to present an abstract hardware interface so that
> multiple VMs can be supported on a single machine and
> VMs can be moved between underlying hardware implementations.
> Breaking that flexibility for a rarely utilized instruction
> such as rdtscp seems like a very bad idea.

I didn't break the flexibility: as stated before, my next patch will move the code for guest cpuid presentation, and then the administrator can mask the bit for migration. RDTSCP is not the only feature with this problem.

> As Jeremy has pointed out, the guest OS already has
> other mechanisms to provide this information, and
> as Jun has pointed out, the non-rdtscp mechanism (lsl
> on Linux) may even be faster. Windows does not even
> provide TSC_AUX, so it definitely has other ways to
> obtain node/cpu info. And, as I've said before,
> the node/cpu info provided by Linux in TSC_AUX is
> wrong anyway (except in very constrained environments,
> such as where the admin has pinned vcpus to pcpus).

After guest NUMA is implemented, it is not a constrained environment. The guest will benefit from the information, for example for implementing a fast vgetcpu() and so on...

Best Regards,
 -- Dongxiao
HVM RDTSCP fix.

- Put the guest rdtscp cpuid logic in xc_cpuid_x86.c.
- MSR_TSC_AUX's high 32 bits are reserved, so only write the low 32 bits.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>

Xu, Dongxiao wrote:
> I will write a patch for this. Thanks!
>
> -- Dongxiao
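[A minimal sketch of what the second bullet implies for the guest WRMSR path. The structure and helper names here are illustrative only, not the names used in the actual patch.]

    #include <stdint.h>

    #define MSR_TSC_AUX 0xC0000103u   /* IA32_TSC_AUX */

    /* Illustrative per-vcpu state; the real Xen structures differ. */
    struct vcpu_msr_state {
        uint32_t tsc_aux;   /* only the low 32 bits are architecturally defined */
    };

    /* Record a guest write to TSC_AUX.  The upper 32 bits of the MSR are
     * reserved, so only the low half is kept (and later loaded into the
     * physical MSR when this vcpu runs). */
    static void guest_wrmsr_tsc_aux(struct vcpu_msr_state *v, uint64_t msr_content)
    {
        v->tsc_aux = (uint32_t)msr_content;
    }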
Jeremy Fitzhardinge
2009-Dec-15 18:25 UTC
[Xen-devel] Re: Live migration fails due to c/s 20627
On 12/15/2009 09:24 AM, Xu, Dongxiao wrote:
> If the CPU has rdtsc but no rdtscp, then the VM-execution control bit in
> the VMCS won't be turned on. Therefore, if the rdtscp instruction runs,
> it will raise an invalid-opcode fault directly, with no VMEXIT.

Ah, right. You'd need to make that particular illegal instruction vmexit.

    J
Dan Magenheimer
2009-Dec-15 19:20 UTC
[Xen-devel] RE: Live migration fails due to c/s 20627
> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Sent: Tuesday, December 15, 2009 11:26 AM
> Subject: Re: Live migration fails due to c/s 20627
>
> On 12/15/2009 09:24 AM, Xu, Dongxiao wrote:
> > If the CPU has rdtsc but no rdtscp, then the VM-execution control bit in
> > the VMCS won't be turned on. Therefore, if the rdtscp instruction runs,
> > it will raise an invalid-opcode fault directly, with no VMEXIT.
>
> Ah, right. You'd need to make that particular illegal
> instruction vmexit.

Or make ALL illegal instructions vmexit, decode, and if it is rdtscp emulate it, else vmenter again.

Is this not done anyplace else in the hvm code?
On 15/12/2009 18:25, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
> Ah, right. You'd need to make that particular illegal instruction vmexit.

We'd need to VMEXIT on any guest #UD and then call into our x86 emulator. There's just a slight feeling that could have wider impact and implications than the specific case we'd want to handle here. Then again, #UD is rare and usually unexpected, so perhaps such a seemingly broad change is not so dangerous.

 -- Keir
On 15/12/2009 19:20, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> Ah, right. You'd need to make that particular illegal
>> instruction vmexit.
>
> Or make ALL illegal instructions vmexit, decode, and if it is rdtscp
> emulate it, else vmenter again.
>
> Is this not done anyplace else in the hvm code?

Oh, in fact I am wrong in my previous email, replying to Jeremy, claiming we do not trap-and-emulate on illegal instructions. In fact we *do*, as it got added to handle SYSCALL vs SYSENTER when migrating between Intel and AMD hosts.

So all that would need to be done is to add RDTSCP support to x86_emulate.c, as it's currently missing. But that's pretty trivial.

 -- Keir
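[For anyone following along, a rough sketch of what such emulation amounts to once a guest #UD has been intercepted and decoded as RDTSCP (opcode 0F 01 F9): return the guest's TSC in EDX:EAX and its virtual TSC_AUX in ECX, zero-extended. The names below are placeholders, not the actual x86_emulate.c interfaces.]

    #include <stdint.h>

    /* Illustrative GPR file; the real emulator context differs. */
    struct emu_regs {
        uint64_t rax, rcx, rdx;
    };

    /* Emulate RDTSCP (opcode 0F 01 F9) after an intercepted #UD.  The
     * instruction returns the TSC in EDX:EAX and TSC_AUX in ECX, with the
     * upper 32 bits of each register zeroed.  guest_tsc/guest_tsc_aux stand
     * in for whatever per-domain values the hypervisor maintains (e.g. an
     * offset or scaled TSC). */
    static void emulate_rdtscp(struct emu_regs *regs,
                               uint64_t guest_tsc, uint32_t guest_tsc_aux)
    {
        regs->rax = (uint32_t)guest_tsc;          /* TSC low 32 bits   */
        regs->rdx = (uint32_t)(guest_tsc >> 32);  /* TSC high 32 bits  */
        regs->rcx = guest_tsc_aux;                /* TSC_AUX, zero-extended */
    }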
Keir Fraser
2009-Dec-15 19:52 UTC
Re: [Xen-devel] Re: Live migration fails due to c/s 20627
On 15/12/2009 19:31, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> So all that would need to be done is to add RDTSCP support to x86_emulate.c,
> as it's currently missing. But that's pretty trivial.

...I'll sort out a patch for this tomorrow. No reason this shouldn't be added to the emulator.

 -- Keir
Dan Magenheimer wrote on Tue, 15 Dec 2009 at 09:08:40:
> As Jeremy has pointed out, the guest OS already has
> other mechanisms to provide this information, and
> as Jun has pointed out, the non-rdtscp mechanism (lsl
> on Linux) may even be faster. Windows does not even
> provide TSC_AUX, so it definitely has other ways to
> obtain node/cpu info. And, as I've said before,
> the node/cpu info provided by Linux in TSC_AUX is
> wrong anyway (except in very constrained environments,
> such as where the admin has pinned vcpus to pcpus).

I think we should distinguish architectural support from Linux-specific issues. We need to enable RDTSCP support in HVM if user apps want to use it. If we want the Linux or other kernel to stop using TSC_AUX, we can mask off the feature in CPUID. Access to the MSR should result in #GP in the guest if the feature is not advertised.

BTW, I'm hearing that the max TSC difference among CPUs is less than 10 cycles or so (virtually 0), at least on Intel-based machines (except limited ones with known issues), so such pinning should probably not be required, because a vcpu migration should take longer than that.

Jun
___
Intel Open Source Technology Center
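[As a sketch of the policy Jun describes, with placeholder names rather than Xen's actual MSR-handling interfaces: if RDTSCP has been masked out of the guest's CPUID, guest accesses to TSC_AUX fault with #GP instead of being serviced, matching what bare hardware without the feature would do.]

    #include <stdbool.h>
    #include <stdint.h>

    #define MSR_TSC_AUX 0xC0000103u

    /* Sketch only: handle a guest RDMSR of TSC_AUX.  guest_has_rdtscp
     * reflects the guest's (possibly masked) CPUID; inject_gp() is a
     * placeholder for the hypervisor's fault-injection path. */
    static bool guest_rdmsr_tsc_aux(bool guest_has_rdtscp, uint32_t vtsc_aux,
                                    uint64_t *val, void (*inject_gp)(void))
    {
        if (!guest_has_rdtscp) {
            inject_gp();       /* feature not advertised: fault the access */
            return false;
        }
        *val = vtsc_aux;       /* reserved high 32 bits read as zero */
        return true;
    }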
Zhang, Xiantao
2009-Dec-16 02:43 UTC
[Xen-devel] RE: Live migration fails due to c/s 20627
> And, as I've said before,
> the node/cpu info provided by Linux in TSC_AUX is
> wrong anyway (except in very constrained environments
> such as where the admin has pinned vcpus to pcpus).

I don't agree with you on this point. For guest NUMA support, it should be a must to pin a virtual node's vcpus to its related physical node, and cross-node vcpu migration should be disallowed by default; otherwise guest NUMA support is meaningless, right? If vcpu migration only happens within its physical node, I can't see why you think the info provided in the MSR is wrong.

Actually, each vcpu should have a virtual TSC_AUX MSR (the guest should set it before using it), and this virtual MSR is saved/restored from/to the physical TSC_AUX MSR on context switch. So in VMX non-root mode the value in the physical TSC_AUX MSR follows the guest's setting rather than the host's, it reflects the guest's virtual node/virtual cpu info, and it is still the value the guest's applications expect. In addition, we have to keep in mind that the host's TSC_AUX and the guest's TSC_AUX are two totally different things, except that they are held in the same physical register during different execution phases of the cpu; we shouldn't mix them together.

Xiantao
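[A minimal sketch of the save/restore scheme described above, assuming hypervisor (ring 0) context; the structure and function names are illustrative, not Xen's. Each vcpu carries its own virtual TSC_AUX, which is loaded into the physical MSR when the vcpu is scheduled in, so RDTSCP in VMX non-root mode observes the guest's value rather than the host's.]

    #include <stdint.h>

    #define MSR_TSC_AUX 0xC0000103u

    /* Write the physical IA32_TSC_AUX MSR (ECX = index, EDX:EAX = value). */
    static inline void wrmsr_tsc_aux(uint32_t val)
    {
        __asm__ __volatile__("wrmsr" :: "c"(MSR_TSC_AUX), "a"(val), "d"(0u));
    }

    /* Illustrative per-vcpu state. */
    struct vcpu_state {
        uint32_t guest_tsc_aux;   /* last value the guest wrote */
    };

    static void context_switch_in(const struct vcpu_state *v)
    {
        wrmsr_tsc_aux(v->guest_tsc_aux);   /* expose the guest's TSC_AUX */
    }

    static void context_switch_out(uint32_t host_tsc_aux)
    {
        wrmsr_tsc_aux(host_tsc_aux);       /* restore the host's value */
    }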
Dan Magenheimer
2009-Dec-16 04:14 UTC
[Xen-devel] RE: Live migration fails due to c/s 20627
> I don't agree with you on this point. For guest NUMA support,
> it should be a must to pin a virtual node's vcpus to its
> related physical node, and cross-node vcpu migration should
> be disallowed by default; otherwise guest NUMA support is
> meaningless, right?

It's not a must. A system administrator should always have the option of choosing flexibility vs performance. I agree that when performance is the higher priority, pinning is a must, but pinning may even have issues when the guest's nvcpus exceed the number of cores in a node.

So I am saying there are many cases where TSC_AUX, when set by a guest OS, will be incorrect. And yes, I agree there are cases (with pinning) where it will be correct. But how does an app or OS know whether to trust TSC_AUX or not? Better to have some other method of getting pcpu/pnode information that is known to be always correct (via some method of querying the hypervisor directly).

> If vcpu migration only happens within its physical node, I
> can't see why you think the info provided in the MSR is wrong.
> Actually, each vcpu should have a virtual TSC_AUX MSR (the guest
> should set it before using it), and this virtual MSR is
> saved/restored from/to the physical TSC_AUX MSR on context switch.
> So in VMX non-root mode the value in the physical TSC_AUX MSR
> follows the guest's setting rather than the host's, it reflects
> the guest's virtual node/virtual cpu info, and it is still the
> value the guest's applications expect. In addition, the host's
> TSC_AUX and the guest's TSC_AUX are two totally different things,
> except that they are held in the same physical register during
> different execution phases; we shouldn't mix them together.

My argument is simply that if TSC_AUX cannot ALWAYS be trusted by an application, apps will NEVER trust it. And if apps NEVER trust it, why expose it at all?

Thanks,
Dan
Zhang, Xiantao
2009-Dec-16 05:07 UTC
[Xen-devel] RE: Live migration fails due to c/s 20627
Dan Magenheimer wrote:
> It's not a must. A system administrator should always
> have the option of choosing flexibility vs performance.
> I agree that when performance is the higher priority, pinning
> is a must, but pinning may even have issues when the
> guest's nvcpus exceed the number of cores in a node.

Could you elaborate on the issues you can see? Normally, a virtual node's number of vcpus should be less than one physical node's cpu count. But even if the vcpu count is more than the physical cpu count in a node, why would it introduce issues?

> So I am saying there are many cases where TSC_AUX,
> when set by a guest OS, will be incorrect.

Could you spell out the incorrect cases?

> And yes, I agree there are cases (with pinning) where it will
> be correct. But how does an app or OS know whether to
> trust TSC_AUX or not?

If the hypervisor exposes this instruction to the guest, it should be trusted and safe to use, because the hypervisor is responsible for fully virtualizing the instruction and letting the guest think it is running on a native machine.

> Better to have some other
> method of getting pcpu/pnode information that is known
> to be always correct (via some method of querying the hypervisor
> directly).

I don't think the guest should learn the host's NUMA info by any means. Basically, the guest only needs to be aware of the guest's NUMA info. For example, the host's NUMA info may be 2 nodes, each configured with 16G of memory and 16 LPs, while the guest's virtual NUMA info may be 2 nodes, each with 2G of memory and 2 vcpus. In this case, the guest only needs the virtual NUMA info, not the host's NUMA info, when it enables NUMA support. At the same time, the hypervisor is responsible for how to allocate the 2G of memory from the physical node's 16G, and how to schedule the virtual node's vcpus onto physical cpus (according to performance vs flexibility, as you said).

> My argument is simply that if TSC_AUX cannot ALWAYS
> be trusted by an application, apps will NEVER trust it.
> And if apps NEVER trust it, why expose it at all?

This instruction is safe to use and has been fully virtualized in VMX non-root mode via Dongxiao's patch, so why not trust it? I can't see one reason. :-)

Xiantao
Jeremy Fitzhardinge
2009-Dec-16 06:19 UTC
[Xen-devel] Re: Live migration fails due to c/s 20627
On 12/15/2009 08:14 PM, Dan Magenheimer wrote:
> My argument is simply that if TSC_AUX cannot ALWAYS
> be trusted by an application, apps will NEVER trust it.
> And if apps NEVER trust it, why expose it at all?

The cpu/node info is only of heuristic value anyway; it is never trustworthy in an absolute sense. Apps just use it to try to optimise their own memory allocation and use patterns, but they can't rely on that info for actual correctness.

    J
Dan Magenheimer
2009-Dec-16 15:20 UTC
RE: [Xen-devel] Re: Live migration fails due to c/s 20627
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
>
> On 12/15/2009 08:14 PM, Dan Magenheimer wrote:
> > My argument is simply that if TSC_AUX cannot ALWAYS
> > be trusted by an application, apps will NEVER trust it.
> > And if apps NEVER trust it, why expose it at all?
>
> The cpu/node info is only of heuristic value anyway; it is never
> trustworthy in an absolute sense. Apps just use it to try to
> optimise their own memory allocation and use patterns, but they can't
> rely on that info for actual correctness.

Well, "heuristic" implies a reasonably high probability of getting the right answer. Would you agree that the probability that TSC_AUX gets the "right" answer is much higher in a physical environment than in a (non-pinned) virtual environment? And that, if the heuristic is wrong more often than right, using that heuristic is a bad idea?
Andre Przywara
2009-Dec-16 15:41 UTC
Re: [Xen-devel] RE: Live migration fails due to c/s 20627
Dan Magenheimer wrote:
> Or make ALL illegal instructions vmexit, decode, and if it is rdtscp
> emulate it, else vmenter again.

But is it useful to emulate RDTSCP? I see two use cases for this instruction:

1) NUMA-aware malloc: You need to know the current node number _quickly_ to use the right bucket to take the memory from. You do not even want to use a syscall for this; that's why getcpu in Linux is implemented as a vsyscall using either RDTSCP or LSL. If you emulate this, it will take a few thousand cycles.

2) Making sure TSC values are consistent: By looking at the core ID you learn whether two consecutive RDTSCPs are from the same core and are thus reliable. If you lose a few thousand cycles to emulation, then the whole purpose of doing the RDTSCPs is in question, as your results would be spoiled by the overhead.

These two issues are the main reason I refrained from implementing RDTSCP virtualization some months ago, as even virtualizing it introduces a slight overhead (MSR save/restore). As software seems to cope with not having this instruction (and uses the perfectly virtualized lsl instruction, for instance), I thought the benefit would not justify the effort.

Dan, can you summarize the usage of RDTSCP emulation in PV? Honestly, I got lost in all these threads...

Regards,
Andre.

--
Andre Przywara
AMD Operating System Research Center (OSRC), Dresden, Germany
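[To make the first use case concrete, application-side code along these lines is what is at stake. This is only a sketch using GCC-style inline asm; the (node << 12) | cpu layout shown is the encoding Linux writes into TSC_AUX, which matches its lsl-based fallback.]

    #include <stdint.h>

    /* Read the TSC and TSC_AUX with a single RDTSCP.  Linux initialises
     * TSC_AUX on each cpu to (node << 12) | cpu, so the same decode works
     * for its lsl-based vgetcpu fallback. */
    static inline uint64_t rdtscp_getcpu(unsigned int *cpu, unsigned int *node)
    {
        uint32_t lo, hi, aux;

        __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
        if (cpu)
            *cpu = aux & 0xfff;     /* low 12 bits: cpu number   */
        if (node)
            *node = aux >> 12;      /* remaining bits: node hint */
        return ((uint64_t)hi << 32) | lo;
    }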
Dan Magenheimer
2009-Dec-16 16:23 UTC
RE: [Xen-devel] RE: Live migration fails due to c/s 20627
Since this discussion seems to be going in circles, I suspect we may have some fundamentally different assumptions. You likely have some unstated ideas, maybe about the underlying implementation of the Linux NUMA syscalls when running on Xen, or maybe about defaults for how NUMA-ness might be specified when creating an HVM domain. But all of these are mostly unrelated to rdtscp.

The only reason this discussion has involved NUMA concepts is that the rdtscp instruction, by accident rather than by design, may on some (but not all) guest OSes communicate the guest OS's concept of cpu and node information to an application. As Jeremy has pointed out, this cpu/node information is exactly the same information that can be obtained by a system call. So the only reason that rdtscp is better than using the system call would be performance.

Rdtscp is faster than a system call in many situations, but it is now often emulated in Xen (even on processors that do support the hardware instruction*), so it cannot be assumed to be much faster than a system call. And the difference in performance is only measurable if an app is executing rdtscp many thousands of times every second.

Are there apps that execute rdtscp many thousands of times every second PRIMARILY TO OBTAIN the cpu/node information? If so, I agree that it is unfortunately necessary to expose the rdtscp instruction. If not, I would highly recommend we do NOT expose it now. Otherwise, to use Keir's words, we are "Supporting CPU instructions just because they're there, [which] is not a useful effort." Once rdtscp/TSC_AUX is exposed to guests, it is very hard to remove it again (as saved guests may have tested the cpuid bit once at startup and will fail if restored).

Other brief NUMA-related replies below.

* See xen-unstable.hg/docs/misc/tscmode.txt for explanation.

> From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com]
>
> Could you elaborate on the issues you can see? Normally, a
> virtual node's number of vcpus should be less than one
> physical node's cpu count. But even if the vcpu count is more
> than the physical cpu count in a node, why would it introduce issues?

Suppose a guest believes it has eight cores on a single processor/node. It is now started on a machine that has four cores per processor/node (and two or more sockets). Since the guest believes it is running on a single node, it communicates that information (via TSC_AUX or vgetcpu) to an application. The application is NUMA-aware, but since the guest OS told it that all cores are on the same node, it doesn't use its NUMA code/mode.

Suppose a guest believes it has a total of four cores, two cores on each of two nodes. It is now started on some future machine with 16 cores all on a single node. Since the guest believes it is running on two nodes, it communicates that information (via TSC_AUX or vgetcpu) to an application. The application is NUMA-aware, and the guest OS told it that there are two nodes. This app has very high memory bandwidth needs, so it spends lots of time doing NUMA-related syscalls such as Linux move_pages to ensure that the memory is on the same node as the cpu. All of these move calls are wasted.

Both of these situations are very possible in a cloud environment.

(NOTE: Since this NUMA-related discussion is orthogonal to rdtscp, we should probably start a separate thread for further discussion.)

If the above discussion doesn't clarify my concerns and I haven't answered other questions in your email, please let me know.
Dan Magenheimer
2009-Dec-16 16:54 UTC
RE: [Xen-devel] RE: Live migration fails due to c/s 20627
Hi Andre --

> But is it useful to emulate RDTSCP? I see two use cases for this
> instruction:

Thanks for your support!

> Dan, can you summarize the usage of RDTSCP emulation in PV?
> Honestly, I got lost in all these threads...

Me too :-)

In PV, the "pvcpuid" bit is not set, so guest OSes that use the proper PV access method to cpuid will believe that the hardware does not support rdtscp. Since cpuid is unprivileged, apps running in PV domains may check the bit and use rdtscp. In this case, TSC_AUX should contain 0. If this domain is saved/restored/migrated to a machine that does not support rdtscp, the instruction is emulated. If tsc_mode=3, rdtscp is handled specially.

See xen-unstable.hg/docs/misc/tscmode.txt for more info.

Thanks,
Dan
Jeremy Fitzhardinge
2009-Dec-16 17:31 UTC
Re: [Xen-devel] Re: Live migration fails due to c/s 20627
On 12/16/2009 07:20 AM, Dan Magenheimer wrote:
> Well, "heuristic" implies a reasonably high probability of
> getting the right answer. Would you agree that the probability
> that TSC_AUX gets the "right" answer is much higher
> in a physical environment than in a (non-pinned) virtual
> environment? And that, if the heuristic is wrong more often
> than right, using that heuristic is a bad idea?

It won't make a difference either way. Running in a Xen domain, the kernel will only see a single NUMA node, so the node id is constant. The CPU number may not correspond to a pcpu all the time, but scheduler affinity should make a given vcpu number correspond to the same pcpu for a while. In either case, an application paying attention to cpu+node will do at least as well as an app ignoring them.

So I don't think your argument that "if TSC_AUX cannot ALWAYS be trusted by an application, apps will NEVER trust it" is true at all. Aside from the fact that the cpu+node issue is completely irrelevant to whether we support TSC_AUX.

    J
Jeremy Fitzhardinge
2009-Dec-16 17:38 UTC
Re: [Xen-devel] RE: Live migration fails due to c/s 20627
On 12/16/2009 08:23 AM, Dan Magenheimer wrote:
> As Jeremy has pointed out, this cpu/node information is exactly
> the same information that can be obtained by a system call.
> So the only reason that rdtscp is better than using the
> system call would be performance.

No, not a system call. The vgetcpu vsyscall will return the info with no syscalls, regardless of whether rdtscp is available. It encodes the data in the segment limit of a special segment, and it can be read back with the "lsl" instruction.

> Rdtscp is faster than a system call in many situations, but
> it is now often emulated in Xen (even on processors that do support
> the hardware instruction*), so it cannot be assumed to be much
> faster than a system call. And the difference in performance
> is only measurable if an app is executing rdtscp many thousands
> of times every second.

"lsl" is probably at least as fast as rdtscp when executed natively, and definitely if rdtscp is emulated.

> Suppose a guest believes it has eight cores on a single
> processor/node. [...]
> Suppose a guest believes it has a total of four cores,
> two cores on each of two nodes.

The pvops kernel never attempts to determine the underlying machine topology; it always assumes a single NUMA node.

    J
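[For readers unfamiliar with that fallback, the lsl path looks roughly like this. The selector below is a placeholder for the kernel's reserved per-cpu GDT entry (__PER_CPU_SEG in Linux), whose segment limit carries the same (node << 12) | cpu layout as TSC_AUX; this is a sketch, not the actual vgetcpu source.]

    #include <stdint.h>

    /* Placeholder selector: the real one is the kernel's per-cpu GDT entry
     * (__PER_CPU_SEG in Linux), whose segment limit is set per cpu. */
    #define PER_CPU_SELECTOR 0x7bu

    /* Read the cpu/node hint from a segment limit via LSL -- the fallback
     * Linux's vgetcpu vsyscall uses when RDTSCP is not available.  No MSR
     * is involved, so it virtualizes cleanly. */
    static inline void getcpu_lsl(unsigned int *cpu, unsigned int *node)
    {
        unsigned long limit;

        __asm__("lsl %1, %0" : "=r"(limit)
                             : "r"((unsigned long)PER_CPU_SELECTOR));
        if (cpu)
            *cpu = limit & 0xfff;
        if (node)
            *node = limit >> 12;
    }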