Zhang, Xiantao
2009-Jun-18 02:56 UTC
[Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
Hi, Keir

This patchset enables TSC scaling in software for live migration between platforms with different TSC frequencies. Once the hypervisor finds that the target host's frequency differs from the source host's, it traps and emulates all of the guest's rdtsc instructions at the guest's expected frequency.

If the hardware's TSC frequency differs from the guest's expected frequency, the guest may behave abnormally, e.g. incorrect wallclock, soft lockups, or even hangs in some cases. This patchset is therefore necessary to avoid such issues.

PATCH 0001-- Save guest's preferred TSC in image for save/restore and migration
PATCH 0002-- Move multidiv64 as a library function.
PATCH 0003-- Scaling host TSC frequency patch.

Signed-off-by Xiantao Zhang <xiantao.zhang@intel.com>

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
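The scaling step described above can be sketched as follows. This is not the code from the patch, which is not reproduced in this archive: `scale_host_tsc` and `guest_tsc` are hypothetical names, and `unsigned __int128` (a GCC/Clang extension on 64-bit targets) is assumed here in place of the patchset's multidiv64 helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a host TSC reading into the value a guest
 * that expects guest_khz would have seen. The 128-bit intermediate keeps
 * host_tsc * guest_khz from overflowing 64 bits.
 */
static uint64_t scale_host_tsc(uint64_t host_tsc, uint32_t guest_khz,
                               uint32_t host_khz)
{
    assert(host_khz != 0); /* a zero host frequency would divide by zero */
    unsigned __int128 product = (unsigned __int128)host_tsc * guest_khz;
    return (uint64_t)(product / host_khz);
}

/* Guest-visible TSC: scaled host value plus a per-vCPU offset. */
static uint64_t guest_tsc(uint64_t host_tsc, uint32_t guest_khz,
                          uint32_t host_khz, uint64_t tsc_offset)
{
    return scale_host_tsc(host_tsc, guest_khz, host_khz) + tsc_offset;
}
```

When guest and host frequencies are equal the scaling is the identity, so a hypervisor would presumably skip trapping altogether in that case.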
Jan Beulich
2009-Jun-18 07:37 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
>>> "Zhang, Xiantao" <xiantao.zhang@intel.com> 18.06.09 04:56 >>>
>PATCH 0003-- Scaling host TSC frequency patch.

>+int hvm_gtsc_need_scale(struct domain *d)
>+{
>+    uint32_t gtsc_khz;
>+
>+    gtsc_khz = d->arch.hvm_domain.gtsc_khz / 1000;

Can the variable please be renamed to what it contains (i.e. gtsc_mhz)?

> u64 hvm_get_guest_tsc(struct vcpu *v)
> {
>-    u64 host_tsc;
>-
>-    if ( opt_softtsc )
>-        host_tsc = hvm_get_guest_time(v);
>-    else
>-        rdtscll(host_tsc);
>-
>-    return host_tsc + v->arch.hvm_vcpu.cache_tsc_offset;
>+    u64 host_tsc, scaled_htsc;
>+
>+    rdtscll(host_tsc);
>+    scaled_htsc = hvm_h2g_scale_tsc(v, host_tsc);
>+
>+    return scaled_htsc + v->arch.hvm_vcpu.cache_tsc_offset;
> }
>
> void hvm_migrate_timers(struct vcpu *v)

I'm getting the impression that the opt_softtsc functionality got lost here.

>+        printk("Migrate to a platform with different freq:%ldMhz, "
>+               "expected freq:%dMhz, enable rdtsc exiting!\n",
>+               cpu_khz / 1000, hdr->gtsc_khz / 1000);

gdprintk()? At least, I think, any guest related printk-s should identify which guest they're about.

Jan
Zhang, Xiantao
2009-Jun-18 08:52 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
Jan Beulich wrote:
>>>> "Zhang, Xiantao" <xiantao.zhang@intel.com> 18.06.09 04:56 >>>
>> PATCH 0003-- Scaling host TSC frequency patch.
>
>> +int hvm_gtsc_need_scale(struct domain *d)
>> +{
>> +    uint32_t gtsc_khz;
>> +
>> +    gtsc_khz = d->arch.hvm_domain.gtsc_khz / 1000;
>
> Can the variable please be renamed to what it contains (i.e.
> gtsc_mhz)?
>
>>  u64 hvm_get_guest_tsc(struct vcpu *v)
>>  {
>> -    u64 host_tsc;
>> -
>> -    if ( opt_softtsc )
>> -        host_tsc = hvm_get_guest_time(v);
>> -    else
>> -        rdtscll(host_tsc);
>> -
>> -    return host_tsc + v->arch.hvm_vcpu.cache_tsc_offset;
>> +    u64 host_tsc, scaled_htsc;
>> +
>> +    rdtscll(host_tsc);
>> +    scaled_htsc = hvm_h2g_scale_tsc(v, host_tsc);
>> +
>> +    return scaled_htsc + v->arch.hvm_vcpu.cache_tsc_offset;
>>  }
>>
>>  void hvm_migrate_timers(struct vcpu *v)
>
> I'm getting the impression that the opt_softtsc functionality got
> lost here.

I am still confused by the opt_softtsc check here. If we want to use the platform timer to emulate the guest's TSC, hvm_set_guest_tsc would also need to perform this check to get the correct cache_tsc_offset, but I didn't see it. A bug? If we use the host's TSC to emulate the guest's TSC, the check is useless, so I removed it in my patch. Maybe we need Dan's explanation of the check here to determine whether to keep it or not.

>
>> +        printk("Migrate to a platform with different freq:%ldMhz, "
>> +               "expected freq:%dMhz, enable rdtsc exiting!\n",
>> +               cpu_khz / 1000, hdr->gtsc_khz / 1000);
>
> gdprintk()? At least, I think, any guest related printk-s should
> identify which guest they're about.

Added in the attached patch. Thanks!

Xiantao
Tim Deegan
2009-Jun-18 09:02 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
At 03:56 +0100 on 18 Jun (1245297406), Zhang, Xiantao wrote:
> --- a/xen/include/public/arch-x86/hvm/save.h Fri Feb 20 17:02:36 2009 +0000
> +++ b/xen/include/public/arch-x86/hvm/save.h Tue Jun 16 22:41:06 2009 -0400
> @@ -38,7 +38,7 @@ struct hvm_save_header {
>      uint32_t version;           /* File format version */
>      uint64_t changeset;         /* Version of Xen that saved this file */
>      uint32_t cpuid;             /* CPUID[0x01][%eax] on the saving machine */
> -    uint32_t pad0;
> +    uint32_t gtsc_khz;          /* Guest's TSC frequency in kHz */
>  };
>
>  DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);

I'm not sure this is the best place for this field -- it's a property of the guest CPU rather than of the host that saved the record. I think it would be better to give it its own save record type.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
Patrick Colp
2009-Jun-18 09:10 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
>+        printk("Migrate to a platform with different freq:%ldMhz, "
>+               "expected freq:%dMhz, enable rdtsc exiting!\n",
>+               cpu_khz / 1000, hdr->gtsc_khz / 1000);

Being pedantic, this should probably be:

    printk("Migrate to a platform with different freq: %ldMHz, "
           "expected freq: %dMHz, enable rdtsc exiting!\n",
           cpu_khz / 1000, hdr->gtsc_khz / 1000);

Patrick
Tim Deegan
2009-Jun-18 09:27 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
At 10:10 +0100 on 18 Jun (1245319857), Patrick Colp wrote:
> >+        printk("Migrate to a platform with different freq:%ldMhz, "
> >+               "expected freq:%dMhz, enable rdtsc exiting!\n",
> >+               cpu_khz / 1000, hdr->gtsc_khz / 1000);
>
> Being pedantic, this should probably be:
>
>     printk("Migrate to a platform with different freq: %ldMHz, "
>            "expected freq: %dMHz, enable rdtsc exiting!\n",
>            cpu_khz / 1000, hdr->gtsc_khz / 1000);

Being _pedantic_, it should be

    gdprintk(XENLOG_INFO, "Loaded VM expects a %"PRIu32"MHz TSC "
             "but CPU is %ldMHz; enabling RDTSC exiting.\n",
             hdr->gtsc_khz / 1000, cpu_khz / 1000);

:)

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
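For readers unfamiliar with the `PRIu32` idiom in the message above, here is a minimal user-space sketch. Plain `snprintf` stands in for Xen's `gdprintk`, and `format_tsc_mismatch` and its `domain_id` parameter are hypothetical, added only to illustrate Jan's request that the message identify the guest. `cpu_khz` is assumed to be an `unsigned long`, hence `%lu`.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a TSC-mismatch message into buf. PRIu32 expands to the correct
 * printf conversion for uint32_t on the target platform. Returns the
 * snprintf result, i.e. the length the full message needs.
 */
static int format_tsc_mismatch(char *buf, size_t len, unsigned int domain_id,
                               uint32_t gtsc_khz, unsigned long cpu_khz)
{
    return snprintf(buf, len,
                    "d%u: loaded VM expects a %" PRIu32 "MHz TSC "
                    "but CPU is %luMHz; enabling RDTSC exiting.\n",
                    domain_id, gtsc_khz / 1000, cpu_khz / 1000);
}
```

The integer division by 1000 truncates, so a 2933333 kHz CPU is reported as 2933MHz.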
Zhang, Xiantao
2009-Jun-18 09:46 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
Tim Deegan wrote:
> At 03:56 +0100 on 18 Jun (1245297406), Zhang, Xiantao wrote:
>> --- a/xen/include/public/arch-x86/hvm/save.h Fri Feb 20 17:02:36 2009 +0000
>> +++ b/xen/include/public/arch-x86/hvm/save.h Tue Jun 16 22:41:06 2009 -0400
>> @@ -38,7 +38,7 @@ struct hvm_save_header {
>>      uint32_t version;           /* File format version */
>>      uint64_t changeset;         /* Version of Xen that saved this file */
>>      uint32_t cpuid;             /* CPUID[0x01][%eax] on the saving machine */
>> -    uint32_t pad0;
>> +    uint32_t gtsc_khz;          /* Guest's TSC frequency in kHz */
>>  };
>>
>>  DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header);
>
> I'm not sure this is the best place for this field -- it's a property
> of the guest CPU rather than of the host that saved the record. I
> think it would be better to give it its own save record type.

Hi, Tim

The guest's preferred TSC frequency should be a per-domain concept rather than being tied to vcpus. Introducing a separate save type for just this one field may be too heavyweight. In addition, a TSC frequency field was part of the original proposal for the image format, but I don't know why it was dropped. See the attached PDF doc.

Xiantao
Zhang, Xiantao
2009-Jun-18 09:47 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
Tim Deegan wrote:
> At 10:10 +0100 on 18 Jun (1245319857), Patrick Colp wrote:
>>> +        printk("Migrate to a platform with different freq:%ldMhz, "
>>> +               "expected freq:%dMhz, enable rdtsc exiting!\n",
>>> +               cpu_khz / 1000, hdr->gtsc_khz / 1000);
>>
>> Being pedantic, this should probably be:
>>
>>     printk("Migrate to a platform with different freq: %ldMHz, "
>>            "expected freq: %dMHz, enable rdtsc exiting!\n",
>>            cpu_khz / 1000, hdr->gtsc_khz / 1000);
>
> Being _pedantic_, it should be
>
>     gdprintk(XENLOG_INFO, "Loaded VM expects a %"PRIu32"MHz TSC "
>              "but CPU is %ldMHz; enabling RDTSC exiting.\n",
>              hdr->gtsc_khz / 1000, cpu_khz / 1000);

It would be good to add VM info (e.g. the domain id), as Jan says in another mail. Thanks for your suggestions! I will change it in the final version! :-)

Xiantao
Tim Deegan
2009-Jun-18 09:56 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
At 10:46 +0100 on 18 Jun (1245322016), Zhang, Xiantao wrote:
> Hi, Tim
> Guest's preferred TSC frequency should be per-domain concept
> instead of constraining it to vcpus. If we introduce a separate save
> type for this only field, maybe too heavy.

Sorry, what I meant was: this is a property of the _guest_ so should probably have a save record like other guest properties. The cpuid info in the header is a property of the _host_ that saved the record and is there just for sanity-checking the compatibility of the record (like the version number).

Cheers,

Tim.

> In addition, TSC frequency field is saved by original proposal about
> image format, but don't know why it is dropped. See attached pdf doc.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
Ian Pratt
2009-Jun-18 10:56 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
> This patchset targets for enabling TSC scaling in software for live
> migration between platforms with different TSC frequencies. Once found the
> target host's frequency is different with source host's, hypervisor will
> trap and emulate guest's all rdtsc instructions with its expected
> frequency.
> If hardware's TSC frequency is different with guest's expected freq,
> guest may behave abnormally, eg. incorrect wallclock, soft lockup, even
> hang in some cases. Therefore, this patchset is necessary to avoid such
> issues.
>
> PATCH 0001-- Save guest's preferred TSC in image for save/restore and
> migration
> PATCH 0002-- Move multidiv64 as a library function.
> PATCH 0003-- Scaling host TSC frequency patch.

I think this needs to be a feature which is enabled/disabled on a per-VM basis (in the config file).

I'm not sure what the default should be. Windows VMs and applications don't seem to care much about the TSC, which is an argument for leaving the default as it is at the moment. However, one could argue that things that don't care about the TSC aren't going to be reading it much, so the overhead of making TSC scaling the default shouldn't be too high.

Ian
Dan Magenheimer
2009-Jun-18 15:40 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
> I am still confused by opt_softtsc check here. If want to use
> platform timer to emulate guest's tsc, hvm_set_guest_tsc
> should also need perform this check to get correct
> cache_tsc_offset, but I didn't see it. A bug ?
> If use host's tsc to emulate guest's tsc, the check is
> useless, so I removed it in my patch. Maybe we need Dan's
> explanation about the check here to determine whether keep it or not.

Please do keep the opt_softtsc check. I agree that there is a bug: hvm_set_guest_tsc should check as well. IIRC, my guest never set the TSC.

The softtsc option is for handling skew problems, not scaling/migration problems, but it should probably be updated to handle your TSC scaling as well.

http://lists.xensource.com/archives/html/xen-devel/2008-07/msg00495.html

Thanks,
Dan
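The bug Dan acknowledges is that the getter and the setter must agree on which time source the cached offset is relative to. A minimal sketch of that invariant, with stand-in state and hypothetical names (`vcpu_sketch`, `tsc_source`, `fake_host_tsc`, `fake_guest_time`) rather than Xen's real structures, and with frequency scaling left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for hypervisor state; every name here is hypothetical. */
struct vcpu_sketch {
    uint64_t cache_tsc_offset;   /* guest TSC minus the chosen time source */
};

static bool opt_softtsc;         /* mirrors the option discussed above */
static uint64_t fake_host_tsc;   /* what rdtscll() would return */
static uint64_t fake_guest_time; /* what hvm_get_guest_time() would return */

/* Single place that picks the time source, shared by getter and setter. */
static uint64_t tsc_source(void)
{
    return opt_softtsc ? fake_guest_time : fake_host_tsc;
}

static void set_guest_tsc(struct vcpu_sketch *v, uint64_t guest_tsc)
{
    /* Unsigned wraparound keeps this correct even if guest_tsc < source. */
    v->cache_tsc_offset = guest_tsc - tsc_source();
}

static uint64_t get_guest_tsc(const struct vcpu_sketch *v)
{
    return tsc_source() + v->cache_tsc_offset;
}
```

If the setter always used the host TSC while the getter used guest time under `opt_softtsc`, a value written by the guest would not read back correctly, which is the inconsistency raised in the thread.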
Dan Magenheimer
2009-Jun-18 15:45 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
>     gdprintk(XENLOG_INFO, "Loaded VM expects a %"PRIu32"MHz TSC "
>              "but CPU is %ldMHz; enabling RDTSC exiting.\n",
>              hdr->gtsc_khz / 1000, cpu_khz / 1000);

RDTSC "exiting"? Do you mean RDTSC emulation?

Also, frankly, given the potential performance ramifications, perhaps this should be higher than XENLOG_INFO?
Dan Magenheimer
2009-Jun-18 15:58 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
This is a real virtualization problem.

Some apps may use TSC as a monotonically increasing "sequence number" to mark transactions. For these apps, TSC scaling is irrelevant (as long as the transition doesn't cause the TSC to stop or go backwards).

Other apps (and/or the OS kernel) may use TSC to approximate the passage of time, and for these apps (and gettimeofday in the Linux kernel), this TSC scaling patch is a must. Unfortunately, both kinds of apps could be running simultaneously on the same guest. And in either case, RDTSC frequency may be quite high.

I think a key missing fact is the overhead measurement for trapping RDTSC. I think someone at Intel measured this once and it was quite bad. Perhaps a better question is: If it is important to ALWAYS emulate RDTSC, can the Xen code be written to handle RDTSC emulation much faster? If it could be made fast enough, the proper default might best be to always emulate (even for PV guests... which also points out the fact that the proposed patch only fixes HVM guests).
Tim Deegan
2009-Jun-18 16:04 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
At 16:45 +0100 on 18 Jun (1245343507), Dan Magenheimer wrote:
> >     gdprintk(XENLOG_INFO, "Loaded VM expects a %"PRIu32"MHz TSC "
> >              "but CPU is %ldMHz; enabling RDTSC exiting.\n",
> >              hdr->gtsc_khz / 1000, cpu_khz / 1000);
>
> RDTSC "exiting"? Do you mean RDTSC emulation?

Meh. "RDTSC exiting" is Intel's term for it in the SDMs.

> Also, frankly, given the potential performance ramifications, perhaps
> this should be higher than XENLOG_INFO?

Not if it's going to print that out every time I migrate a VM. This whole feature is something I'd turn off anyway.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
John Levon
2009-Jun-18 16:45 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
On Thu, Jun 18, 2009 at 08:58:49AM -0700, Dan Magenheimer wrote:
> Other apps (and/or the OS kernel) may use TSC to
> approximate the passage of time, and for these apps
> (and gettimeofday in the Linux kernel), this TSC scaling
> patch is a must. Unfortunately, both kinds of apps could
> be running simultaneously on the same guest. And
> in either case, RDTSC frequency may be quite high.

Certainly Solaris relies on the TSC for time-keeping, and uses it very heavily. To the extent that I doubt it's even feasible to migrate to a machine where scaling needs to be done, and such a migration should be refused, since it would essentially kill the guest.

> question is: If it is important to ALWAYS emulate RDTSC,
> can the Xen code be written to handle RDTSC emulation
> much faster? If it could be made fast enough, the

I'd be amazed if this were possible.

regards
john
Dan Magenheimer
2009-Jun-18 20:07 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
> Meh. "RDTSC exiting" is Intel's term for it in the SDMs.

Well, Intel's choice of term in some obscure corner of the SDM doesn't seem like a good choice when it is misleading to mortals. "Exiting" means "leaving" or "quitting", so mortals would probably read this as saying that RDTSC ("whatever that is" sez the mere mortal) is shutting down. Not to mention the fact that "enabling XXX exiting" sounds like an oxymoron. :-)

> Not if it's going to print that out every time I
> migrate a VM. This whole feature is something I'd
> turn off anyway.

You'd turn it off at your peril unless you are certain that all the potential migration target machines in your data center have identical TSC rates and/or none of your guests rely on the TSC for keeping time.

The message should only print if you ARE migrating between machines with different TSC rates, not for every migration. Since the consequence of turning it on is a potentially large loss in performance, a clear log message seems a small price to pay.

Dan
Dan Magenheimer
2009-Jun-18 20:27 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
> > Other apps (and/or the OS kernel) may use TSC to
> > approximate the passage of time, and for these apps
> > (and gettimeofday in the Linux kernel), this TSC scaling
> > patch is a must. Unfortunately, both kinds of apps could
> > be running simultaneously on the same guest. And
> > in either case, RDTSC frequency may be quite high.
>
> Certainly Solaris relies on the TSC for time-keeping, and uses it very
> heavily. To the extent that I doubt it's even feasible to migrate to a
> machine where scaling needs to be done, and such a migration should be
> refused, since it would essentially kill the guest.

Hmmm... any numbers? Certainly Solaris isn't reading TSC much more than a thousand times per second, is it?

Are you suggesting that data centers running Solaris guests must segregate sets of their machines by clock rate and disallow migrations between the sets?

> > question is: If it is important to ALWAYS emulate RDTSC,
> > can the Xen code be written to handle RDTSC emulation
> > much faster? If it could be made fast enough, the
>
> I'd be amazed if this were possible.

If it were PA-RISC or Itanium, I'd take on the challenge, but I just don't know x86 well enough. Are traps really THAT expensive on x86? If max(TSC/sec/processor)~=1000 and cycles/emulation~=5000, total degradation would be less than 1%. (Sounds high, but if the alternative is clocks going haywire, seems a small price to pay. And I expect the frequency and cost estimates (1000 and 5000) are probably too high.)

Also, might turning RDTSC emulation on be much faster on newer processors than old?

Dan
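The back-of-the-envelope estimate above can be written out as a small calculation. `rdtsc_overhead` is a hypothetical helper; the clock rate is an input the message does not state, so any value passed for it (2 GHz in the note below) is an assumption.

```c
#include <assert.h>

/*
 * Fraction of one CPU's cycles spent emulating RDTSC, given how often the
 * guest reads the TSC and how many cycles one trap-and-emulate costs.
 */
static double rdtsc_overhead(double reads_per_sec,
                             double cycles_per_emulation,
                             double cpu_hz)
{
    assert(cpu_hz > 0.0); /* a non-positive clock rate is meaningless here */
    return reads_per_sec * cycles_per_emulation / cpu_hz;
}
```

With the figures from the message, 1000 reads per second at 5000 cycles each cost 5 million cycles per second. On an assumed 2 GHz processor that is 0.25% of the CPU, and the "less than 1%" bound holds for any clock rate above 500 MHz.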
John Levon
2009-Jun-18 20:45 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
On Thu, Jun 18, 2009 at 01:27:23PM -0700, Dan Magenheimer wrote:
> > Certainly Solaris relies on the TSC for time-keeping, and uses it very
> > heavily. To the extent that I doubt it's even feasible to migrate to a
> > machine where scaling needs to be done, and such a migration should be
> > refused, since it would essentially kill the guest.
>
> Hmmm... any numbers? Certainly Solaris isn't reading TSC much

# dtrace -n 'fbt::tsc_gethrtime:entry /cpu == 0/ { @ = sum(1); }' -c "sleep 10"
dtrace: description 'fbt::tsc_gethrtime:entry ' matched 1 probe
dtrace: pid 29708 has exited

            27798

This is on a basically idle 8-way system. (The other CPUs are less busy.)

http://blogs.sun.com/eschrock/entry/microstate_accounting_in_solaris_10

> that data centers running Solaris guests must segregate sets of
> their machines by clock rate and disallow migrations
> between the sets?

Certainly for Solaris HVM that has to be the case until we make it use PV time (presuming that is safe, which I'm not sure offhand).

> THAT expensive on x86? (If max(TSC/sec/processor)~=1000 and
> cycles/emulation~=5000, total degradation would be
> less than 1%. (Sounds high, but if the alternative is

At the end of the day, though, only testing will tell us for sure.

regards
john
Dan Magenheimer
2009-Jun-18 20:57 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequencies
> > Hmmm... any numbers? Certainly Solaris isn't reading TSC much
>
> # dtrace -n 'fbt::tsc_gethrtime:entry /cpu == 0/ { @ =
> sum(1); }' -c "sleep 10"
> dtrace: description 'fbt::tsc_gethrtime:entry ' matched 1 probe
> dtrace: pid 29708 has exited
>
> 27798
>
> This is on a basically idle 8-way system. (The other CPUs are
> less busy.)

Just checking... this is in 10 seconds and each processor is "ticking" (and possibly a system-wide timer tick as well), so this is ~350 rdtsc/sec/processor, correct?

> > that data centers running Solaris guests must segregate sets of
> > their machines by clock rate and disallow migrations
> > between the sets?
>
> Certainly for Solaris HVM that has to be the case until we make it use PV time
> (presuming that is safe, which I'm not sure offhand).

I believe PV is no more safe (and the proposed patch doesn't work for PV).

> > THAT expensive on x86? (If max(TSC/sec/processor)~=1000 and
> > cycles/emulation~=5000, total degradation would be
> > less than 1%. (Sounds high, but if the alternative is
>
> At the end of the day, though, only testing will tell us for sure.

Yes indeed. It would be nice if some x86 wizard could spin up a best case for this and if someone with good hardware measurement tools could compare current vs best (as well as measure true instruction frequency, as apps might be rdtsc'ing directly).
John Levon
2009-Jun-18 21:00 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
On Thu, Jun 18, 2009 at 01:57:21PM -0700, Dan Magenheimer wrote:

> > # dtrace -n 'fbt::tsc_gethrtime:entry /cpu == 0/ { @ =
> > sum(1); }' -c "sleep 10"
> > dtrace: description 'fbt::tsc_gethrtime:entry ' matched 1 probe
> > dtrace: pid 29708 has exited
> >
> >             27798
> >
> > This is on a basically idle 8-way system. (The other CPUs are
> > less busy.)
>
> Just checking... this is in 10 seconds and each processor is
> "ticking" (and possibly a system-wide timer tick as well),
> so this is ~350 rdtsc/sec/processor, correct?

No. That's CPU0 only ('cpu == 0'). Solaris only has one system-wide timer tick. This is mstate accounting: every kernel/user boundary, every interrupt, etc. incurs at least one TSC read. (And of course the machine is idle.)

regards
john
Dan Magenheimer
2009-Jun-18 22:27 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
> > Just checking... this is in 10 seconds and each processor is
> > "ticking" (and possibly a system-wide timer tick as well),
> > so this is ~350 rdtsc/sec/processor, correct?
>
> No. That's CPU0 only ('cpu == 0'). Solaris only has one system-wide
> timer tick. This is mstate accounting: every kernel/user
> boundary, every
> interrupt, etc. incurs at least one TSC read. (And of course
> the machine
> is idle.)

Wow. (and I repeat for emphasis) Wow.

Even when restricted to physical hardware, using the TSC for such purposes seems ill-advised. In a virtual data center, the data will be often useless.

Is mstate accounting used for anything other than providing interesting performance data if one cares to look at it?

Does mstate accounting ignore negative values for delta TSC?

Well, even a few thousand RDTSC/second is not too bad if RDTSC emulation can be brought in under a couple thousand cycles. (I don't know that it can, but I also don't know that it can't.)

Dan
Dong, Eddie
2009-Jun-18 23:49 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
>> question is: If it is important to ALWAYS emulate RDTSC,
>> can the Xen code be written to handle RDTSC emulation
>> much faster? If it could be made fast enough, the

This is in our plan. The perf overhead of sysbench.oltp could be ~10% due to rdtsc exiting. But even with optimization, we still need this software scaling as a fallback.

> I'd be amazed if this were possible.

We posted the idea several weeks ago. See http://lists.xensource.com/archives/html/xen-devel/2009-04/msg00890.html. Will that impact Solaris?
Zhang, Xiantao
2009-Jun-19 01:21 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
John Levon wrote:
> On Thu, Jun 18, 2009 at 01:57:21PM -0700, Dan Magenheimer wrote:
>
>>> # dtrace -n 'fbt::tsc_gethrtime:entry /cpu == 0/ { @ =
>>> sum(1); }' -c "sleep 10"
>>> dtrace: description 'fbt::tsc_gethrtime:entry ' matched 1 probe
>>> dtrace: pid 29708 has exited
>>>
>>>             27798
>>>
>>> This is on a basically idle 8-way system. (The other CPUs are
>>> less busy.)
>>
>> Just checking... this is in 10 seconds and each processor is
>> "ticking" (and possibly a system-wide timer tick as well),
>> so this is ~350 rdtsc/sec/processor, correct?
>
> No. That's CPU0 only ('cpu == 0'). Solaris only has one system-wide
> timer tick. This is mstate accounting: every kernel/user boundary,
> every interrupt, etc. incurs at least one TSC read. (And of course
> the machine is idle.)

So the rdtsc rate in the system is 2779.8/s per your testing? If so, the performance impact can be ignored. We have done performance testing with sysbench oltp, where the rdtsc rate exceeds 120000 rdtsc/sec, but even in such an extreme case the performance loss is still less than 10%. In addition, we also measured the emulation cost, and the result shows that rdtsc can be done in 1500-1800 cycles in the emulation case.

Xiantao
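As a rough check of the figures quoted in this message, the fraction of one CPU spent emulating rdtsc is simply (reads per second) x (cycles per emulated read) / (CPU cycles per second). The sketch below is only that arithmetic; the function name is made up for illustration, and the CPU clock rate is an assumption, since the thread never states the clock rate of the test machines.

```c
#include <assert.h>

/*
 * Fraction of one CPU's cycles consumed by emulating rdtsc.
 * cpu_hz is an assumed value supplied by the caller; the thread does not
 * give the clock rate of the machines that were measured.
 */
double rdtsc_emulation_overhead(double rdtsc_per_sec,
                                double cycles_per_emulation,
                                double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return rdtsc_per_sec * cycles_per_emulation / cpu_hz;
}
```

On an assumed 2 GHz CPU, 120000 rdtsc/sec at 1800 cycles each comes to about 10.8% of the CPU, which is in the same range as the sysbench loss reported here, while roughly 2780 rdtsc/sec at the same cost is about 0.25%.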
Zhang, Xiantao
2009-Jun-19 01:34 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
Ian Pratt wrote:
>> This patchset targets for enabling TSC scaling in software for
>> live migration between platforms with different TSC frequecies.
>> Once found the target host's frequency is different with source
>> host's, hypervisor will trap and emulate guest's all rdtsc
>> instructions with its expected frequency. If hardware's TSC
>> frequency is difffernt with guest's exepcted freq, guest may behave
>> abnormally, eg. incorrect wallclock, soft lockup, even hang in some
>> cases. Therefore, this patchset is necessary to avoid such issues.
>>
>> PATCH 0001-- Save guest's preferred TSC in image for save/restore and
>> migration PATCH 0002-- Move multidiv64 as a library function.
>> PATCH 0003-- Scaling host TSC freqeuncy patch.
>
> I think this needs to be a feature which is enabled/disabled on a per
> VM basis (in the config file).
>
> I'm not sure what the default should be. Windows VMs and applications
> don't seem to much care about the TSC which is an argument for
> leaving the default as it is at the moment. However, one could argue
> that things that don't care about the TSC aren't going to be reading
> it much, so the overhead of making the default to scale the TSC
> shouldn't be too high.

Hi, Ian

We also considered adding an option to disable/enable this feature. But this logic only takes effect after migration, while the option would have to be set at guest creation, so it may be useless in most cases. Even if Windows itself doesn't care much about the TSC, we can't say that applications don't use it at all. I assume Windows's timing mechanism also needs to build a relationship between the TSC and the time source, and if that assumption is right, the guest's applications may behave abnormally at a different TSC frequency.

Thanks! :-)
Xiantao
Zhang, Xiantao
2009-Jun-19 01:48 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration betweenplatforms with different TSC frequecies
Dan Magenheimer wrote:
>> I am still confused by opt_softtsc check here. If want to use
>> platform timer to emulate guest's tsc, hvm_set_guest_tsc
>> should also need perform this check to get correct
>> cache_tsc_offset, but I didn't see it. A bug ?
>> If use host's tsc to emulate guest's tsc, the check is
>> useless, so I removed it in my patch. Maybe we need Dan's
>> explanation about the check here to determin whether keep it or not.
>
> Please do keep the opt_softtsc check. I agree that there
> is a bug, that hvm_set_guest_tsc should check as well.
> IIRC, my guest never set the TSC.

Okay, for general cases we also need the opt_softtsc check in hvm_set_guest_tsc; otherwise the system may end up with a confused TSC after it sets the TSC.

> The softtsc option is for handling skew problems
> not scaling/migration problems but should probably
> be updated to handle your TSC scaling as well.

Okay, I will add the corresponding logic to handle it. A question: according to the logic, the TSC frequency should be equal to 10^9 once opt_softtsc is set, right?

> http://lists.xensource.com/archives/html/xen-devel/2008-07/msg00495.html
>
> Thanks,
> Dan
>
>> -----Original Message-----
>> From: Zhang, Xiantao [mailto:xiantao.zhang@intel.com]
>> Sent: Thursday, June 18, 2009 2:52 AM
>> To: Jan Beulich; Dan Magenheimer
>> Cc: Keir Fraser; xen-devel@lists.xensource.com
>> Subject: RE: [Xen-devel] [PATCH] TSC scaling for live migration
>> betweenplatforms with different TSC frequecies
>>
>>
>> Jan Beulich wrote:
>>>>>> "Zhang, Xiantao" <xiantao.zhang@intel.com> 18.06.09 04:56 >>>
>>>> PATCH 0003-- Scaling host TSC freqeuncy patch.
>>>
>>>> +int hvm_gtsc_need_scale(struct domain *d)
>>>> +{
>>>> +    uint32_t gtsc_khz;
>>>> +
>>>> +    gtsc_khz = d->arch.hvm_domain.gtsc_khz / 1000;
>>>
>>> Can the variable please be renamed to what it contains (i.e.
>>> gtsc_mhz)?
>>>
>>>>  u64 hvm_get_guest_tsc(struct vcpu *v)
>>>>  {
>>>> -    u64 host_tsc;
>>>> -
>>>> -    if ( opt_softtsc )
>>>> -        host_tsc = hvm_get_guest_time(v);
>>>> -    else
>>>> -        rdtscll(host_tsc);
>>>> -
>>>> -    return host_tsc + v->arch.hvm_vcpu.cache_tsc_offset;
>>>> +    u64 host_tsc, scaled_htsc;
>>>> +
>>>> +    rdtscll(host_tsc);
>>>> +    scaled_htsc = hvm_h2g_scale_tsc(v, host_tsc);
>>>> +
>>>> +    return scaled_htsc + v->arch.hvm_vcpu.cache_tsc_offset;
>>>>  }
>>>>
>>>>  void hvm_migrate_timers(struct vcpu *v)
>>>
>>> I'm getting the impression that the opt_softtsc functionality got
>>> lost here.
>>
>> I am still confused by opt_softtsc check here. If want to use
>> platform timer to emulate guest's tsc, hvm_set_guest_tsc
>> should also need perform this check to get correct
>> cache_tsc_offset, but I didn't see it. A bug ?
>> If use host's tsc to emulate guest's tsc, the check is
>> useless, so I removed it in my patch. Maybe we need Dan's
>> explanation about the check here to determin whether keep it or not.
>>
>>>
>>>> +    printk("Migrate to a platform with different freq:%ldMhz, "
>>>> +           "expected freq:%dMhz, enable rdtsc exiting!\n",
>>>> +           cpu_khz / 1000, hdr->gtsc_khz / 1000);
>>>
>>> gdprintk()? At least, I think, any guest related printk-s should
>>> identify which guest they're about.
>>
>> Added in the attached patch. Thanks!
>> Xiantao
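The scaling step under review (hvm_h2g_scale_tsc in the quoted diff, with multidiv64 moved to a library function by PATCH 0002) boils down to multiplying the host TSC by the ratio of guest to host frequency without overflowing 64 bits. What follows is a minimal sketch, assuming the ratio is guest kHz over host kHz; both function names are hypothetical and the patch's actual code may well differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute a * b / c without overflowing the 64-bit intermediate, by
 * splitting a into 32-bit halves. Illustrative stand-in, not the patch's
 * own helper.
 */
uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    uint64_t lo = (a & 0xffffffffu) * b;          /* low half times b  */
    uint64_t hi = (a >> 32) * b + (lo >> 32);     /* high half + carry */
    uint64_t res_hi = hi / c;
    uint64_t res_lo = (((hi % c) << 32) + (lo & 0xffffffffu)) / c;
    return (res_hi << 32) | res_lo;
}

/* Convert a host TSC reading into the rate the guest expects (assumed model). */
uint64_t h2g_scale_tsc_sketch(uint64_t host_tsc, uint32_t guest_khz,
                              uint32_t host_khz)
{
    if (guest_khz == host_khz)
        return host_tsc;                          /* nothing to scale */
    return muldiv64_sketch(host_tsc, guest_khz, host_khz);
}
```

For example, a guest saved on a 2 GHz host and restored on a 3 GHz host would see 3000000000 host cycles reported as 2000000000 guest cycles. The per-vcpu cache_tsc_offset from the diff would be added after this step.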
Zhang, Xiantao
2009-Jun-19 02:25 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
John Levon wrote:
> On Thu, Jun 18, 2009 at 08:58:49AM -0700, Dan Magenheimer wrote:
>
>> Other apps (and/or the OS kernel) may use TSC to
>> approximate the passage of time, and for these apps
>> (and gettimeofday in the Linux kernel), this TSC scaling
>> patch is a must. Unfortunately, both kinds of apps could
>> be running simultaneously on the same guest. And
>> in either case, RDTSC frequency may be quite high.
>
> Certainly Solaris relies on the TSC for time-keeping, and uses it very
> heavily. To the extent that I doubt it's even feasible to migrate to a
> machine where scaling needs to be done, and such a migration should be
> refused, since it would essentially kill the guest.
>
>> question is: If it is important to ALWAYS emulate RDTSC,
>> can the Xen code be written to handle RDTSC emulation
>> much faster? If it could be made fast enough, the
>
> I'd be amazed if this were possible.

Actually, we have done some optimizations so that most rdtsc instructions do not trap to the hypervisor while the TSC still increases monotonically on each virtual processor, but it is hard to solve the TSC drift issue between all vcpus. In our solution, TSC drift may exceed 100000 cycles. TSC drift may also exist on real platforms, but probably not as large as 1000000 cycles. Do you know the limit of TSC drift that an OS or applications can bear? At least Linux doesn't care much about the drift, but I have no idea how applications behave with such a large drift.

Xiantao
John Levon
2009-Jun-19 13:36 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
On Thu, Jun 18, 2009 at 03:27:54PM -0700, Dan Magenheimer wrote:

> Even when restricted to physical hardware, using the TSC
> for such purposes seems ill-advised.

In practice it's not so bad, if you only do power management on P-state invariant TSC CPUs, and disable C1 clock ramping. I'm sure there are all sorts of fun caveats, but I don't think we've had many practical problems.

> In a virtual data center, the data will be often useless.

It won't be happy across different machines indeed. We haven't retested past 3.1, but the PV timer isn't even monotonic in SMP guests. We have to global-sync to get one.

You mentioned the PV timer can't handle migration - why doesn't tsc_to_system_mul account for it? If ever a subsystem badly needed a detailed write-up...

> Is mstate accounting used for anything other than providing
> interesting performance data if one cares to look at it?

You make it sound like that isn't critically important :)

> Does mstate accounting ignore negative values for delta TSC?

No, the system time is assumed monotonic (it's not in TSC units). The TSC code (http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/timestamp.c) is expected to provide monotonicity across all CPUs. And /that/ code assumes there's no inter-CPU drift. Deltas are allowed, but of course HVM assumes that Xen has dealt with that (since it can't possibly compute deltas between VCPUs).

All this stuff is painful. What I wouldn't give for a single cheap monotonic timer source that worked under all circumstances.

regards
john
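The tsc_to_system_mul question can be made concrete with a sketch of how a PV guest turns a TSC delta into nanoseconds. The field names follow the per-vcpu time record in Xen's public headers as I understand them; the struct and function below are local stand-ins written for illustration, not Xen code, and they leave out the version/retry protocol a real guest needs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative local copy of the time fields a PV guest reads. */
struct pv_time_sketch {
    uint64_t tsc_timestamp;      /* host TSC when system_time was sampled */
    uint64_t system_time;        /* nanoseconds at tsc_timestamp          */
    uint32_t tsc_to_system_mul;  /* 32-bit binary fraction: ns per cycle  */
    int8_t   tsc_shift;          /* power-of-two pre-scale of the delta   */
};

/* ns = system_time + (((tsc - tsc_timestamp) << tsc_shift) * mul) >> 32 */
uint64_t pv_time_ns(const struct pv_time_sketch *t, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - t->tsc_timestamp;

    assert(t->tsc_shift > -64 && t->tsc_shift < 64);
    if (t->tsc_shift >= 0)
        delta <<= t->tsc_shift;
    else
        delta >>= -t->tsc_shift;

    /* (delta * mul) >> 32, done in two halves to stay within 64 bits */
    uint64_t hi = (delta >> 32) * t->tsc_to_system_mul;
    uint64_t lo = ((delta & 0xffffffffu) * t->tsc_to_system_mul) >> 32;
    return t->system_time + hi + lo;
}
```

If the hypervisor republishes the multiplier and shift for the new host after a migration, the same elapsed time falls out of a different TSC rate: 2000 cycles on a 2 GHz host (mul 0x80000000, shift 0) and 1000 cycles on a 1 GHz host (mul 0x80000000, shift 1) both come out as 1000 ns. Whether the versions under discussion actually did this correctly is the open question in the message above.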
John Levon
2009-Jun-19 13:53 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
On Fri, Jun 19, 2009 at 10:25:36AM +0800, Zhang, Xiantao wrote:

> Actually, we had done some optimizations and make most of rdtsc not
> trap to hypervisor and always keep tsc monotonically increasing in
> each virtual processor, but it is hard to solve the tsc drift issue
> between all vcpus. In our solution, tsc drift may exceeds 100000
> cycles. You know, TSC drift maybe also exist in real platforms, but
> maybe not quite large like 1000000 cycles. Do you know what's the

I'm not sure what you mean by "drift of 1000000 cycles". In what time period? Or are you talking about skew (a constant difference between TSC values read between CPUs) ? Solaris handles arbitrary skew (I think) but that's only accounted for at boot time. A fudge factor (tsc_max_delta) accounts for any minor post-calibration deltas.

On HVM or VMWare we don't even try, since we can't possibly know the real CPUs skew: the assumption is the VM platform has already done this for us. And at least Xen attempts to sync up the physical CPUs.

Significant drift (where different CPUs are ticking at different rates) is bad news, and can easily lead to non-monotonicity. I don't know what "significant" means though, unfortunately.

Finally, a change across all CPUs in the tsc tick rate (so no drift, but a sudden change after, say, a migration) is also bad news. Solaris used to recalibrate the scale rate once a second, but that was removed.

All this is HVM/metal code of course.

regards
john
John Levon
2009-Jun-19 13:54 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
On Fri, Jun 19, 2009 at 09:21:55AM +0800, Zhang, Xiantao wrote:

> > No. That's CPU0 only ('cpu == 0'). Solaris only has one system-wide
> > timer tick. This is mstate accounting: every kernel/user boundary,
> > every interrupt, etc. incurs at least one TSC read. (And of course
> > the machine is idle.)
>
> So the rdtsc rate in the system is 2779.8/s per your testing ?

No, the rdtsc rate on a single CPU on an idle system on one machine was around that :)

> If so, the performance impact can be ignored. We had done the
> performance testing with sysbench oltp, and in the testing the rdtsc
> rate exceeds 120000 rdtsc/sec, but even in such extreme case
> perfomrance loss is still less 10%. In addition, we also measured the
> emulation cost, and the result showes rdtsc can be done in 1500-1800
> cycles in emulation case.

It would be really good to see some Solaris perf results.

regards
john
Zhang, Xiantao
2009-Jun-19 15:07 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
John Levon wrote:
> On Fri, Jun 19, 2009 at 10:25:36AM +0800, Zhang, Xiantao wrote:
>
>> Actually, we had done some optimizations and make most of rdtsc not
>> trap to hypervisor and always keep tsc monotonically increasing in
>> each virtual processor, but it is hard to solve the tsc drift issue
>> between all vcpus. In our solution, tsc drift may exceeds 100000
>> cycles. You know, TSC drift maybe also exist in real platforms, but
>> maybe not quite large like 1000000 cycles. Do you know what's the
>
> I'm not sure what you mean by "drift of 1000000 cycles". In what time
> period? Or are you talking about skew (a constant difference between
> TSC values read between CPUs) ? Solaris handles arbitrary skew (I
> think) but that's only accounted for at boot time. A fudge factor
> (tsc_max_delta) accounts for any minor post-calibration deltas.

I mean inter-CPU TSC drift at any time. In our solution, we always keep the guest's TSC across all vcpus synced to its expected TSC in 1 ms or 0.5 ms. That is to say, every vcpu's TSC increases monotonically, but there may be a little drift (10^5 ~ 10^6 cycles) due to unsynchronized VM exits of the vcpus.

> On HVM or VMWare we don't even try, since we can't possibly know the
> real CPUs skew: the assumption is the VM platform has already done
> this for us. And at least Xen attempts to sync up the physical CPUs.
> Significant drift (where different CPUs are ticking at different
> rates) is bad news, and can easily lead to non-monotonicity. I don't
> know what "significant" means though, unfortunately.

We can guarantee that each vcpu's TSC increases monotonically, but there may be some difference between vcpus. I am not sure whether 10^5 cycles is significant, but it should exceed a stable hardware's drift in general.

> Finally, a change across all CPUs in the tsc tick rate (so no drift,
> but a sudden change after, say, a migration) is also bad news.
> Solaris used to recalibrate the scale rate once a second, but that
> was removed.

Bad news.

> All this is HVM/metal code of course.
>
> regards
> john
Dan Magenheimer
2009-Jun-19 20:44 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
> > On HVM or VMWare we don't even try, since we can't possibly know the
> > real CPUs skew: the assumption is the VM platform has already done
> > this for us. And at least Xen attempts to sync up the physical CPUs.
> > Significant drift (where different CPUs are ticking at different
> > rates) is bad news, and can easily lead to non-monotonicity. I don't
> > know what "significant" means though, unfortunately.
>
> We can guanrantee each vcpu's TSC is increasing
> monotonically, but there maybe some diff between vcpus. I am
> not sure 10^5 cycles is significant, but it should exceed a
> stable hardware's drift in general.

Let me attempt to define "significant":

Assume that two kernel- or user-threads are able to synchronize such that they can guarantee execution order. If:

1) thread A reads TSC, and then
2) thread A and thread B sync to guarantee ordering, and then
3) thread B reads TSC, but
4) thread B's TSC value is less than thread A's TSC value

then the TSC skew is "significant".

If thread A and thread B are for example using TSC values to timestamp journal transactions, then transaction guarantees will not be valid.

So the question becomes: What is the smallest number of cycles that are required to allow thread A and thread B to synchronize for ordering?

I assert that this value is low enough _in theory_ that only full TSC emulation can guarantee the proper result. In _practice_, I don't know. But I suspect that it is much lower than 10^5 cycles.

Dan
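The four-step definition above reduces to a simple inequality: thread B sees time go backwards exactly when its vcpu's TSC lags thread A's by more cycles than the hand-off between the two threads takes. A toy model of that condition follows; the function and parameter names are invented for illustration, and the hand-off cost is an assumed input rather than a measured one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Thread A reads tsc_a on one vcpu, then hands off to thread B on another
 * vcpu whose TSC differs by b_minus_a cycles (negative: B's vcpu is behind).
 * sync_cycles is how long the ordering hand-off takes, in cycles.
 * Returns true when B's later read is smaller than A's earlier read.
 */
bool ordering_violated(int64_t tsc_a, int64_t b_minus_a, int64_t sync_cycles)
{
    assert(sync_cycles >= 0);
    int64_t tsc_b = tsc_a + sync_cycles + b_minus_a;
    return tsc_b < tsc_a;
}
```

With an assumed hand-off of 1000 cycles, a lag of 10^5 cycles breaks ordering while a lag of 500 cycles does not, which is why the minimum achievable hand-off time is the figure that matters.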
Zhang, Xiantao
2009-Jun-22 01:38 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
Dan Magenheimer wrote:
>>> On HVM or VMWare we don't even try, since we can't possibly know the
>>> real CPUs skew: the assumption is the VM platform has already done
>>> this for us. And at least Xen attempts to sync up the physical CPUs.
>>> Significant drift (where different CPUs are ticking at different
>>> rates) is bad news, and can easily lead to non-monotonicity. I don't
>>> know what "significant" means though, unfortunately.
>>
>> We can guanrantee each vcpu's TSC is increasing
>> monotonically, but there maybe some diff between vcpus. I am
>> not sure 10^5 cycles is significant, but it should exceed a
>> stable hardware's drift in general.
>
> Let me attempt to define "significant":
>
> Assume that two kernel- or user-threads are able to synchronize
> such that they can guarantee execution order. If:
>
> 1) thread A reads TSC, and then
> 2) thread A and thread B sync to guarantee ordering, and then
> 3) thread B reads TSC, but
> 4) thread B's TSC value is less than thread A's TSC value
>
> then the TSC skew is "significant".
>
> If thread A and thread B are for example using TSC values
> to timestamp journal transactions, then transaction guarantees
> will not be valid.

Agree.

> So the question becomes: What is the smallest number of
> cycles that are required to allow thread A and thread B
> to synchronize for ordering?

This is the key point in determining whether we can perform further optimization. If the skew between vcpus can't be ignored, we have no fast way to handle it and have to resort to TSC emulation and suffer the performance loss. But anyway, the TSC emulation method should be the first step.

> I assert that this value is low enough _in theory_ that
> only full TSC emulation can guarantee the proper result.
> In _practice_, I don't know. But I suspect that
> it is much lower than 10^5 cycles.
Zhang, Xiantao
2009-Jun-22 05:14 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
Hi, Keir

This is the new version which has addressed the comments from the mailing list. Please review it again. Thanks!

Xiantao

Zhang, Xiantao wrote:
> Hi, Keir
>
> This patchset targets for enabling TSC scaling in software for
> live migration between platforms with different TSC frequecies.
> Once found the target host's frequency is different with source
> host's, hypervisor will trap and emulate guest's all rdtsc
> instructions with its expected frequency. If hardware's TSC frequency
> is difffernt with guest's exepcted freq, guest may behave abnormally,
> eg. incorrect wallclock, soft lockup, even hang in some cases.
> Therefore, this patchset is necessary to avoid such issues.
>
> PATCH 0001-- Save guest's preferred TSC in image for save/restore and
> migration
> PATCH 0002-- Move multidiv64 as a library function.
> PATCH 0003-- Scaling host TSC freqeuncy patch.
>
> Signed-off-by Xiantao Zhang <xiantao.zhang@intel.com>
> Xiantao
Keir Fraser
2009-Jun-23 10:18 UTC
Re: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
Stuffing the guest freq in a save-image pad field is not backward compatible. Old images will not have that field filled in and you'll probably end up doing something stupid like give them a zero-hertz TSC. Please think about backward compatibility and use a separate save record. Like Tim asked you to do already.

 -- Keir

On 22/06/2009 06:14, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

> Hi, Keir
> This is the new version which has addressed the comments from the mailing
> list. Please review it again. Thanks!
> Xiantao
>
> Zhang, Xiantao wrote:
>> Hi, Keir
>>
>> This patchset targets for enabling TSC scaling in software for
>> live migration between platforms with different TSC frequecies.
>> Once found the target host's frequency is different with source
>> host's, hypervisor will trap and emulate guest's all rdtsc
>> instructions with its expected frequency. If hardware's TSC frequency
>> is difffernt with guest's exepcted freq, guest may behave abnormally,
>> eg. incorrect wallclock, soft lockup, even hang in some cases.
>> Therefore, this patchset is necessary to avoid such issues.
>>
>> PATCH 0001-- Save guest's preferred TSC in image for save/restore and
>> migration
>> PATCH 0002-- Move multidiv64 as a library function.
>> PATCH 0003-- Scaling host TSC freqeuncy patch.
>>
>> Signed-off-by Xiantao Zhang <xiantao.zhang@intel.com>
>> Xiantao
Zhang, Xiantao
2009-Jun-24 01:18 UTC
RE: [Xen-devel] [PATCH] TSC scaling for live migration between platforms with different TSC frequecies
Keir Fraser wrote:
> Stuffing the guest freq in a save-image pad field is not backward
> compatible. Old images will not have that field filled in and you'll
> probably end up doing something stupid like give them a zero-hertz
> TSC. Please think about backward compatibility and use a separate

Hi, Keir

I also check the field to solve the backward compatibility issue: once the field is found to be zero, we don't do any TSC scaling (see hvm_gtsc_need_scale for the details), so the guest never uses a zero-hertz frequency in any case. Since old images can't provide TSC frequency info, the TSC scaling logic shouldn't cover them.

Xiantao

> -- Keir
>
> On 22/06/2009 06:14, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>
>> Hi, Keir
>> This is the new version which has addressed the comments from the
>> mailing list. Please review it again. Thanks!
>> Xiantao
>>
>> Zhang, Xiantao wrote:
>>> Hi, Keir
>>>
>>> This patchset targets for enabling TSC scaling in software for
>>> live migration between platforms with different TSC frequecies.
>>> Once found the target host's frequency is different with source
>>> host's, hypervisor will trap and emulate guest's all rdtsc
>>> instructions with its expected frequency. If hardware's TSC
>>> frequency is difffernt with guest's exepcted freq, guest may behave
>>> abnormally, eg. incorrect wallclock, soft lockup, even hang in some
>>> cases. Therefore, this patchset is necessary to avoid such issues.
>>>
>>> PATCH 0001-- Save guest's preferred TSC in image for save/restore
>>> and migration PATCH 0002-- Move multidiv64 as a library function.
>>> PATCH 0003-- Scaling host TSC freqeuncy patch.
>>>
>>> Signed-off-by Xiantao Zhang <xiantao.zhang@intel.com>
>>> Xiantao
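The zero-field behaviour described in this reply can be sketched as a small decision function. This is an illustration under assumptions, not the patch's hvm_gtsc_need_scale: the function name is hypothetical, and comparing at MHz granularity is only inferred from the "/ 1000" in the diff quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether rdtsc must be trapped and scaled after restore.
 * saved_gtsc_khz is the frequency stored in the save image; 0 is taken
 * to mean "old image, field never filled in".
 */
bool gtsc_need_scale_sketch(uint32_t saved_gtsc_khz, uint32_t host_khz)
{
    assert(host_khz != 0);
    if (saved_gtsc_khz == 0)
        return false;                 /* old image: leave the TSC alone */
    /* Assumed: frequencies are compared in MHz, ignoring sub-MHz noise. */
    return saved_gtsc_khz / 1000 != host_khz / 1000;
}
```

Under this reading an old image is never scaled, so it keeps running at the host's rate exactly as it did before the patch. It does not settle the objection that a pad field and a separate save record differ in how they fail.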