I thought I should bring this up now rather than wait until the problem causes problems for some future customer...

Much of the TSC-based time infrastructure in Xen, especially as exposed to guests, is rather sensitive to sudden dramatic differences in TSC values between physical processors. Hot-add of physical CPUs will introduce a huge difference. Current code attempts to "fix up" TSC after discontinuity events (such as C3-state) but this works poorly (far too imprecise), so all guest TSC uses are always emulated on any physical machine that might have such discontinuities.

Even though a very small percentage of real-world machines will be capable of hot-cpu-add -- and an even smaller percentage of real-world machines will ever actually hot-add a cpu -- Xen 4.0 currently detects and allows this operation. As a result, it is currently impossible to predict if a hot-add might happen and result in a TSC discontinuity.

Some possible solutions:

1) Always emulate all TSC uses for all guests because there is a (microscopic?) probability that a physical hot-cpu-add event might occur.
2) Disable hot-cpu-add by default in Xen and require a Xen boot parameter to be specified if a machine might possibly do a hot-cpu-add. If this boot parameter is specified, see (1) above.
3) Do a very precise TSC fixup on a hot-add cpu before it is recognized by Xen as present. (Not sure if this is possible.)
4) Dynamically switch to TSC emulation on all guests when a hot-cpu-add event occurs. (Problem: Under certain circumstances, the Invariant TSC bit is enabled for some guests which maximizes performance on newer Linux kernels. This choice would require Invariant TSC to always be zero.)

Thoughts? (My favorite is (2))

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

On 26/05/2010 16:19, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Much of the TSC-based time infrastructure in Xen,
> especially as exposed to guests, is rather sensitive
> to sudden dramatic differences in TSC values between
> physical processors. Hot-add of physical CPUs will
> introduce a huge difference.

True at the moment, but can we not just whack the TSC of the newly added CPU on the head when it is brought online, to match the boot CPU? I think that would suffice for systems with 'reliable tsc' which are the only ones we don't emulate tsc by default?

 -- Keir

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
> On 26/05/2010 16:19, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
> > Much of the TSC-based time infrastructure in Xen,
> > especially as exposed to guests, is rather sensitive
> > to sudden dramatic differences in TSC values between
> > physical processors. Hot-add of physical CPUs will
> > introduce a huge difference.
>
> True at the moment, but can we not just whack the TSC of the newly
> added CPU on the head when it is brought online, to match the
> boot CPU?

Possibly... but the code for whacking the TSC of a CPU after C3-state results in a TSC value that is poorly-aligned with other running TSCs. If there is a better way for "whacking" that results in a nearly-perfectly-aligned TSC (that would pass a "tsc warp test"), that is an option.

> I think that would suffice for systems with 'reliable tsc'
> which are the only ones we don't emulate tsc by default?

Yes, I'm particularly concerned with hot-add-physical-cpu on any latest generation QPI/HT boxes where Invariant TSC is set.

> Possibly... but the code for whacking the TSC of a CPU after
> C3-state results in a TSC value that is poorly-aligned with other
> running TSCs. If there is a better way for "whacking" that
> results in a nearly-perfectly-aligned TSC (that would pass
> a "tsc warp test"), that is an option.

It ought to be possible to enhance the "whacking" code to set the TSC based on the topologically nearest live CPU and then sanity-check against all others, repeating a few times to protect against SMIs etc.

Ian

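A minimal sketch of the retry-and-check idea described above, in plain C. It is not the Xen implementation: the struct, the function names, and the assumption that the reference read lands midway between the two local reads are all hypothetical. Each attempt brackets a read of the reference CPU's TSC with two local reads. Attempts whose round trip is too long (for example because an SMI landed in the middle) are discarded, and the tightest remaining attempt is used to estimate the adjustment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One attempt at sampling a reference CPU's TSC from the new CPU.
 * Hypothetical type, not taken from the Xen tree. */
struct tsc_sample {
    uint64_t local_before;  /* new CPU's TSC just before the reference read */
    uint64_t remote;        /* TSC value reported by the reference CPU      */
    uint64_t local_after;   /* new CPU's TSC just after the reference read  */
};

/* Return the index of the attempt with the shortest round trip, or -1 if
 * every attempt took longer than max_rtt (e.g. all were hit by an SMI). */
int best_sample(const struct tsc_sample *s, size_t n, uint64_t max_rtt)
{
    int best = -1;
    uint64_t best_rtt = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t rtt = s[i].local_after - s[i].local_before;

        if (rtt > max_rtt)
            continue;                       /* too slow: discard this attempt */
        if (best < 0 || rtt < best_rtt) {
            best = (int)i;
            best_rtt = rtt;
        }
    }
    return best;
}

/* Estimated value to add to the new CPU's TSC, assuming the reference
 * read happened midway between the two local reads. */
int64_t tsc_adjustment(const struct tsc_sample *s)
{
    uint64_t midpoint;

    assert(s != NULL);
    midpoint = s->local_before + (s->local_after - s->local_before) / 2;
    return (int64_t)(s->remote - midpoint);
}

/* Sanity check against the other live CPUs: every residual offset
 * measured after the adjustment must lie within +/- tolerance. */
int offsets_within(const int64_t *residual, size_t n, int64_t tolerance)
{
    assert(residual != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (residual[i] > tolerance || residual[i] < -tolerance)
            return 0;
    }
    return 1;
}
```

In this sketch a caller would gather a few samples against the nearest CPU, apply `tsc_adjustment` to the best one, then measure residual offsets against every other live CPU and start over if `offsets_within` fails. Whether the residual can ever be made small enough to pass a warp test is exactly the open question in this thread.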
On 26/05/2010 17:44, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> True at the moment, but can we not just whack the TSC of the newly
>> added CPU on the head when it is brought online, to match the
>> boot CPU?
>
> Possibly... but the code for whacking the TSC of a CPU after
> C3-state results in a TSC value that is poorly-aligned with other
> running TSCs. If there is a better way for "whacking" that
> results in a nearly-perfectly-aligned TSC (that would pass
> a "tsc warp test"), that is an option.

But what we do in 4.0 is whack all the TSCs at boot time... How is this any different?

 -- Keir

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
> On 26/05/2010 17:44, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
> >> True at the moment, but can we not just whack the TSC of the newly
> >> added CPU on the head when it is brought online, to match the
> >> boot CPU?
> >
> > Possibly... but the code for whacking the TSC of a CPU after
> > C3-state results in a TSC value that is poorly-aligned with other
> > running TSCs. If there is a better way for "whacking" that
> > results in a nearly-perfectly-aligned TSC (that would pass
> > a "tsc warp test"), that is an option.
>
> But what we do in 4.0 is whack all the TSCs at boot time... How is this
> any different?

I don't think we do that anymore, at least not when the underlying machine is deemed to have a stable TSC (Invariant TSC or constant/nonstop TSC with max_cstate<3).

> From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
> Subject: RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
> > Possibly... but the code for whacking the TSC of a CPU after
> > C3-state results in a TSC value that is poorly-aligned with other
> > running TSCs. If there is a better way for "whacking" that
> > results in a nearly-perfectly-aligned TSC (that would pass
> > a "tsc warp test"), that is an option.
>
> It ought to be possible to enhance the "whacking" code to set the TSC
> based on the topologically nearest live CPU and then sanity-check
> against all others, repeating a few times to protect against SMIs etc.

Well obviously firmware can do it pre-boot, but I don't know what the impact of the mechanism is on running cpus. I'd assume that at least all guest activity would have to be stopped for some not-so-short period (~10-100msec?)

On 26/05/2010 18:39, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> But what we do in 4.0 is whack all the TSCs at boot time... How is this
>> any different?
>
> I don't think we do that anymore, at least not when the
> underlying machine is deemed to have a stable TSC
> (Invariant TSC or constant/nonstop TSC with max_cstate<3).

Hm, yep, looks like we skip it when we detect 'reliable tscs'. So on modern x86 that means we do not modify TSCs in Xen ever.

 -- Keir

On 26/05/2010 18:39, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> It ought to be possible to enhance the "whacking" code to set the TSC
>> based on the topologically nearest live CPU and then sanity-check
>> against all others, repeating a few times to protect against SMIs etc.
>
> Well obviously firmware can do it pre-boot, but I don't know
> what the impact of the mechanism is on running cpus. I'd assume
> that at least all guest activity would have to be stopped for
> some not-so-short period (~10-100msec?)

It depends how physical CPU hotplug is implemented, doesn't it? I expect there's sufficient firmware involved in such an operation that TSCs could get synced up before host software gets a look in. I don't think we can comment on whether or not there is an issue here without more information.

Also, one reason Intel pushed the CPU hotplug logic is for RAS, and offlining CPUs that throw errors, which can clearly be supported with no concerns over TSC sync.

 -- Keir

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>
> > Well obviously firmware can do it pre-boot, but I don't know
> > what the impact of the mechanism is on running cpus. I'd assume
> > that at least all guest activity would have to be stopped for
> > some not-so-short period (~10-100msec?)
>
> It depends how physical CPU hotplug is implemented doesn't it. I expect
> there's sufficient firmware involved in such an operation that TSCs could
> get synced up before host software gets a look in. I don't think we can
> comment on whether or not there is an issue here without more information.
> Also, one reason Intel pushed the CPU hotplug logic is for RAS, and
> offlining CPUs that throw errors, which can clearly be supported with
> no concerns over TSC sync.

OK, then would you accept a patch that disables physical cpu-hot-add (but not delete) unless enabled with a boot option, if the patch includes sufficient commenting and dmesg to explain the ramifications? Then in future if it turns out that TSC syncing is mostly always handled by firmware (doubtful but possible), the default for the boot option can be reversed.

On 26/05/2010 20:46, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> It depends how physical CPU hotplug is implemented doesn't it. I expect
>> there's sufficient firmware involved in such an operation that TSCs could
>> get synced up before host software gets a look in. I don't think we can
>> comment on whether or not there is an issue here without more information.
>> Also, one reason Intel pushed the CPU hotplug logic is for RAS, and
>> offlining CPUs that throw errors, which can clearly be supported with
>> no concerns over TSC sync.
>
> OK, then would you accept a patch that disables physical cpu-hot-add
> (but not delete) unless enabled with a boot option, if the patch
> includes sufficient commenting and dmesg to explain the ramifications?
> Then in future if it turns out that TSC syncing is mostly always
> handled by firmware (doubtful but possible), the default for the
> boot option can be reversed.

If the patch will be acked by the Intel authors of the cpu hotplug stuff then yes.

 -- Keir

>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
>Sent: Thursday, May 27, 2010 5:26 AM
>To: Dan Magenheimer; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>On 26/05/2010 20:46, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>>> It depends how physical CPU hotplug is implemented doesn't it. I expect
>>> there's sufficient firmware involved in such an operation that TSCs could
>>> get synced up before host software gets a look in. I don't think we can
>>> comment on whether or not there is an issue here without more information.

Yes, this is an issue. The TSC will not be synced by firmware when a CPU is hot-added (at least I didn't find any spec on this), and my experiment shows the TSC value is very small when the new CPU is brought up. We need to sync it on the Xen side.

Is it possible to sync the newly added CPU with the BSP when the CPU is added (changed from non-present to present), as Keir suggested in a previous mail? I will have a look at the related code.

--jyh

>>> Also, one reason Intel pushed the CPU hotplug logic is for RAS, and
>>> offlining CPUs that throw errors, which can clearly be supported with
>>> no concerns over TSC sync.
>>
>> OK, then would you accept a patch that disables physical cpu-hot-add
>> (but not delete) unless enabled with a boot option, if the patch
>> includes sufficient commenting and dmesg to explain the ramifications?
>> Then in future if it turns out that TSC syncing is mostly always
>> handled by firmware (doubtful but possible), the default for the
>> boot option can be reversed.
>
>If the patch will be acked by the Intel authors of the cpu hotplug stuff
>then yes.
>
> -- Keir

On 27/05/2010 07:15, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>>>> It depends how physical CPU hotplug is implemented doesn't it. I expect
>>>> there's sufficient firmware involved in such an operation that TSCs could
>>>> get synced up before host software gets a look in. I don't think we can
>>>> comment on whether or not there is an issue here without more information.
>
> Yes, this is a issue. The TSC will not be synched by firmware when hot-added,
> at least I didn't find any spec on this, and my experiment shows the TSC value
> is very small when new CPU is brought up. We need sync it in Xen side,
>
> Is it possible to sync the new-added CPU with the BSP when the CPU is added
> (changed from non-present to present), as Keir suggested in previous mail? I
> will have a look on the related code.

Is this *specifically* a problem for physical cpu hot-add, but not 'logical' cpu online (i.e, XENPF_cpu_hotadd but not XENPF_cpu_online)?

We could sync an AP's TSC with the master CPU bringing it up (typically CPU0) if
(a) !boot_cpu_has(RELIABLE_TSC); or
(b) The slave was introduced via XENPF_cpu_hotadd and this is its first time brought online.

Thoughts? I can implement this, or whatever we can (attempt to) agree on, easily enough. I expect Dan would prefer to have XENPF_cpu_hotadd disabled, or RELIABLE_TSC disabled, depending on a command-line option defaulting to the former. It seems a bit onerous to me however.

 -- Keir

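The two conditions above can be written as a small predicate. This is only an illustrative sketch: `struct cpu_bringup` and its fields are hypothetical stand-ins (for example, `host_tsc_reliable` stands in for the result of `boot_cpu_has(RELIABLE_TSC)`), and nothing here is taken from the actual changeset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is known about a CPU at bring-up time. */
struct cpu_bringup {
    bool host_tsc_reliable;   /* stand-in for boot_cpu_has(RELIABLE_TSC)      */
    bool hot_added;           /* CPU was introduced via XENPF_cpu_hotadd      */
    bool been_online_before;  /* CPU has already been brought online once     */
};

/* Decide whether the CPU being brought up should have its TSC synced
 * to the master CPU, following conditions (a) and (b) above. */
bool needs_tsc_sync(const struct cpu_bringup *cpu)
{
    assert(cpu != NULL);

    /* (a) No reliable TSC on this host: always sync. */
    if (!cpu->host_tsc_reliable)
        return true;

    /* (b) Reliable TSC, but a physically hot-added CPU starts from an
     * unrelated TSC value, so sync it the first time it comes online. */
    return cpu->hot_added && !cpu->been_online_before;
}
```

Under this sketch, a CPU that was present at boot on a reliable-TSC host, or a hot-added CPU being onlined for the second time, is left untouched.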
>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Thursday, May 27, 2010 2:58 PM
>To: Jiang, Yunhong; Dan Magenheimer; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>On 27/05/2010 07:15, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>>>>> It depends how physical CPU hotplug is implemented doesn't it. I expect
>>>>> there's sufficient firmware involved in such an operation that TSCs could
>>>>> get synced up before host software gets a look in. I don't think we can
>>>>> comment on whether or not there is an issue here without more information.
>>
>> Yes, this is a issue. The TSC will not be synched by firmware when hot-added,
>> at least I didn't find any spec on this, and my experiment shows the TSC value
>> is very small when new CPU is brought up. We need sync it in Xen side,
>>
>> Is it possible to sync the new-added CPU with the BSP when the CPU is added
>> (changed from non-present to present), as Keir suggested in previous mail? I
>> will have a look on the related code.
>
>Is this *specifically* a problem for physical cpu hot-add, but not 'logical'
>cpu online (i.e, XENPF_cpu_hotadd but not XENPF_cpu_online)?

Yes, I do think so, if the CPU supports invariant TSC. For CPUs that do not support invariant TSC, I think the current rendezvous calibration code does that already, right?

>We could sync an AP's TSC with the master CPU bringing it up (typically
>CPU0) if (a) !boot_cpu_has(RELIABLE_TSC); or (b) The slave was introduced
>via XENPF_cpu_hotadd and this is its first time brought online.

Yes, exactly.

>Thoughts? I can implement this, or whatever we can (attempt to) agree on,

It's great if you can do that. I'm still checking the time related code.

>easily enough. I expect Dan would prefer to have XENPF_cpu_hotadd disabled,
>or RELIABLE_TSC disabled, depending on a command-line option defaulting to
>the former. It seems a bit onerous to me however.

Yes, thanks.

BTW, is there any easy way to check the TSC skew in the system? Originally I got the TSC through ITP, which is not so convenient.

>
> -- Keir

>-----Original Message-----
>From: Jiang, Yunhong
>Sent: Thursday, May 27, 2010 3:10 PM
>To: Keir Fraser; Dan Magenheimer; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>BTW, are there any easy way to check the TSC skew in the system? Originally I get
>the TSC through ITP, that's not so convenient.

Just found the "t" debug key. Will try it.

--jyh

On 27/05/2010 07:15, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>>>> get synced up before host software gets a look in. I don't think we can
>>>> comment on whether or not there is an issue here without more information.
>
> Yes, this is a issue. The TSC will not be synched by firmware when hot-added,
> at least I didn't find any spec on this, and my experiment shows the TSC value
> is very small when new CPU is brought up. We need sync it in Xen side,
>
> Is it possible to sync the new-added CPU with the BSP when the CPU is added
> (changed from non-present to present), as Keir suggested in previous mail? I
> will have a look on the related code.

I implemented this as xen-unstable:21468. This represents a strict improvement on what was in xen-unstable before that (no tsc sync at all, ever, because I deleted it about a week ago). Open to further improvements, if we can get consensus.

 -- Keir

>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Thursday, May 27, 2010 4:44 PM
>To: Jiang, Yunhong; Dan Magenheimer; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>On 27/05/2010 07:15, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>>>>> get synced up before host software gets a look in. I don't think we can
>>>>> comment on whether or not there is an issue here without more information.
>>
>> Yes, this is a issue. The TSC will not be synched by firmware when hot-added,
>> at least I didn't find any spec on this, and my experiment shows the TSC value
>> is very small when new CPU is brought up. We need sync it in Xen side,
>>
>> Is it possible to sync the new-added CPU with the BSP when the CPU is added
>> (changed from non-present to present), as Keir suggested in previous mail? I
>> will have a look on the related code.
>
>I implemented this as xen-unstable:21468. This represents a strict
>improvement on what was in xen-unstable before that (no tsc sync at all,
>ever, because I deleted it about a week ago). Open to further improvements,
>if we can get consensus.

Thanks for the patch. I will get a system that supports CPU hotplug tomorrow morning, and I can try it then.

--jyh

> >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> >
> >I implemented this as xen-unstable:21468. This represents a strict
> >improvement on what was in xen-unstable before that (no tsc sync at all,
> >ever, because I deleted it about a week ago). Open to further improvements,
> >if we can get consensus.
>
> Thanks for the patch. I will get the system that support CPU hotplug
> tomorrow morning, I can try it then.
>
> --jyh

Hmmm... I'd be very surprised if this works ever, let alone always across all hardware. By "work", I mean that it will result in an "undetectably small difference" (less than a cache bounce), as defined and measured by the tsc warp test.

Why? Because rdtsc is a long instruction (30-100 cycles) and I'll bet writing to the TSC is even longer. Plus, there's a cache bounce overhead added in due to the synchronization via the in-memory tsc_count variable.

Our firmware guys say that TSC synchronization can't be implemented algorithmically in firmware... it requires a simultaneous "reset" signal to all sockets/cores, which is obviously not an option here.

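To make "warp" concrete, here is a simplified model of what a warp test measures, assuming the TSC reads have already been collected in a strictly serialized order across CPUs (a real test serializes the reads with a lock). The function name and the array-based interface are hypothetical, and this is not the code used by Xen or Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given TSC values read one after another (possibly on different CPUs,
 * but in a strictly serialized order), return the largest backward step
 * observed. A result of 0 means time never appeared to go backwards. */
uint64_t max_warp(const uint64_t *reads, size_t n)
{
    uint64_t worst = 0;

    assert(reads != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (reads[i] < reads[i - 1]) {
            /* A later read returned a smaller value than an earlier one. */
            uint64_t warp = reads[i - 1] - reads[i];

            if (warp > worst)
                worst = warp;
        }
    }
    return worst;
}
```

In this model, any residual offset between two CPUs that exceeds the time it takes to hand the lock from one to the other shows up as a non-zero result, which is why a sync that is merely "close" can still fail the test.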
> >Yes, thanks.
> >BTW, are there any easy way to check the TSC skew in the system?
> >Originally I get the TSC through ITP, that's not so convenient.
>
> Just found the "t" debug key. Will try it.

Hi Yunhong --

Actually you want to test with the "s" debug key because it runs tsc_check_reliability() which runs Ingo Molnar's Linux warp test. The "t" debug key is more useful to see if Xen system time has adequately converged on a non-TSC-invariant system (or after C3 halts).

Dan

>-----Original Message-----
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: Thursday, May 27, 2010 11:08 PM
>To: Jiang, Yunhong; Keir Fraser; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>> >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> >
>> >I implemented this as xen-unstable:21468. This represents a strict
>> >improvement on what was in xen-unstable before that (no tsc sync at all,
>> >ever, because I deleted it about a week ago). Open to further improvements,
>> >if we can get consensus.
>>
>> Thanks for the patch. I will get the system that support CPU hotplug
>> tomorrow morning, I can try it then.
>>
>> --jyh

Below is the test result:

a) With the patch

Before hotadd:
(XEN) TSC marked as reliable, warp = 0 (count=3)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=4)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=5)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=6)
(XEN) No domains have emulated TSC

After add:
(XEN) TSC marked as reliable, warp = 1669912421214 (count=15)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 1669912421214 (count=16)
(XEN) No domains have emulated TSC

b) With the patch:

Before adding:
(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=3)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=4)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=5)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=6)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=7)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 0 (count=8)
(XEN) No domains have emulated TSC

After add:
(XEN) TSC marked as reliable, warp = 407 (count=12)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 407 (count=13)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 407 (count=14)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 407 (count=15)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 407 (count=16)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 444 (count=17)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 444 (count=18)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 525 (count=19)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 525 (count=20)
(XEN) No domains have emulated TSC
(XEN) TSC marked as reliable, warp = 525 (count=21)
(XEN) No domains have emulated TSC

>Hmmm... I'd be very surprised if this works ever, let alone
>always across all hardware. By "work", I mean that it will
>result in an "undetectably small difference" (less than a
>cache bounce), as defined and measured by the tsc warp test.
>Why? Because rdtsc is a long instruction (30-100 cycles)
>and I'll bet writing to the TSC is even longer. Plus,
>there's a cache bounce overhead added in due to the
>synchronization via the in-memory tsc_count variable.

I think that depends on how we define "undetectably". If the time taken by guest migration among physical CPUs is much higher than this difference, do we really need to care about it (of course SMI/NMI is another story)? Or did I miss anything?

--jyh

>
>Our firmware guys say that TSC synchronization can't be
>implemented algorithmically in firmware... it requires
>a simultaneous "reset" signal to all sockets/cores, which
>is obviously not an option here.

On 28/05/2010 06:39, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Hmmm... I'd be very surprised if this works ever, let alone
>> always across all hardware. By "work", I mean that it will
>> result in an "undetectably small difference" (less than a
>> cache bounce), as defined and measured by the tsc warp test.
>> Why? Because rdtsc is a long instruction (30-100 cycles)
>> and I'll bet writing to the TSC is even longer. Plus,
>> there's a cache bounce overhead added in due to the
>> synchronization via the in-memory tsc_count variable.
>
> I think that depends how we define " undetectably". If the time that guest
> migration among physical CPU is much higher than this difference, do we really
> need care about it (of course SMI/NMI is another story)?
> Or I missed anything?

"Undetectable" by Dan's definition means undetectable by a multi-threaded app on a multi-vcpu guest. Any detected warp would therefore be a problem.

It is impossible to meet that level of TSC consistency when doing CPU physical-add, without emulating all guest TSCs. We may need to add that as an option, at least, to keep a small class of apps that care (like Oracle's DB, we assume) happy.

 -- Keir

>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Friday, May 28, 2010 1:48 PM
>To: Jiang, Yunhong; Dan Magenheimer; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>On 28/05/2010 06:39, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>>> Hmmm... I'd be very surprised if this works ever, let alone
>>> always across all hardware. By "work", I mean that it will
>>> result in an "undetectably small difference" (less than a
>>> cache bounce), as defined and measured by the tsc warp test.
>>> Why? Because rdtsc is a long instruction (30-100 cycles)
>>> and I'll bet writing to the TSC is even longer. Plus,
>>> there's a cache bounce overhead added in due to the
>>> synchronization via the in-memory tsc_count variable.
>>
>> I think that depends how we define " undetectably". If the time that guest
>> migration among physical CPU is much higher than this difference, do we really
>> need care about it (of course SMI/NMI is another story)?
>> Or I missed anything?
>
>"Undetectable" by Dan's definition means undetectable by a multi-threaded
>app on a multi-vcpu guest. Any detected warp would therefore be a problem.

Thanks for explaining. I didn't realize this requirement.

>It is impossible to meet that level of TSC consistency when doing CPU
>physical-add, without emulating all guest TSCs. We may need to add that as
>an option, at least, to keep a small class of apps that care (like Oracle's
>DB, we assume) happy.

So an option to make TSC_MODE_DEFAULT as d->arch.vtsc=0? On CPU_hotadd, we should at least warn if that option is not set, am I right?

--jyh

>
> -- Keir

On 28/05/2010 07:29, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> It is impossible to meet that level of TSC consistency when doing CPU
>> physical-add, without emulating all guest TSCs. We may need to add that as
>> an option, at least, to keep a small class of apps that care (like Oracle's
>> DB, we assume) happy.
>
> So a option to make TSC_MODE_DEFAULT as d->arch.vtsc=0 ?.
> When CPU_hotadd, we should at least warning if that option is not set, am I
> right?

Xen-unstable:21469.

 -- Keir

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, May 28, 2010 1:04 AM
> To: Jiang, Yunhong; Dan Magenheimer; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
> Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
> On 28/05/2010 07:29, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
> >> It is impossible to meet that level of TSC consistency when doing CPU
> >> physical-add, without emulating all guest TSCs. We may need to add
> >> that as an option, at least, to keep a small class of apps that care
> >> (like Oracle's DB, we assume) happy.
> >
> > So an option to make TSC_MODE_DEFAULT behave as d->arch.vtsc=0?
> > On CPU hot-add, we should at least warn if that option is not set,
> > am I right?
>
> Xen-unstable:21469.

Well, although it's better than nothing, it seems pretty lame to only
put an advisory warning in Xen's log about a condition that may affect
many guest OSes and applications, with hard-to-identify symptoms and
failures that may appear at some random point days, weeks, or months
after the event occurs. Consider a cloud service provider, for example.

The advantage of turning hot-add-cpu off by default is that, if it is
turned on at boot-time, TSC emulation can always be enabled for all
guests at guest boot and the condition never arises.

Are there any other questionable conditions that might arise from
hot-adding physical CPUs? For example (my favorite), are any order>0
allocations required? Or what if the hot-added cpu results in mixed
generations (e.g. a Nehalem is added to an all-Westmere system, where
the apps are using AES instructions)? Anything else?

In other words, maybe it would be nice to be able to rule out other
special dynamic checks for hot-add cpus that aren't done for
simultaneously-reset cpus?
Requiring a boot option to allow hot-add physical CPUs might make a
future nasty support problem a lot easier.

> "Undetectable" by Dan's definition means undetectable by
> a multi-threaded app on a multi-vcpu guest. Any detected
> warp would therefore be a problem.

This is actually Linux's definition, a requirement for selecting tsc as
Linux's default clocksource, and measured by the same algorithm in Xen
and Linux.

Linux is a bit more flexible than apps in that, if Linux detects a
problem, it can fall back from using tsc as the clocksource to some
other clocksource. But it remains to be seen how well this will work in
a virtual environment, where there are a number of conditions that a
bare-metal OS can detect that a virtualized guest OS (or an app running
on a physical or virtualized OS) cannot.

But to summarize, IMHO, correctness comes first, performance second, and
functionality that might be needed on only a small fraction of systems
comes third. I think enterprise customers dependent on Xen would agree.
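For readers new to the term, the "warp" that the test measures can be illustrated with a toy model. This is only a sketch in Python, loosely modelled on the idea of the warp test and not taken from Xen's or Linux's source: it assumes the TSC reads from all CPUs have already been put in the order they really happened (the role a lock plays in a real test), and the function name `max_warp` is made up for illustration.

```python
def max_warp(samples):
    """Return the largest backwards step in a serialized stream of TSC reads.

    `samples` is an iterable of (cpu_id, tsc_value) pairs listed in the
    order the reads actually happened. A result of 0 means no observer
    could have seen the counter go backwards.
    """
    warp = 0
    last = None
    for _cpu, tsc in samples:
        # A later read that returns a smaller value than the previous
        # read (usually taken on another CPU) is a warp.
        if last is not None and tsc < last:
            warp = max(warp, last - tsc)
        last = tsc
    return warp


# Two CPUs whose counters agree: reads only ever move forward.
synced = [(0, 100), (1, 105), (0, 110), (1, 115)]
# CPU 1 runs 50 ticks behind CPU 0: every hand-over to CPU 1 goes backwards.
skewed = [(0, 100), (1, 55), (0, 110), (1, 65)]

print(max_warp(synced))  # 0
print(max_warp(skewed))  # 45
```

Any nonzero result is what a multi-threaded application on a multi-vcpu guest could in principle observe, which is why the thread treats even small warps as a correctness problem.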
> b) With the patch:
> After add:
> (XEN) TSC marked as reliable, warp = 407 (count=12)
> (XEN) TSC marked as reliable, warp = 444 (count=17)
> (XEN) TSC marked as reliable, warp = 525 (count=19)

Hi Yunhong --

Does this continue to grow? I'm concerned that the hot-added CPU might
be skewing as well. I didn't think this was possible with an Invariant
TSC machine but maybe something in the hot-add isolation electronics
changes the characteristics of the clock signal?

Please try:

for i in {0..1000}; do xm debug-key t; sleep 3; done; \
xm dmesg | tail

then wait an hour, and see how large the warp is. Hopefully the trend
(407,444,525) is a coincidence.

Thanks,
Dan
>-----Original Message-----
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: Friday, May 28, 2010 10:52 PM
>To: Jiang, Yunhong; Keir Fraser; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>> b) With the patch:
>> After add:
>> (XEN) TSC marked as reliable, warp = 407 (count=12)
>> (XEN) TSC marked as reliable, warp = 444 (count=17)
>> (XEN) TSC marked as reliable, warp = 525 (count=19)
>
>Hi Yunhong --
>
>Does this continue to grow? I'm concerned that
>the hot-added CPU might be skewing as well?
>I didn't think this was possible with an Invariant
>TSC machine but maybe something in the hot-add
>isolation electronics changes the characteristics
>of the clock signal?

I just looped 10 times; I will try with 1000 loops next Monday.
I doubt there are any isolation electronics for hot-add. Basically each
CPU can be hot-added except socket 0. But yes, I can have a look at it.

--jyh

>Please try:
>
>for i in {0..1000}; do xm debug-key t; sleep 3; done; \
>xm dmesg | tail
>
>then wait an hour, and see how large the warp is.
>Hopefully the trend (407,444,525) is a coincidence.
>
>Thanks,
>Dan
>-----Original Message-----
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: Friday, May 28, 2010 10:35 PM
>To: Keir Fraser; Jiang, Yunhong; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Friday, May 28, 2010 1:04 AM
>> To: Jiang, Yunhong; Dan Magenheimer; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>> Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>>
>> On 28/05/2010 07:29, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>
>> >> It is impossible to meet that level of TSC consistency when doing
>> >> CPU physical-add, without emulating all guest TSCs. We may need to
>> >> add that as an option, at least, to keep a small class of apps that
>> >> care (like Oracle's DB, we assume) happy.
>> >
>> > So an option to make TSC_MODE_DEFAULT behave as d->arch.vtsc=0?
>> > On CPU hot-add, we should at least warn if that option is not set,
>> > am I right?
>>
>> Xen-unstable:21469.
>
>Well, although it's better than nothing, it seems pretty
>lame to only put an advisory warning in xen's log about a
>condition that may possibly affect many guest OS's and
>applications with hard to identify symptoms/failures, and
>possibly randomly at some point in time that may be
>days/weeks/months after the event occurs. Consider a cloud
>service provider for example.
>
>The advantage of turning hot-add-cpu off by default
>is that, if it is turned on at boot-time, TSC emulation
>can always be enabled for all guests at guest boot
>and the condition never arises.

Hi, Dan, considering that hot-add-cpu is not a frequent scenario, IMO it
may happen only in some special situation that can't be decided in
advance. That is, the user has a system with CPU hot-add capability, but
is not sure when or whether a CPU hot-add will really happen.
It means:
1) If enabling this feature always causes TSC emulation, it may not be
   worth it considering the low probability.
2) If hot-add-cpu is disabled by default, the user has to reboot the
   system to enable the feature, which makes CPU hot-add meaningless. If
   the user needs to reboot the system, they don't need hot-plug at all;
   they can just power off the system and add the CPU :)

One key point is that, currently, CPU hot-add does not happen
automatically. The steps of a CPU hot-add are:
a) A CPU is hot-added to the system, and the OS kernel is notified by
   the ACPI driver.
b) The OS kernel creates the sysfs file for this new CPU under /sys/,
   but marks the CPU as offline, since it has not been added to Xen; in
   fact, Xen has no idea of this CPU at all.
c) A uevent is sent to user space for the newly added device.
d) The uevent script needs to
   "echo 1 > /sys/device/system/xen_pcpu/xen_pcpuXXX/online"; this store
   operation triggers a hypercall, and the CPU is finally brought up.

So my suggestion is that, between steps c and d, the user space script
can do more work before really bringing up the CPU. For example, it can
check whether any special guest/application exists that requires a
strict TSC sequence, or whether Xen had the tsc_skew option passed at
boot. Or, at worst, it can simply not notify Xen of the CPU bring-up at
all. I think this is more flexible, and is also reasonable. And it can
easily be done by an OSV release (like OVM).

>Are there any other questionable conditions that might
>arise from hot-adding physical CPUs? For example (my
>favorite), are any order>0 allocations required? Or

I don't remember any >0 order allocation; I will check it when I am back
in the office.

>what if the hot-added cpu results in mixed generations
>(e.g. a Nehalem is added to an all-Westmere system,
>where the apps are using AES instructions)? Anything
>else?

What will happen if the system boots with mixed generations?
For example, when AES is found to be unsupported on an AP, will the BSP
disable AES?

>In other words, maybe it would be nice to be able
>to rule out other special dynamic checks for hot-add
>cpus that aren't done for simultaneously-reset cpus?
>Requiring a boot option to allow hot-add physical CPUs
>might make a future nasty support problem a lot easier.

I think a good uevent script will resolve the issue.

>> "Undetectable" by Dan's definition means undetectable by
>> a multi-threaded app on a multi-vcpu guest. Any detected
>> warp would therefore be a problem.
>
>This is actually Linux's definition, a requirement
>for selecting tsc as Linux's default clocksource,
>and measured by the same algorithm in Xen and Linux.
>
>Linux is a bit more flexible than apps in that, if
>Linux detects a problem, it can fallback from using
>tsc as the clocksource to some other clocksource.
>But it remains to be seen how well this will work
>in a virtual environment, where there are a number
>of conditions that a bare-metal OS can detect
>that a virtualized guest OS (or an app running
>on a physical or virtualized OS) cannot.
>
>But to summarize, IMHO, correctness comes first,
>performance second, and functionality that might
>be needed on only a small fraction of systems
>comes third. I think enterprise customers dependent
>on Xen would agree.

Agreed that correctness is most important. What I suggest is to let
dom0/administrator tools guard the correctness, not the hypervisor, to
keep the flexibility. Any ideas?

Thanks
--jyh
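The proposal to do the check between steps c and d can be sketched roughly as follows. This is a hedged illustration in Python, not an existing Xen or dom0 tool: the `Guest` record, its `tsc_emulated` flag, and both function names are hypothetical, and the thread does not say how dom0 would actually learn which guests depend on the hardware TSC. The sysfs path is deliberately left as a caller-supplied argument.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    tsc_emulated: bool  # hypothetical flag: True if Xen emulates this guest's TSC


def may_online_new_cpu(guests, admin_override=False):
    """Return (allow, at_risk_names) for a pending CPU hot-add.

    The CPU is held offline while any guest reads the hardware TSC
    directly, unless the administrator explicitly overrides the check.
    """
    at_risk = [g.name for g in guests if not g.tsc_emulated]
    allow = admin_override or not at_risk
    return allow, at_risk


def online_cpu(sysfs_online_path):
    """Step (d): write "1" to the new CPU's sysfs `online` file."""
    with open(sysfs_online_path, "w") as f:
        f.write("1")


guests = [Guest("db-vm", tsc_emulated=False), Guest("web-vm", tsc_emulated=True)]
allow, at_risk = may_online_new_cpu(guests)
print(allow, at_risk)  # False ['db-vm']
```

A real handler would call `online_cpu()` only when `allow` is true, and would otherwise leave the CPU offline and report the guests listed in `at_risk`.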
On 28/05/2010 15:35, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> "Undetectable" by Dan's definition means undetectable by
>> a multi-threaded app on a multi-vcpu guest. Any detected
>> warp would therefore be a problem.
>
> This is actually Linux's definition, a requirement
> for selecting tsc as Linux's default clocksource,
> and measured by the same algorithm in Xen and Linux.

Well, to be precise, it's Linux's definition for whether TSC is a
suitable basis for the kernel's monotonic clock source. Linux doesn't
make strong guarantees to applications about TSC semantics, by
synthesising TSC, or anything like that. Applying the same constraints
on the TSC all the way up to application level was your own proposal.

Anyhow, retreading this argument is not going to be fruitful. It's fair
to say that your definition is now also Xen's definition.

 -- Keir
> 2) If hot-add-cpu is disabled by default, the user has to reboot the
> system to enable the feature, which makes CPU hot-add meaningless. If
> the user needs to reboot the system, they don't need hot-plug at all;
> they can just power off the system and add the CPU :)

This may be true of a silly one-system administrator, but large data
centers that have systems with hot-add capability also have documented
policies and procedures that their server administrators must obey. Once
a large data center has made this mistake, they will include it in their
policies so that it doesn't happen again.

> c) A uevent is sent to user space for the newly added device.
> d) The uevent script needs to
>    "echo 1 > /sys/device/system/xen_pcpu/xen_pcpuXXX/online"; this store
>    operation triggers a hypercall, and the CPU is finally brought up.
>
> So my suggestion is that, between steps c and d, the user space script
> can do more work before really bringing up the CPU. For example, it can
> check whether any special guest/application exists that requires a
> strict TSC sequence, or whether Xen had the tsc_skew option passed at
> boot. Or, at worst, it can simply not notify Xen of the CPU bring-up at
> all. I think this is more flexible, and is also reasonable. And it can
> easily be done by an OSV release (like OVM).

This is an interesting approach. But I don't think dom0 has the
knowledge about what assumptions guests might make. Some of the
information might be possible to obtain from Xen if we add new
dom0<->Xen interfaces. But other decisions are made entirely inside of
the guest OS or app and are not exposed to dom0. For example, guest A
boots, checks for Invariant TSC, finds that it is set, and selects tsc
as a clocksource; while guest B never checks Invariant TSC (even though
it is set) and never even uses TSC. I don't think dom0 or Xen can
differentiate the two.
And failing to notify Xen because the udev script (or system
administrator) isn't sure about the answer is the same as requiring a
reboot to specify a boot option.

> >Are there any other questionable conditions that might
> >arise from hot-adding physical CPUs? For example (my
> >favorite), are any order>0 allocations required? Or
>
> I don't remember any >0 order allocation; I will check it when I am
> back in the office.
>
> >what if the hot-added cpu results in mixed generations
> >(e.g. a Nehalem is added to an all-Westmere system,
> >where the apps are using AES instructions)? Anything
> >else?
>
> What will happen if the system boots with mixed generations? For
> example, when AES is found to be unsupported on an AP, will the BSP
> disable AES?

These were just possible examples. I think there are probably other
examples which may cause problems.

> Agreed that correctness is most important. What I suggest is to let
> dom0/administrator tools guard the correctness, not the hypervisor, to
> keep the flexibility. Any ideas?

And I agree with you that anything that can be done in tools should be
done in tools instead of the hypervisor. But requiring a physical system
administrator to know everything about every feature in the underlying
physical system, every feature in any guest OS, and every app that may
or may not run on any VM now or in the future -- and then requiring that
admin to make decisions -- does not IMHO do much to guard correctness.

Requiring a boot option for hot-add guarantees correctness (at the cost
of performance only when the boot option is specified) and is very
simple to implement; that's why I am in favor of it.
> Linux doesn't make strong
> guarantees to applications about TSC semantics, by synthesising TSC, or
> anything like that.

Working on that ;-) IMHO, any non-privileged use of rdtsc, even under
the control of the kernel (e.g. vsyscall) has potential issues in a
virtual machine. Except a VMware VM, where the problem is completely
solved. But, as you know, many Linux kernel developers aren't too
interested in solving problems for Xen, so there's an uphill battle :-(

> Anyhow, retreading this argument is not going to be fruitful.
> It's fair to
> say that your definition is now also Xen's definition.

Sorry, I'm not trying to beat a dead horse. I'm just playing
whack-a-mole and repeating arguments for a new audience and a new corner
case. If you prefer, I can take that offlist.
On 28/05/2010 17:36, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> Anyhow, retreading this argument is not going to be fruitful.
>> It's fair to
>> say that your definition is now also Xen's definition.
>
> Sorry, I'm not trying to beat a dead horse. I'm just
> playing whack-a-mole and repeating arguments for a
> new audience and a new corner case. If you prefer, I
> can take that offlist.

No, that's fine. And in general, supporting both the consistent-TSC and
fast-TSC cases (the latter including perhaps a more-Xen-features subcase
as well) as configurable options is good. Those who care will know how
they want to set the options.

 -- Keir
> >> b) With the patch:
> >> After add:
> >> (XEN) TSC marked as reliable, warp = 407 (count=12)
> >> (XEN) TSC marked as reliable, warp = 444 (count=17)
> >> (XEN) TSC marked as reliable, warp = 525 (count=19)
> >
> >Hi Yunhong --
> >
> >Does this continue to grow? I'm concerned that
> >the hot-added CPU might be skewing as well?
> >I didn't think this was possible with an Invariant
> >TSC machine but maybe something in the hot-add
> >isolation electronics changes the characteristics
> >of the clock signal?
> >Please try:
> >
> >for i in {0..1000}; do xm debug-key t; sleep 3; done; \
> >xm dmesg | tail
> >
> >then wait an hour, and see how large the warp is.
> >Hopefully the trend (407,444,525) is a coincidence.
>
> I just looped 10 times; I will try with 1000 loops next Monday.
> I doubt there are any isolation electronics for hot-add. Basically
> each CPU can be hot-added except socket 0. But yes, I can have a look
> at it.

Hmmm... I'm not a system hardware expert, but the more I think about
this, the more likely it seems that any hot-plug board must have
separate QPI buses that are driven by separate crystals. And there would
be some kind of bus bridge/repeater to connect the two with a forwarding
protocol. That would certainly explain a growing TSC skew.

If so, even two single socket boards connected like that at initial boot
(no hot-add) are really a "big NUMA" system due to higher cross-node
latencies and might deserve a separate boot option anyway... this is
really NOT a single system... it is multiple systems glued together with
a fast interconnect. Xen (and users) should be warned that there is no
free lunch here and the performance degradation from TSC emulation may
be only a small part of the problem.

Some boot option like "multiboard_interconnect" (but shorter) might be
appropriate? Or is there some way at boot-time to determine that this
box does, or might (via hot-add), or definitely does not, go beyond
point-to-point interconnect?
A boot-time decision on TSC emulation could be driven off of that if it
existed.
>-----Original Message-----
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: Saturday, May 29, 2010 4:25 AM
>To: Jiang, Yunhong; Keir Fraser; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>> >> b) With the patch:
>> >> After add:
>> >> (XEN) TSC marked as reliable, warp = 407 (count=12)
>> >> (XEN) TSC marked as reliable, warp = 444 (count=17)
>> >> (XEN) TSC marked as reliable, warp = 525 (count=19)
>> >
>> >Hi Yunhong --
>> >
>> >Does this continue to grow? I'm concerned that
>> >the hot-added CPU might be skewing as well?
>> >I didn't think this was possible with an Invariant
>> >TSC machine but maybe something in the hot-add
>> >isolation electronics changes the characteristics
>> >of the clock signal?
>> >Please try:
>> >
>> >for i in {0..1000}; do xm debug-key t; sleep 3; done; \
>> >xm dmesg | tail
>> >
>> >then wait an hour, and see how large the warp is.
>> >Hopefully the trend (407,444,525) is a coincidence.
>>
>> I just looped 10 times; I will try with 1000 loops next Monday.
>> I doubt there are any isolation electronics for hot-add. Basically
>> each CPU can be hot-added except socket 0. But yes, I can have a look
>> at it.
>
>Hmmm... I'm not a system hardware expert, but the more I think
>about this, the more likely it seems that any hot-plug
>board must have separate QPI buses that are driven by separate
>crystals. And there would be some kind of bus bridge/repeater
>to connect the two with a forwarding protocol. That would
>certainly explain a growing TSC skew.

Hmm, I'm not either. I have no idea about the hot-plug board situation;
our current system is in fact NOT physically hot-add. Instead, a
hardware switch turns the CPU on/off and triggers the CPU hotplug. And
at least on this platform, there is only one crystal. But I doubt
whether a hotplug CPU board really needs separate crystals; is there any
special reason? Of course, it totally depends on the system/board
design.
The test result follows:

(XEN) TSC marked as reliable, warp = 203 (count=155)   ----> Before the insert
(XEN) TSC marked as reliable, warp = 637 (count=156)   ----> After hot-add
(XEN) TSC marked as reliable, warp = 644 (count=165)
(XEN) TSC marked as reliable, warp = 652 (count=311)
(XEN) TSC marked as reliable, warp = 652 (count=609)
(XEN) TSC marked as reliable, warp = 655 (count=1206)

So there is some increase in the early stage, and then it is stable from
count 609 to count 1206.

BTW, I noticed one more thing: when the system boots without hotplug,
the warp is 0. However, when I came back after the weekend, I noticed
the warp was 182. Because I did the hotplug action before reading the
warp, I'm not sure whether it was caused by the hotplug action, or
whether the system TSC drifts very slowly.

(XEN) TSC marked as reliable, warp = 182 (count=2)

>If so, even two single socket boards connected like that at
>initial boot (no hot-add) are really a "big NUMA" system due
>to higher cross-node latencies and might deserve a separate
>boot option anyway... this is really NOT a single system...
>it is multiple systems glued together with a fast interconnect.
>Xen (and users) should be warned that there is no free
>lunch here and the performance degradation from TSC emulation
>may be only a small part of the problem.
>
>Some boot option like "multiboard_interconnect" (but shorter)
>might be appropriate? Or is there some way at boot-time
>to determine that this box does, or might (via hot-add), or
>definitely does not, go beyond point-to-point interconnect?
>A boot-time decision on TSC emulation could be driven off
>of that if it existed.

If no hot-plug happens, this should already be detectable at boot, so
there is no difference. When hot-plug does happen, it should make no
difference either, unless we can provide a better software algorithm
that can solve the one-crystal situation but can't resolve this one.
--jyh
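Whether the warp keeps growing is easier to judge from the step-to-step differences than from the raw log. The following is a small Python sketch for that, written for this thread rather than taken from any Xen tool; it assumes only the "(XEN) TSC marked as reliable, warp = N (count=M)" line format shown above, and the helper names are made up. The sample input is the post-hot-add output quoted in the message.

```python
import re

# Matches the warp and count fields of the "debug-key t" output lines.
WARP_LINE = re.compile(r"warp = (\d+) \(count=(\d+)\)")


def parse_warps(log_text):
    """Return [(count, warp), ...] in the order the lines appear."""
    return [(int(count), int(warp)) for warp, count in WARP_LINE.findall(log_text)]


def warp_deltas(points):
    """Return the change in warp between consecutive samples."""
    warps = [warp for _count, warp in points]
    return [later - earlier for earlier, later in zip(warps, warps[1:])]


LOG = """\
(XEN) TSC marked as reliable, warp = 637 (count=156)
(XEN) TSC marked as reliable, warp = 644 (count=165)
(XEN) TSC marked as reliable, warp = 652 (count=311)
(XEN) TSC marked as reliable, warp = 652 (count=609)
(XEN) TSC marked as reliable, warp = 655 (count=1206)
"""

points = parse_warps(LOG)
print(warp_deltas(points))  # [7, 8, 0, 3]
```

The deltas shrink to a few cycles over roughly a thousand samples, which matches the description of the warp as stable after an early increase; it does not by itself rule out slow drift over days.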
>> >Are there any other questionable conditions that might
>> >arise from hot-adding physical CPUs? For example (my
>> >favorite), are any order>0 allocations required? Or
>>
>> I don't remember any >0 order allocation; I will check it when I am
>> back in the office.

Hmm, some >0 order allocations do exist, like the per_cpu stack or the
per_cpu area. Allocation failure should only cause hotplug failure, and
should have no side-effect.

>> >what if the hot-added cpu results in mixed generations
>> >(e.g. a Nehalem is added to an all-Westmere system,
>> >where the apps are using AES instructions)? Anything
>> >else?
>>
>> What will happen if the system boots with mixed generations? For
>> example, when AES is found to be unsupported on an AP, will the BSP
>> disable AES?
>
>These were just possible examples. I think there are probably
>other examples which may cause problems.

We need to categorize these potential issues. AES is a feature
difference, and I assume it is handled the same way as at boot stage.

>> Agreed that correctness is most important. What I suggest is to let
>> dom0/administrator tools guard the correctness, not the hypervisor, to
>> keep the flexibility. Any ideas?
>
>And I agree with you that anything that can be done in tools
>should be done in tools instead of the hypervisor. But requiring
>a physical system administrator to know everything about every
>feature in the underlying physical system, every feature in
>any guest OS, and every app that may or may not run on any VM
>now or in the future -- and then requiring that admin to make
>decisions -- does not IMHO do much to guard correctness.

But what's the difference between a tools option and a Xen option? IMO,
even if we want to disable CPU hot-add by default, doing it in user
space tools in a release is much better than doing it in the Xen
hypervisor. After all, no matter which method is used, the requirement
on the physical system administrator is the same.
And the flexibility of tools opens up several possibilities to work
around the issue, like utilizing cpupools to limit existing guests to
the boot-time CPUs and new guests to the newly added CPUs; or
live-migrating (LM) an existing guest OS away and back, to turn fast TSC
into consistent TSC. As you said, a large data center should have
documented policies and procedures for system administrators. However,
if we disable this through a Xen option, the hot-add capability can't be
restored without rebooting.

BTW, Keir, instead of syncing the BSP's TSC to the new CPU, we can
simply keep a TSC offset between the two CPUs; this way we eliminate a
write-TSC instruction and the result should be similar to your tsc test
code.

>Requiring a boot option for hot-add guarantees correctness
>(at the cost of performance only when the boot option is
>specified) and is very simple to implement; that's why I
>am in favor of it.

Thanks
--jyh
> BTW, I noticed one more thing: when the system boots without hotplug,
> the warp is 0. However, when I came back after the weekend, I noticed
> the warp was 182. Because I did the hotplug action before reading the
> warp, I'm not sure whether it was caused by the hotplug action, or
> whether the system TSC drifts very slowly.
> (XEN) TSC marked as reliable, warp = 182 (count=2)

Hmmm... I'm much more worried about this case and would like to
understand this better. If this is reproducible on real-world QPI
systems, and there is no way to a priori determine that "this is a
system where even though the Invariant TSC bit is set, this system may
drift", then there is no way Invariant TSC can be exposed to a guest.

/me can hear Jeremy biting his tongue hard to avoid saying "I told you
so". ;-)

Dan
Yes, I'm confused by this also. It will take until this weekend before I
can try it again.

--jyh

>-----Original Message-----
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: Tuesday, June 01, 2010 8:31 AM
>To: Jiang, Yunhong; Keir Fraser; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: RE: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>> BTW, I noticed one more thing: when the system boots without hotplug,
>> the warp is 0. However, when I came back after the weekend, I noticed
>> the warp was 182. Because I did the hotplug action before reading the
>> warp, I'm not sure whether it was caused by the hotplug action, or
>> whether the system TSC drifts very slowly.
>> (XEN) TSC marked as reliable, warp = 182 (count=2)
>
>Hmmm... I'm much more worried about this case and would
>like to understand this better. If this is reproducible
>on real-world QPI systems, and there is no way to a priori
>determine that "this is a system where even though the
>Invariant TSC bit is set, this system may drift", then
>there is no way Invariant TSC can be exposed to a guest.
>
>/me can hear Jeremy biting his tongue hard to avoid
>saying "I told you so". ;-)
>
>Dan
Jeremy Fitzhardinge
2010-Jun-01 17:07 UTC
Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
On 05/31/2010 05:30 PM, Dan Magenheimer wrote:
>> BTW, I noticed one more thing: when the system boots without hotplug,
>> the warp is 0. However, when I came back after the weekend, I noticed
>> the warp was 182. Because I did the hotplug action before reading the
>> warp, I'm not sure whether it was caused by the hotplug action, or
>> whether the system TSC drifts very slowly.
>> (XEN) TSC marked as reliable, warp = 182 (count=2)
>>
> Hmmm... I'm much more worried about this case and would
> like to understand this better. If this is reproducible
> on real-world QPI systems, and there is no way to a priori
> determine that "this is a system where even though the
> Invariant TSC bit is set, this system may drift", then
> there is no way Invariant TSC can be exposed to a guest.
>

Some crappy BIOSes will attempt to hide the time taken by a SMI by
save/restoring tsc over the call. Could something like that be happening
here?

One of the nicest upcoming tsc-related architectural changes is that the
cpus will expose both the underlying base tsc counter, and the offset
used to compute rdtsc; a wrtsc will just end up adjusting that offset
without affecting the underlying counter, making it easy to tell when
people are trying to play games with the tsc (and also making the
process of adjusting the tsc one of determining the offset, independent
of trying to play games with updating a racing tsc).

> /me can hear Jeremy biting his tongue hard to avoid
> saying "I told you so". ;-)
>

...

    J
>-----Original Message-----
>From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
>Sent: Wednesday, June 02, 2010 1:07 AM
>To: Dan Magenheimer
>Cc: Jiang, Yunhong; Keir Fraser; Xen-Devel (xen-devel@lists.xensource.com); Ian Pratt
>Subject: Re: [Xen-devel] [RFC] Physical hot-add cpus and TSC
>
>On 05/31/2010 05:30 PM, Dan Magenheimer wrote:
>>> BTW, I noticed one more thing: when the system boots without hotplug,
>>> the warp is 0. However, when I came back after the weekend, I noticed
>>> the warp was 182. Because I did the hotplug action before reading the
>>> warp, I'm not sure whether it was caused by the hotplug action, or
>>> whether the system TSC drifts very slowly.
>>> (XEN) TSC marked as reliable, warp = 182 (count=2)
>>>
>> Hmmm... I'm much more worried about this case and would
>> like to understand this better. If this is reproducible
>> on real-world QPI systems, and there is no way to a priori
>> determine that "this is a system where even though the
>> Invariant TSC bit is set, this system may drift", then
>> there is no way Invariant TSC can be exposed to a guest.
>>
>
>Some crappy BIOSes will attempt to hide the time taken by a SMI by
>save/restoring tsc over the call. Could something like that be
>happening here?
>
>One of the nicest upcoming tsc-related architectural changes is that the
>cpus will expose both the underlying base tsc counter, and the offset
>used to compute rdtsc; a wrtsc will just end up adjusting that offset
>without affecting the underlying counter, making it easy to tell when
>people are trying to play games with the tsc (and also making the
>process of adjusting the tsc one of determining the offset, independent
>of trying to play games with updating a racing tsc).
>

Because that data was collected after a weekend, I'm not sure whether
anything happened to the system in the meantime (for example, someone
may have hot-added a CPU without my being aware of it, and I didn't
check). I will retry it this weekend.
At least after running for half a day this morning, I didn't see the
issue again. Per my understanding, the TSC should be stable; a lot of
effort has been made so that the TSC is reliable in the system.

--jyh

>> /me can hear Jeremy biting his tongue hard to avoid
>> saying "I told you so". ;-)
>>
>
>...
>
>    J
> Below is test result:
> a) With the patch
> Before hotadd:
> (XEN) TSC marked as reliable, warp = 0 (count=3)
> (XEN) No domains have emulated TSC
> (XEN) TSC marked as reliable, warp = 0 (count=4)
> (XEN) No domains have emulated TSC
> (XEN) TSC marked as reliable, warp = 0 (count=5)
> (XEN) No domains have emulated TSC
> (XEN) TSC marked as reliable, warp = 0 (count=6)
> (XEN) No domains have emulated TSC
>
> After add
> (XEN) TSC marked as reliable, warp = 1669912421214 (count=15)
> (XEN) No domains have emulated TSC
> (XEN) TSC marked as reliable, warp = 1669912421214 (count=16)
> (XEN) No domains have emulated TSC
>

If the warp is a fixed (constant) value, then at least for HVM this can be solved by adjusting the TSC_OFFSET by the additional warp, so that the guest sees warp=0 in the invariant-TSC case. Anything missed?

Thx, Eddie
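
As a rough illustration of the proposal above, the following Python sketch models how a per-vCPU offset could cancel a constant warp. It is a model only, not Xen code. It assumes the guest-visible TSC is the host TSC plus the offset, modulo 2^64, and that the hot-added CPU's TSC lags the boot CPU by exactly the warp. The function names and the counter and offset values are hypothetical; only the warp value is taken from the log above.

```python
MASK64 = (1 << 64) - 1  # TSC and offset are 64-bit; arithmetic wraps


def guest_tsc(host_tsc, tsc_offset):
    """Guest-visible TSC: host TSC plus the per-vCPU offset, modulo 2**64."""
    return (host_tsc + tsc_offset) & MASK64


def compensate_offset(tsc_offset, warp):
    """Offset to use on a CPU whose TSC lags the reference CPU by `warp` ticks."""
    return (tsc_offset + warp) & MASK64


warp = 1669912421214                    # warp reported after the hot-add
boot_cpu_tsc = 5_000_000_000_000        # hypothetical reading on the boot CPU
hot_added_cpu_tsc = boot_cpu_tsc - warp # assumed: the new CPU lags by the warp

base_offset = (-1_000_000) & MASK64     # hypothetical guest offset
on_boot_cpu = guest_tsc(boot_cpu_tsc, base_offset)
on_hot_added = guest_tsc(hot_added_cpu_tsc,
                         compensate_offset(base_offset, warp))
print(on_boot_cpu == on_hot_added)
```

The two guest-visible values agree only because the model assumes the warp is known exactly and never changes; any measurement error in the warp carries straight through to the guest.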
> From: Dong, Eddie [mailto:eddie.dong@intel.com]
>
> > Below is test result:
> > a) With the patch
> > Before hotadd:
> > (XEN) TSC marked as reliable, warp = 0 (count=3)
> > (XEN) No domains have emulated TSC
> > (XEN) TSC marked as reliable, warp = 0 (count=4)
> > (XEN) No domains have emulated TSC
> > (XEN) TSC marked as reliable, warp = 0 (count=5)
> > (XEN) No domains have emulated TSC
> > (XEN) TSC marked as reliable, warp = 0 (count=6)
> > (XEN) No domains have emulated TSC
> >
> > After add
> > (XEN) TSC marked as reliable, warp = 1669912421214 (count=15)
> > (XEN) No domains have emulated TSC
> > (XEN) TSC marked as reliable, warp = 1669912421214 (count=16)
> > (XEN) No domains have emulated TSC
>
> If the warp is fixed, at least for HVM, this can be solved by adjusting
> the TSC_OFFSET with the additional warp to make guest see warp=0 for
> TSC invariant case. Anything missed?

Hi Eddie --

Two things:

1) The TSC_OFFSET doesn't work for PV domains.

2) For HVM, it is very difficult to choose a TSC_OFFSET precise enough to pass a warp test. If it doesn't pass a warp test, upstream kernels will stop using tsc as a clocksource, resulting in a big performance loss. In addition, some applications that use TSC and are not TSC-resilient, and that may have been working fine for weeks, may suddenly break due to an event (adding a physical CPU to Xen) that neither the app nor its (guest) OS is able to detect.

Dan
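
Point 2 can be made concrete with a small model of a warp test. The Python sketch below is not the kernel's or Xen's actual test: `max_warp` and `interleaved_reads` are hypothetical helpers, and the step and error values are made up. It only captures the general idea that time-ordered TSC reads taken on different CPUs must never go backwards.

```python
def max_warp(readings):
    """Largest backwards jump in a time-ordered sequence of TSC readings."""
    worst = 0
    last = readings[0]
    for now in readings[1:]:
        if now < last:
            worst = max(worst, last - now)
        last = now
    return worst


def interleaved_reads(n, step, residual_error):
    """Model two CPUs alternately reading the TSC, `step` ticks apart.

    CPU 0 reads the true count; CPU 1 reads it with `residual_error` ticks
    of skew left over from an imprecise offset.
    """
    reads = []
    for i in range(n):
        true_count = 1_000_000 + i * step
        skew = residual_error if i % 2 else 0
        reads.append(true_count + skew)
    return reads


print(max_warp(interleaved_reads(8, step=50, residual_error=0)))
print(max_warp(interleaved_reads(8, step=50, residual_error=-200)))
print(max_warp(interleaved_reads(8, step=50, residual_error=200)))
```

In this model, a residual error larger than the spacing between consecutive reads shows up as a warp regardless of its sign, which is why the offset would have to be accurate to within roughly the cross-CPU read latency.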