Hi Ian/Stefano:

I changed over to the PV clock for hybrid like we talked about at the hackathon. I still have the hang in update_wall_time() after dom0 switches to xen as clocksource.

The source of hang seems to be in xen stime_local_stamp in cpu_time that suddenly jumps to a large 64bit value. I've been chasing to figure where that happens, and why for the hybrid and not PV. It appears the source of jump is time_calibration_std_rendezvous() in c->stime_master_stamp. Chasing that, seems to come from read_platform_stime(). The jump happens after switch to xen clocksource in dom0.

I'll continue to debug, but let me know if you have any thoughts or idea what might be going on.

thanks,
Mukesh
>>> On 20.03.12 at 02:30, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Hi Ian/Stefano:
>
> I changed over to the PV clock for hybrid like we talked about at the
> hackathon. I still have the hang in update_wall_time() after dom0
> switches to xen as clocksource.
>
> The source of hang seems to be in xen stime_local_stamp in cpu_time that
> suddenly jumps to a large 64bit value. I've been chasing to figure
> where that happens, and why for the hybrid and not PV. It appears the
> source of jump is time_calibration_std_rendezvous() in
> c->stime_master_stamp. Chasing that, seems to come from
> read_platform_stime(). The jump happens after switch to
> xen clocksource in dom0.
>
> I'll continue to debug, but let me know if you have any thoughts or
> idea what might be going on.

Your Dom0 hasn't possibly played with the HPET, and Xen at the same time is using the HPET as clock source? (Preventing this is rather difficult, as the HPET memory space - iirc - is just 1k, so excluding Dom0 access to the full page isn't easily possible. Consequently, Xen so far has been relying on Dom0 to not get in the way.)

Jan
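To make the granularity point above concrete, here is a minimal sketch in C. It assumes 4 KiB pages and takes the 1k HPET size from Jan's recollection ("iirc"). The base address in the usage note is only a commonly seen value chosen for illustration, and collateral_bytes() is a hypothetical helper, not anything that exists in Xen.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u   /* assumption: 4 KiB x86 pages */
#define HPET_MMIO_SIZE   1024u   /* assumption: "just 1k", as recalled in the thread */

/*
 * Hypothetical helper: how many bytes that are NOT part of the region
 * [base, base + len) would be blocked as well if access were denied at
 * page granularity.
 */
static uint64_t collateral_bytes(uint64_t base, uint64_t len)
{
    const uint64_t mask = (uint64_t)PAGE_SIZE_SKETCH - 1;
    uint64_t first = base & ~mask;               /* round start down to a page */
    uint64_t last  = (base + len + mask) & ~mask; /* round end up to a page */

    assert(len > 0);
    return (last - first) - len;
}
```

For example, collateral_bytes(0xFED00000, HPET_MMIO_SIZE) gives 3072: hiding a 1 KiB register block behind a whole 4 KiB page also hides 3 KiB of whatever else lives in that page, which is why excluding Dom0 from the full page is not easily possible.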
On Tue, 20 Mar 2012, Mukesh Rathor wrote:
> Hi Ian/Stefano:
>
> I changed over to the PV clock for hybrid like we talked about at the
> hackathon. I still have the hang in update_wall_time() after dom0
> switches to xen as clocksource.
>
> The source of hang seems to be in xen stime_local_stamp in cpu_time that
> suddenly jumps to a large 64bit value. I've been chasing to figure
> where that happens, and why for the hybrid and not PV. It appears the
> source of jump is time_calibration_std_rendezvous() in
> c->stime_master_stamp. Chasing that, seems to come from
> read_platform_stime(). The jump happens after switch to
> xen clocksource in dom0.
>
> I'll continue to debug, but let me know if you have any thoughts or
> idea what might be going on.

stime_local_stamp is set to get_s_time() and get_s_time scales the tsc value according to local_tsc_stamp and tsc_scale. You need to make sure that these two parameters are correct for dom0 hybrid as well.

Also I would keep an eye on arch.hvm_vcpu.stime_offset and arch.hvm_vcpu.cache_tsc_offset that only play a role in hvm domains. Maybe they are not set to correct values in your case?

Give a look at hvm_set_guest_tsc and hvm_set_guest_time. You probably need to initialize them on hybrid as well and make sure you take the is_hvm_domain path in __update_vcpu_system_time.
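The scaling described above can be sketched roughly as follows. This is a simplified model and not Xen's actual code: the struct layouts and the *_sketch names are hypothetical, and the fixed-point format (a shift followed by a 32-bit fractional multiplier) is an assumption made for the illustration. It shows why the two parameters matter: if the TSC value passed in is behind local_tsc_stamp, the unsigned delta wraps and the result is an enormous 64-bit number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the fields named in the thread. */
struct time_scale_sketch {
    int      shift;     /* pre-shift applied to the TSC delta */
    uint32_t mul_frac;  /* 0.32 fixed-point multiplier */
};

struct cpu_time_sketch {
    uint64_t local_tsc_stamp;    /* TSC value at the last calibration */
    uint64_t stime_local_stamp;  /* system time (ns) at the last calibration */
    struct time_scale_sketch tsc_scale;
};

/* Scale a TSC delta to nanoseconds: (delta << shift) * mul_frac / 2^32. */
static uint64_t scale_delta_sketch(uint64_t delta,
                                   const struct time_scale_sketch *s)
{
    assert(s->shift > -64 && s->shift < 64);
    if (s->shift < 0)
        delta >>= -s->shift;
    else
        delta <<= s->shift;

    /* Split the 64x32 multiply so no intermediate exceeds 64 bits. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * s->mul_frac + ((lo * s->mul_frac) >> 32);
}

/* System time now = stamp + scaled TSC ticks since the stamp. */
static uint64_t get_s_time_sketch(const struct cpu_time_sketch *t, uint64_t tsc)
{
    return t->stime_local_stamp +
           scale_delta_sketch(tsc - t->local_tsc_stamp, &t->tsc_scale);
}
```

With shift = 1 and mul_frac = 0x80000000 (one tick per nanosecond), a stamp of 1000 ticks at 5000 ns and a current TSC of 1500 yields 5500 ns. Feeding the same record a TSC of 500, i.e. one that does not match the stamp, yields a value above 2^63. This only illustrates how mismatched parameters could produce a jump of that kind; it does not claim that this was the cause here.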
On Tue, 20 Mar 2012 12:39:24 +0000 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> stime_local_stamp is set to get_s_time() and get_s_time scales the tsc
> value according to local_tsc_stamp and tsc_scale. You need to make
> sure that these two parameters are correct for dom0 hybrid as well.
>
> Also I would keep an eye on arch.hvm_vcpu.stime_offset and
> arch.hvm_vcpu.cache_tsc_offset that only play a role in hvm domains.
> Maybe they are not set to correct values in your case?
>
> Give a look at hvm_set_guest_tsc and hvm_set_guest_time. You probably
> need to initialize them on hybrid as well and make sure you take the
> is_hvm_domain path in __update_vcpu_system_time.

Hmm... I thought we decided we didn't want any HVM time paths for hybrid, but PV only. A bit confused now. I'll take a look at HVM time variables nevertheless.

thanks,
Mukesh
On Tue, 20 Mar 2012, Mukesh Rathor wrote:
> On Tue, 20 Mar 2012 12:39:24 +0000
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>
> > stime_local_stamp is set to get_s_time() and get_s_time scales the tsc
> > value according to local_tsc_stamp and tsc_scale. You need to make
> > sure that these two parameters are correct for dom0 hybrid as well.
> >
> > Also I would keep an eye on arch.hvm_vcpu.stime_offset and
> > arch.hvm_vcpu.cache_tsc_offset that only play a role in hvm domains.
> > Maybe they are not set to correct values in your case?
> >
> > Give a look at hvm_set_guest_tsc and hvm_set_guest_time. You probably
> > need to initialize them on hybrid as well and make sure you take the
> > is_hvm_domain path in __update_vcpu_system_time.
>
> Hmm... I thought we decided we didn't want any HVM time paths for hybrid,
> but PV only. A bit confused now. I'll take a look at HVM time variables
> nevertheless.

Those variables are necessary to have a correct tsc on HVM and they happen to be initialized in the emulated paths right now. You might have to initialize them yourself in the hybrid case.
On Tue, 20 Mar 2012 09:19:23 +0000 "Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 20.03.12 at 02:30, Mukesh Rathor <mukesh.rathor@oracle.com>...
> Your Dom0 hasn't possibly played with the HPET, and Xen at the same
> time is using the HPET as clock source? (Preventing this is rather
> difficult, as the HPET memory space - iirc - is just 1k, so excluding
> Dom0 access to the full page isn't easily possible. Consequently, Xen
> so far has been relying on Dom0 to not get in the way.)

Pretty close. I debugged it finally, and the problem was the hybrid writing to the HPET while xen was using it. This was happening because ACPI was not properly initialized, and created a ripple effect where hpet_virt_address was set. In PV, it's not set and so hpet_late_init() bails.

Anyways, I fixed it, and now to next one caused by proper initialization of ACPI... :).

thanks,
Mukesh
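As a rough illustration of why a guest writing to the HPET upsets a hypervisor that uses it as its platform timer, here is a simplified model in C. It is not Xen's read_platform_stime(): the struct, the function name, and the 32-bit counter width are assumptions made for the sketch. The idea it models is accumulating masked counter deltas into a 64-bit tick count, so that a counter rewritten behind the reader's back looks like a near-full wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a platform timer read by accumulating deltas. */
struct platform_timer_sketch {
    uint64_t stamp64;  /* accumulated ticks since boot */
    uint32_t stamp;    /* raw counter value at the previous read */
    uint32_t mask;     /* counter width mask (assumed 32-bit here) */
};

/* Fold a fresh raw counter read into the 64-bit accumulated tick count. */
static uint64_t accumulate_ticks_sketch(struct platform_timer_sketch *p,
                                        uint32_t raw)
{
    assert(p->mask != 0);
    /* Unsigned subtraction: a counter that moved backwards wraps around. */
    p->stamp64 += (uint32_t)(raw - p->stamp) & p->mask;
    p->stamp = raw;
    return p->stamp64;
}
```

Starting from a previous raw read of 1000, a read of 1100 adds 100 ticks as expected. If something else then resets the counter and the next read returns 50, the masked delta is 2^32 - 1050 ticks, so the accumulated time leaps forward. In this model the leap is bounded by the counter width, so it is only one plausible way such interference could show up, not a reconstruction of the exact value seen in the thread.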