Hello, While trying to get PV timers on FreeBSD to work, we found that when the system is under stress sometimes FreeBSD would stop receiving timer interrupts on certain vCPUs. I''ve added some trace points to Xen in order to see what was happening with this timers and got the following trace. I''m also attaching the crappy changes I''ve did to Xen in order to add the trace points, maybe I didn''t put the trace points in the right position and that''s why I''m seeing this: 169.097952305 |..x.... d32768v3 d32768 fire timer vCPU3: 1204944804808 169.097966813 |..x.... d32768v3 d32768 set timer vCPU3: 1204946614221 169.097977454 |..x.... d32768v3 d32768 set timer vCPU3: 1204963746273 169.103410517 |..x.... d32768v3 d32768 set timer vCPU3: 1204952682637 169.103429902 |..x.... d32768v3 d32768 set timer vCPU3: 1205000356797 169.124982369 |..x.... d32768v3 d32768 set timer vCPU3: 1204976302984 169.124988972 |..x.... d32768v3 d32768 set timer vCPU3: 1205000356855 169.128470252 |..x.... d32768v3 d32768 set timer vCPU3: 1204976300693 169.128639545 |..x.... d32768v3 d32768 set timer vCPU3: 1205000356796 169.136240633 |..x.... d32768v3 d32768 set timer vCPU3: 1204984176928 169.136246955 |..x.... d32768v3 d32768 set timer vCPU3: 1205000357030 169.139729248 |..x.... d32768v3 d32768 set timer vCPU3: 1204992115770 169.139892831 |..x.... d32768v3 d32768 set timer vCPU3: 1205032863201 169.151133497 |..x.... d32768v3 d32768 set timer vCPU3: 1204999923870 169.151137341 |..x.... d32768v3 d32768 set timer vCPU3: 1205032863121 169.352765068 |..x.... d32768v3 d32768 set timer vCPU3: 1205204647080 169.352769008 |..x.... d32768v3 d32768 set timer vCPU3: 1205633794418 176.818699597 |..x.... d32768v3 d32768 set timer vCPU3: 1212676613211 176.818702670 |..x.... d32768v3 d32768 set timer vCPU3: 1213055669298 318.489693649 |..x.... d32768v3 d32768 set timer vCPU3: 1354464489030 < 318.489698326 |..x.... d32768v3 d32768 set timer vCPU3: 1354766606923 318.491853105 |..x.... d32768v3 d32768 set timer vCPU3: 1354464489183 318.491857363 |..x.... d32768v3 d32768 set timer vCPU3: 1354766606862 318.494243541 |..x.... d32768v3 d32768 set timer vCPU3: 1354466611397 318.494253173 |..x.... d32768v3 d32768 set timer vCPU3: 1354766607167 318.497005208 |..x.... d32768v3 d32768 set timer vCPU3: 1354472363654 318.497009517 |..x.... d32768v3 d32768 set timer vCPU3: 1354766606858 As it can be seen above, the vCPU was expecting a timer to fire at 1213055669298, but as far as I can tell the timer never reached the guest, neither Xen called execute_timer to fire the timer. Then in order to reset the timer I''ve started some other task that probably sends an IPI to the stuck processor and it tries to set the timer again (this time successfully). Is there any reason that could prevent the timer from being fired? Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 19.04.13 at 20:20, Roger Pau Monné<roger.pau@citrix.com> wrote: > While trying to get PV timers on FreeBSD to work, we found that when the > system is under stress sometimes FreeBSD would stop receiving timer > interrupts on certain vCPUs. I've added some trace points to Xen in > order to see what was happening with this timers and got the following > trace. I'm also attaching the crappy changes I've did to Xen in order to > add the trace points, maybe I didn't put the trace points in the right > position and that's why I'm seeing this:For those of us not familiar with FreeBSD - which interrupt is it using as timer interrupt (IRQ0, LAPIC, HPET, RTC)? Considering that you had problems with the RTC running FreeBSD on unstable, I'm particularly wondering whether there is still an issue left, even more so as I think that the 10-instance grace period for not shutting down the periodic interrupt may be too small when the system is under stress and the period is rather short (I'm thinking that the period should really be dependent on vCPU execution time, not absolute [host] time, as a mostly stalled - due to higher priority load - vCPU may not even have a chance to get its interrupt handler to run). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 22/04/13 09:15, Jan Beulich wrote:>>>> On 19.04.13 at 20:20, Roger Pau Monné<roger.pau@citrix.com> wrote: >> While trying to get PV timers on FreeBSD to work, we found that when the >> system is under stress sometimes FreeBSD would stop receiving timer >> interrupts on certain vCPUs. I've added some trace points to Xen in >> order to see what was happening with this timers and got the following >> trace. I'm also attaching the crappy changes I've did to Xen in order to >> add the trace points, maybe I didn't put the trace points in the right >> position and that's why I'm seeing this: > > For those of us not familiar with FreeBSD - which interrupt is it > using as timer interrupt (IRQ0, LAPIC, HPET, RTC)?It is using the one-shot PV timer (VCPUOP_set_singleshot_timer) and the PVHVM vector injection. I don't think this is related to the previous issue, which was related to the emulated RTC. I've also tried it with different Xen versions, -unstable, 4.2 and 4.1 and the result is the same, so I guess we are missing something regarding the PV timer implementation in FreeBSD.> > Considering that you had problems with the RTC running FreeBSD > on unstable, I'm particularly wondering whether there is still an > issue left, even more so as I think that the 10-instance grace > period for not shutting down the periodic interrupt may be too > small when the system is under stress and the period is rather > short (I'm thinking that the period should really be dependent on > vCPU execution time, not absolute [host] time, as a mostly > stalled - due to higher priority load - vCPU may not even have a > chance to get its interrupt handler to run).FreeBSD sets the timer using: HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer,... This operation returns 0, so the timer is set OK, but then I don't see any calls to execute_timer with xentrace, and if I dump the timers on the console using 'a' I can see that the timer is gone. So I was trying to figure out what could make the timer disappear without calling execute_timer. Also, I should mention that the PV timer seems to work OK most of the time, the issue that I'm describing above only happens when the system is under heavy load. This is an excerpt from xenalyze: 179.449545526 ...x.... d32768v3 d32768 set timer vCPU3: 373780840651 179.457154834 ...x.... d32768v3 d32768 fire timer vCPU3: 373780840651 179.457185647 ...x.... d32768v3 d32768 set timer vCPU3: 373781172033 179.457396161 ...x.... d32768v3 d32768 fire timer vCPU3: 373781172033 179.457423727 ...x.... d32768v3 d32768 set timer vCPU3: 373788955311 179.465247662 ...x.... d32768v3 d32768 fire timer vCPU3: 373788955311 179.465279399 ...x.... d32768v3 d32768 set timer vCPU3: 373790840795 179.467133153 ...x.... d32768v3 d32768 fire timer vCPU3: 373790840795 179.467163876 ...x.... d32768v3 d32768 set timer vCPU3: 373796829048 the timer seems to work reliably until it suddenly stops... Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel