Glauber de Oliveira Costa
2006-Nov-24 13:10 UTC
[Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
After being offline for a long time, the softlockup watchdog triggers a BUG() on our faces. This is expected, as in fact, we spent more than a fixed 10*HZ amount of time without touching the watchdog.

However, by inspecting the contents of RUNSTATE_offline, we can gain awareness of the fact, and do better than that. This patch fixes it.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>

--
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Nov-27 10:21 UTC
Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
On 24/11/06 13:10, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:

> After being offline for a long time, the softlockup watchdog triggers
> a BUG() on our faces. This is expected, as in fact, we spent more than
> a fixed 10*HZ amount of time without touching the watchdog.
>
> However, by inspecting the contents of RUNSTATE_offline, we can gain
> awareness of the fact, and do better than that. This patch fixes it.
>
> Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>

Would 'stolen' not be a good enough thing to test? Presumably this is really just dealing with xm pause/unpause (a single long offline) so this simpler fix would work just as well?

 -- Keir
Glauber de Oliveira Costa
2006-Nov-27 15:31 UTC
Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
On Mon, Nov 27, 2006 at 10:21:54AM +0000, Keir Fraser wrote:

> On 24/11/06 13:10, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:
>
> > After being offline for a long time, the softlockup watchdog triggers
> > a BUG() on our faces. This is expected, as in fact, we spent more than
> > a fixed 10*HZ amount of time without touching the watchdog.
> >
> > However, by inspecting the contents of RUNSTATE_offline, we can gain
> > awareness of the fact, and do better than that. This patch fixes it.
> >
> > Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
>
> Would 'stolen' not be a good enough thing to test? Presumably this is really
> just dealing with xm pause/unpause (a single long offline) so this simpler
> fix would work just as well?

I thought about it, but I'm not 100% sure. My reasons for not using stolen were basically:

* Conceptually (maybe not in practice), stolen could grow due to runnable time only.
* Stolen time, like blocked time, does not have its corresponding per-processor variable updated all at once, but in multiples of NS_PER_TICK chunks. If we're out for too long, we could detect stolen being too large multiple times, leading to far more calls to the softlockup watchdog than we want.

Waiting for your comments on this,

--
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"
Glauber de Oliveira Costa
2006-Nov-27 16:47 UTC
Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
> * stolen time, as well as blocked time, does not have its corresponding
> per processor variable updated all at once, but in multiples of
> NS_PER_TICK chunks. If we're out for too long, we could detect stolen
> being too great multiple times, leading to far more calls to the
> softlockup watchdog than we want to.

FYI, I just made a simple test checking for stolen time instead of offline, and it is in fact called far too often.

--
Glauber de Oliveira Costa.
"Free as in Freedom"

Add your comments to GPLv3 at: http://gplv3.fsf.org/comments/gplv3-draft-2.html
Keir Fraser
2006-Nov-27 18:54 UTC
Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
On 27/11/06 4:47 pm, "Glauber de Oliveira Costa" <glommer@gmail.com> wrote:

>> * stolen time, as well as blocked time, does not have its corresponding
>> per processor variable updated all at once, but in multiples of
>> NS_PER_TICK chunks. If we're out for too long, we could detect stolen
>> being too great multiple times, leading to far more calls to the
>> softlockup watchdog than we want to.
>
> FYI, I just made a simple test checking for stolen time instead of
> offline, and it is in fact called far too often.

That doesn't make sense. Processed_stolen_time should lag at most 1 jiffy behind actual stolen time. So you still need to accumulate at least 10*HZ-1 jiffies of stolen time in one go to end up touching the softlockup watchdog. As far as I can see, anyway. What workload did you run to test using stolen time?

 -- Keir
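Keir's argument can be checked with a small user-space model. This is only a sketch and not the Xen timer handler: `struct cpu_time`, `timer_tick`, `replay`, the value `HZ = 100` and the exact comparison against the threshold are all assumptions made for illustration. The model accounts stolen time in whole ticks, as described above, and touches the watchdog only when a single interrupt accounts more ticks than the threshold.

```c
#include <stdint.h>

#define HZ 100                                   /* assumed tick rate */
#define NS_PER_TICK (1000000000ULL / HZ)         /* nanoseconds per tick */
#define TOUCH_THRESHOLD_TICKS (10ULL * HZ)       /* threshold from the patch description */

/* Hypothetical per-CPU state; the names are illustrative only. */
struct cpu_time {
    uint64_t processed_stolen_ns;  /* stolen time already accounted */
    unsigned touches;              /* times the watchdog was touched */
};

/*
 * Model of one timer interrupt: account the stolen time that has
 * accumulated since the last interrupt in whole ticks, then touch
 * the watchdog if that single batch exceeds the threshold.
 * Returns the number of ticks accounted by this interrupt.
 */
static uint64_t timer_tick(struct cpu_time *c, uint64_t total_stolen_ns)
{
    uint64_t ticks = (total_stolen_ns - c->processed_stolen_ns) / NS_PER_TICK;

    /* Only whole ticks are accounted, so the remainder is below one tick. */
    c->processed_stolen_ns += ticks * NS_PER_TICK;

    if (ticks > TOUCH_THRESHOLD_TICKS)
        c->touches++;
    return ticks;
}

/* Replay n interrupts, each preceded by step_ns of stolen time. */
static unsigned replay(uint64_t step_ns, unsigned n)
{
    struct cpu_time c = {0, 0};
    uint64_t total = 0;

    for (unsigned i = 0; i < n; i++) {
        total += step_ns;
        timer_tick(&c, total);
    }
    return c.touches;
}
```

In this model, `replay(15000000ULL, 10000)` steals 150 seconds in 15 ms pieces and never touches the watchdog, while `replay(20000000000ULL, 1)` (one 20 second pause) touches it exactly once. Because the unaccounted remainder stays below `NS_PER_TICK`, stolen time cannot build up across interrupts and trip the threshold repeatedly.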
Glauber de Oliveira Costa
2006-Nov-29 11:46 UTC
Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
On Mon, Nov 27, 2006 at 06:54:26PM +0000, Keir Fraser wrote:

> > FYI, I just made a simple test checking for stolen time instead of
> > offline, and it is in fact called far too often.
>
> That doesn't make sense. Processed_stolen_time should lag at most 1 jiffy
> behind actual stolen time. So you still need to accumulate at least 10*HZ-1
> jiffies of stolen time in one go to end up touching the softlockup watchdog.
> As far as I can see, anyway. What workload did you run to test using stolen
> time?

Thanks for pointing that out, Keir. After going back to it, I found it was a small mistake of mine: I was calling the softlockup watchdog before accounting the stolen ticks. Calling it after the accounting does the trick. I'll resend the patch soon.

--
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"
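The message above does not say why the ordering mattered. One plausible reading, which is purely an assumption here, is a unit mismatch: if the stolen value is held in nanoseconds until the accounting step converts it to ticks, then a check placed before the accounting compares nanoseconds against a tick count. The sketch below illustrates that reading with hypothetical helper names and an assumed `HZ = 100`; it is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 100                               /* assumed tick rate */
#define NS_PER_TICK (1000000000LL / HZ)      /* nanoseconds per tick */
#define THRESHOLD (10LL * HZ)                /* meant to be a tick count */

/*
 * Hypothetical check placed BEFORE accounting: stolen_ns is still in
 * nanoseconds, so almost any stolen time exceeds the threshold.
 */
static bool touch_if_checked_before(int64_t stolen_ns)
{
    return stolen_ns > THRESHOLD;            /* ns compared against ticks */
}

/*
 * Hypothetical check placed AFTER accounting: the value has been
 * converted to whole ticks, so the units match.
 */
static bool touch_if_checked_after(int64_t stolen_ns)
{
    int64_t stolen_ticks = stolen_ns / NS_PER_TICK;

    return stolen_ticks > THRESHOLD;
}
```

Under this assumption, 30 ms of stolen time would touch the watchdog with the early check but not with the late one, which would match the "called far too often" observation. A 20 second pause triggers both.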
Glauber de Oliveira Costa
2006-Nov-29 12:08 UTC
[Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.
[LINUX] Avoid triggering the softlockup BUG when offline for too long.

After being offline for a long time, the softlockup watchdog triggers a BUG() on our faces. This is expected, as in fact, we spent more than a fixed 10*HZ amount of time without touching the watchdog.

However, by inspecting the contents of stolen inside timer irq handler, we can gain awareness of the fact, and do better than that. This patch fixes it.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>

--
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"
Keir Fraser
2006-Nov-29 12:18 UTC
[Xen-devel] Re: [PATCH] Avoid triggering the softlockup BUG when offline for too long.
On 29/11/06 12:08, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:

> [LINUX] Avoid triggering the softlockup BUG when offline for too long.
>
> After being offline for a long time, the softlockup watchdog triggers
> a BUG() on our faces. This is expected, as in fact, we spent more than
> a fixed 10*HZ amount of time without touching the watchdog.
>
> However, by inspecting the contents of stolen inside timer irq handler,
> we can gain awareness of the fact, and do better than that.
> This patch fixes it.

Thanks. I changed the threshold to 5*HZ to avoid marginal cases: we might be offlined for just under 10 seconds, and if the per-cpu watchdog process had not run for a second or two before we were offlined, that would push us over the edge and print a warning. 5*HZ is much more comfortable.

 -- Keir
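The marginal case Keir describes is simple arithmetic, sketched below under stated assumptions: `HZ = 100`, a 10 second softlockup limit, and a hypothetical helper `warns` that models "touch the watchdog if the offline period exceeds the threshold, otherwise the watchdog sees the whole gap since its thread last ran". None of these names come from the patch.

```c
#include <stdbool.h>

#define HZ 100                                 /* assumed tick rate */
#define SOFTLOCKUP_LIMIT_TICKS (10 * HZ)       /* gap after which a warning is printed */

/*
 * Returns true if a softlockup warning would be printed after resume.
 *
 * idle_before_ticks: ticks since the watchdog thread last ran when
 *                    the CPU went offline
 * offline_ticks:     length of the offline (stolen) period
 * touch_threshold:   stolen ticks above which the timer handler
 *                    touches the watchdog
 */
static bool warns(long idle_before_ticks, long offline_ticks,
                  long touch_threshold)
{
    /* A long enough offline period resets the watchdog timestamp. */
    if (offline_ticks > touch_threshold)
        return false;

    /* Otherwise the watchdog sees the full gap, offline time included. */
    return idle_before_ticks + offline_ticks > SOFTLOCKUP_LIMIT_TICKS;
}
```

With a 10*HZ threshold, `warns(150, 950, 10 * HZ)` is true: 9.5 seconds offline is not enough to touch the watchdog, but together with 1.5 seconds of prior delay it exceeds the limit. With 5*HZ, `warns(150, 950, 5 * HZ)` is false, and in this model a spurious warning would need more than 5 seconds of delay before the offline period began.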