Graham, Simon
2006-Jul-14 19:00 UTC
[Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
Currently, xentop exits with a failure if xenstat_get_node() returns NULL - this can happen if a VM is deleted between the time xenstat_get_node() gets the list of VMs and the time it attempts to collect information about the VMs. This patch modifies xentop to simply ignore a NULL return and loop around to try the call again. Note that the patch supplied is against 3.0.testing although I think unstable is basically the same. Signed-off-by: Simon Graham Simon.Graham@stratus.com --- Index: xentop.c ==================================================================--- xentop.c (revision 3098) +++ xentop.c (working copy) @@ -762,14 +762,18 @@ { xenstat_domain **domains; unsigned int i, num_domains = 0; + xenstat_node *new_node = xenstat_get_node(xhandle, XENSTAT_ALL); - /* Now get the node information */ + if (new_node == NULL) { + // This can happen if domains change during call - just + // return and try again + return; + } + if (prev_node != NULL) xenstat_free_node(prev_node); prev_node = cur_node; - cur_node = xenstat_get_node(xhandle, XENSTAT_ALL); - if (cur_node == NULL) - fail("Failed to retrieve statistics from libxenstat\n"); + cur_node = new_node; /* dump summary top information */ do_summary(); @@ -871,8 +875,10 @@ if(ch != ERR || (curtime.tv_sec - oldtime.tv_sec) >delay) { clear(); top(); - oldtime = curtime; - refresh(); + if (cur_node != NULL) { + oldtime = curtime; + refresh(); + } } ch = getch(); } while (handle_key(ch)); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-25 15:21 UTC
Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
On 14 Jul 2006, at 20:00, Graham, Simon wrote:> Currently, xentop exits with a failure if xenstat_get_node() returns > NULL - this can happen if a VM is deleted between the time > xenstat_get_node() gets the list of VMs and the time it attempts to > collect information about the VMs. This patch modifies xentop to simply > ignore a NULL return and loop around to try the call again. > > Note that the patch supplied is against 3.0.testing although I think > unstable is basically the same. > > Signed-off-by: Simon Graham Simon.Graham@stratus.comlibxenstat should be fixed so that xenstat_get_node() does not spuriously fail in this way. This could be done by getting the ''collectors'' it calls to return better error info -- we can use EAGAIN to cause xenstat_get_node() to rerun itself from scratch rather than returning failure to the caller. Otherwise we''re going to need to fix every user of libxenstat, which is stupid. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Graham, Simon
2006-Jul-25 18:40 UTC
RE: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
> > libxenstat should be fixed so that xenstat_get_node() does not > spuriously fail in this way. This could be done by getting the > ''collectors'' it calls to return better error info -- we can use EAGAIN > to cause xenstat_get_node() to rerun itself from scratch rather than > returning failure to the caller. Otherwise we''re going to need to fix > every user of libxenstat, which is stupid. > > -- KeirI thought about this too but decided not to for a couple of reasons: 1. It''s not clear that all possible users of xenstat_get_node() would want to have the call take an arbitrary amount of time as it struggles to get a consistent snapshot -- better to let the caller decide policy on retrying the call. 2. xentop is currently the only user of xenstat_get_node in the tree and the fix in xentop was waaay easier ;-) If you still think xenstat_get_node() should loop until it has a consistent snapshot then I''ll redo the patch that way /simgr _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-25 18:48 UTC
Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
On 25 Jul 2006, at 19:40, Graham, Simon wrote:> 1. It''s not clear that all possible users of xenstat_get_node() would > want > to have the call take an arbitrary amount of time as it struggles to > get a consistent snapshot -- better to let the caller decide policy > on retrying the call.At the moment the caller cannot even make that choice as the function does not return specific error codes. If it, for example, returned a negative errno then the caller could check -EAGAIN. Anyhow, clearly the bulk of the patch belongs in libxenstat -- the only question is whether we retry internally or in the caller. I think the former is simpler as it does not change the function''s specification. -- Keir> 2. xentop is currently the only user of xenstat_get_node in the tree > and > the > fix in xentop was waaay easier ;-) > > If you still think xenstat_get_node() should loop until it has a > consistent snapshot > then I''ll redo the patch that way_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Graham, Simon
2006-Jul-26 20:36 UTC
RE: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
OK - I''ve reworked the fix to put it in libxenstat -- still not completely convinced I like it, but take a look and let me know what you think - as you suggested, I''ve made the collectors return a value indicating if a fatal error occurred (-ve), a retryable error occurred (0) or they were successful (+ve) and put in code to retry from the top when a retryable error occurs (with a small 1/4s delay so we don''t spin wildly whilst things stabilize). Simon ----------------------------------------------------------- Currently, xenstat_get_node exits with a failure if a VM is deleted between the time it gets the list of VMs and the time it queries xenstore for it''s name and other parameters. This patch modifies xenstat_get_node to retry the process from scratch if a recoverable error occurs. Note that the patch supplied is against 3.0.testing although I think unstable is basically the same. Signed-off-by: Simon Graham Simon.Graham@stratus.com ----------------------------------------------------------- _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-27 09:26 UTC
Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
On 26 Jul 2006, at 21:36, Graham, Simon wrote:> OK - I''ve reworked the fix to put it in libxenstat -- still not > completely convinced I like it, but take a look and let me know what > you > think - as you suggested, I''ve made the collectors return a value > indicating if a fatal error occurred (-ve), a retryable error occurred > (0) or they were successful (+ve) and put in code to retry from the top > when a retryable error occurs (with a small 1/4s delay so we don''t spin > wildly whilst things stabilize).Thinking about this some more, those retryable failures will generally mean that a domain is being created or being destroyed. In those two cases, perhaps xenstat_get_node() should simply prune the problematic domain from the list it returns? That would avoid unbounded delay in xenstat_get_node(). I think what you have so far is okay: fatal error in a collector causes error in the caller; recoverable error could cause domain to be pruned rather than retrying in the caller. Maybe we should have macros for the possible return values from a collector: -1/0/+1 return values are not immediately obvious. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel