Graham, Simon
2006-Jul-14  19:00 UTC
[Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
Currently, xentop exits with a failure if xenstat_get_node() returns
NULL - this can happen if a VM is deleted between the time
xenstat_get_node() gets the list of VMs and the time it attempts to
collect information about the VMs. This patch modifies xentop to simply
ignore a NULL return and loop around to try the call again.
Note that the patch supplied is against 3.0.testing although I think
unstable is basically the same.
Signed-off-by: Simon Graham Simon.Graham@stratus.com
---
Index: xentop.c
==================================================================--- xentop.c
(revision 3098)
+++ xentop.c	(working copy)
@@ -762,14 +762,18 @@
 {
 	xenstat_domain **domains;
 	unsigned int i, num_domains = 0;
+        xenstat_node *new_node = xenstat_get_node(xhandle,
XENSTAT_ALL);
 
-	/* Now get the node information */
+        if (new_node == NULL) {
+		// This can happen if domains change during call - just
+		// return and try again
+		return;
+        }
+
 	if (prev_node != NULL)
 		xenstat_free_node(prev_node);
 	prev_node = cur_node;
-	cur_node = xenstat_get_node(xhandle, XENSTAT_ALL);
-	if (cur_node == NULL)
-		fail("Failed to retrieve statistics from libxenstat\n");
+	cur_node = new_node;
 
 	/* dump summary top information */
 	do_summary();
@@ -871,8 +875,10 @@
 		if(ch != ERR || (curtime.tv_sec - oldtime.tv_sec) >delay) {
 			clear();
 			top();
-			oldtime = curtime;
-			refresh();
+                        if (cur_node != NULL) {
+				oldtime = curtime;
+				refresh();
+                        }
 		}
 		ch = getch();
 	} while (handle_key(ch));
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-25  15:21 UTC
Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
On 14 Jul 2006, at 20:00, Graham, Simon wrote:> Currently, xentop exits with a failure if xenstat_get_node() returns > NULL - this can happen if a VM is deleted between the time > xenstat_get_node() gets the list of VMs and the time it attempts to > collect information about the VMs. This patch modifies xentop to simply > ignore a NULL return and loop around to try the call again. > > Note that the patch supplied is against 3.0.testing although I think > unstable is basically the same. > > Signed-off-by: Simon Graham Simon.Graham@stratus.comlibxenstat should be fixed so that xenstat_get_node() does not spuriously fail in this way. This could be done by getting the ''collectors'' it calls to return better error info -- we can use EAGAIN to cause xenstat_get_node() to rerun itself from scratch rather than returning failure to the caller. Otherwise we''re going to need to fix every user of libxenstat, which is stupid. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Graham, Simon
2006-Jul-25  18:40 UTC
RE: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
> > libxenstat should be fixed so that xenstat_get_node() does not > spuriously fail in this way. This could be done by getting the > ''collectors'' it calls to return better error info -- we can use EAGAIN > to cause xenstat_get_node() to rerun itself from scratch rather than > returning failure to the caller. Otherwise we''re going to need to fix > every user of libxenstat, which is stupid. > > -- KeirI thought about this too but decided not to for a couple of reasons: 1. It''s not clear that all possible users of xenstat_get_node() would want to have the call take an arbitrary amount of time as it struggles to get a consistent snapshot -- better to let the caller decide policy on retrying the call. 2. xentop is currently the only user of xenstat_get_node in the tree and the fix in xentop was waaay easier ;-) If you still think xenstat_get_node() should loop until it has a consistent snapshot then I''ll redo the patch that way /simgr _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-25  18:48 UTC
Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
On 25 Jul 2006, at 19:40, Graham, Simon wrote:> 1. It''s not clear that all possible users of xenstat_get_node() would > want > to have the call take an arbitrary amount of time as it struggles to > get a consistent snapshot -- better to let the caller decide policy > on retrying the call.At the moment the caller cannot even make that choice as the function does not return specific error codes. If it, for example, returned a negative errno then the caller could check -EAGAIN. Anyhow, clearly the bulk of the patch belongs in libxenstat -- the only question is whether we retry internally or in the caller. I think the former is simpler as it does not change the function''s specification. -- Keir> 2. xentop is currently the only user of xenstat_get_node in the tree > and > the > fix in xentop was waaay easier ;-) > > If you still think xenstat_get_node() should loop until it has a > consistent snapshot > then I''ll redo the patch that way_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Graham, Simon
2006-Jul-26  20:36 UTC
RE: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
OK - I''ve reworked the fix to put it in libxenstat -- still not completely convinced I like it, but take a look and let me know what you think - as you suggested, I''ve made the collectors return a value indicating if a fatal error occurred (-ve), a retryable error occurred (0) or they were successful (+ve) and put in code to retry from the top when a retryable error occurs (with a small 1/4s delay so we don''t spin wildly whilst things stabilize). Simon ----------------------------------------------------------- Currently, xenstat_get_node exits with a failure if a VM is deleted between the time it gets the list of VMs and the time it queries xenstore for it''s name and other parameters. This patch modifies xenstat_get_node to retry the process from scratch if a recoverable error occurs. Note that the patch supplied is against 3.0.testing although I think unstable is basically the same. Signed-off-by: Simon Graham Simon.Graham@stratus.com ----------------------------------------------------------- _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-27  09:26 UTC
Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
On 26 Jul 2006, at 21:36, Graham, Simon wrote:> OK - I''ve reworked the fix to put it in libxenstat -- still not > completely convinced I like it, but take a look and let me know what > you > think - as you suggested, I''ve made the collectors return a value > indicating if a fatal error occurred (-ve), a retryable error occurred > (0) or they were successful (+ve) and put in code to retry from the top > when a retryable error occurs (with a small 1/4s delay so we don''t spin > wildly whilst things stabilize).Thinking about this some more, those retryable failures will generally mean that a domain is being created or being destroyed. In those two cases, perhaps xenstat_get_node() should simply prune the problematic domain from the list it returns? That would avoid unbounded delay in xenstat_get_node(). I think what you have so far is okay: fatal error in a collector causes error in the caller; recoverable error could cause domain to be pruned rather than retrying in the caller. Maybe we should have macros for the possible return values from a collector: -1/0/+1 return values are not immediately obvious. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel