flight 11660 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/11660/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
 test-i386-i386-xl            18 leak-check/check          fail REGR. vs. 11643
 test-amd64-i386-xl-credit2   18 leak-check/check          fail REGR. vs. 11643
 test-amd64-i386-xl-multivcpu 18 leak-check/check          fail REGR. vs. 11643
 test-amd64-i386-xl           18 leak-check/check          fail REGR. vs. 11643
 test-amd64-amd64-xl          18 leak-check/check          fail REGR. vs. 11643
Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-sedf     18 leak-check/check          fail REGR. vs. 11643
 test-amd64-amd64-xl-sedf-pin 18 leak-check/check         fail blocked in 11643
 test-i386-i386-win           14 guest-start.2                fail   like 11643
Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pcipt-intel  9 guest-start                 fail never pass
 test-amd64-i386-rhel6hvm-intel 11 leak-check/check             fail never pass
 test-amd64-i386-rhel6hvm-amd 11 leak-check/check             fail   never pass
 test-amd64-amd64-xl-win7-amd64 13 guest-stop                   fail never pass
 test-amd64-i386-xend-winxpsp3 16 leak-check/check             fail  never pass
 test-amd64-amd64-xl-win      13 guest-stop                   fail   never pass
 test-i386-i386-xl-win        13 guest-stop                   fail   never pass
 test-amd64-i386-xl-win-vcpus1 13 guest-stop                   fail  never pass
 test-amd64-i386-win-vcpus1   16 leak-check/check             fail   never pass
 test-amd64-i386-win          16 leak-check/check             fail   never pass
 test-amd64-i386-xl-win7-amd64 13 guest-stop                   fail  never pass
 test-i386-i386-xl-winxpsp3   13 guest-stop                   fail   never pass
 test-amd64-amd64-win         16 leak-check/check             fail   never pass
 test-amd64-amd64-xl-winxpsp3 13 guest-stop                   fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop               fail never pass
version targeted for testing:
 xen                  54000bca7a6a
baseline version:
 xen                  e2722b24dc09
------------------------------------------------------------
People who touched revisions under test:
  Adin Scannell <adin@scannell.ca>
  Andres Lagar-Cavilla <andres@lagarcavilla.org>
  Dario Faggioli <dario.faggioli@citrix.com>
  Ian Campbell <ian.campbell@citrix.com>
  Ian Jackson <ian.jackson@eu.citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Jim Fehlig <jfehlig@suse.com>
  Olaf Hering <olaf@aepfle.de>
  Roger Pau Monne <roger.pau@entel.upc.edu>
  Stefano Stabellini <stefano.stabellini@eu.citrix.com>
------------------------------------------------------------
jobs:
 build-amd64                                                  pass    
 build-i386                                                   pass    
 build-amd64-oldkern                                          pass    
 build-i386-oldkern                                           pass    
 build-amd64-pvops                                            pass    
 build-i386-pvops                                             pass    
 test-amd64-amd64-xl                                          fail    
 test-amd64-i386-xl                                           fail    
 test-i386-i386-xl                                            fail    
 test-amd64-i386-rhel6hvm-amd                                 fail    
 test-amd64-amd64-xl-win7-amd64                               fail    
 test-amd64-i386-xl-win7-amd64                                fail    
 test-amd64-i386-xl-credit2                                   fail    
 test-amd64-amd64-xl-pcipt-intel                              fail    
 test-amd64-i386-rhel6hvm-intel                               fail    
 test-amd64-i386-xl-multivcpu                                 fail    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-i386-i386-pair                                          pass    
 test-amd64-amd64-xl-sedf-pin                                 fail    
 test-amd64-amd64-pv                                          pass    
 test-amd64-i386-pv                                           pass    
 test-i386-i386-pv                                            pass    
 test-amd64-amd64-xl-sedf                                     fail    
 test-amd64-i386-win-vcpus1                                   fail    
 test-amd64-i386-xl-win-vcpus1                                fail    
 test-amd64-i386-xl-winxpsp3-vcpus1                           fail    
 test-amd64-amd64-win                                         fail    
 test-amd64-i386-win                                          fail    
 test-i386-i386-win                                           fail    
 test-amd64-amd64-xl-win                                      fail    
 test-i386-i386-xl-win                                        fail    
 test-amd64-i386-xend-winxpsp3                                fail    
 test-amd64-amd64-xl-winxpsp3                                 fail    
 test-i386-i386-xl-winxpsp3                                   fail    
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
    http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 384 lines long.)
xen.org writes ("[xen-unstable test] 11660: regressions -
FAIL"):> flight 11660 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/11660/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-i386-i386-xl            18 leak-check/check      fail REGR. vs. 11643
This is mostly my fault and I''m about to post a two-patch
series to fix it.
Ian.
Ian Jackson
2012-Jan-30  14:01 UTC
[PATCH 1/2] libxl: domain_death_xswatch_callback: add some debug logging
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl.c |   26 ++++++++++++++++++++++++--
 1 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fa358d1..2758d4c 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -713,15 +713,28 @@ static void domain_death_xswatch_callback(libxl__egc *egc,
libxl__ev_xswatch *w,
         }
         gotend = &domaininfos[rc];
 
+        LIBXL__LOG(CTX, LIBXL__LOG_DEBUG,
"[evg=%p:%"PRIu32"]"
+                   " from domid=%"PRIu32" nentries=%d
rc=%d",
+                   evg, evg->domid, domid, nentries, rc);
+
         for (;;) {
-            if (!evg)
+            if (!evg) {
+                LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "[evg=0] all
reported");
                 goto all_reported;
+            }
+
+            LIBXL__LOG(CTX, LIBXL__LOG_DEBUG,
"[evg=%p:%"PRIu32"]"
+                       "   got=domaininfos[%d] got->domain=%ld",
+                       evg, evg->domid, (int)(got - domaininfos),
+                       got < gotend ? (long)got->domain : -1L);
 
             if (!rc || got->domain > evg->domid) {
                 /* ie, the list doesn''t contain evg->domid any more
so
                  * the domain has been destroyed */
                 libxl_evgen_domain_death *evg_next;
 
+                LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " destroyed");
+
                 libxl_event *ev = NEW_EVENT(egc, DOMAIN_DESTROY,
evg->domid);
                 if (!ev) goto out;
 
@@ -736,8 +749,10 @@ static void domain_death_xswatch_callback(libxl__egc *egc,
libxl__ev_xswatch *w,
                 continue;
             }
             
-            if (got == gotend)
+            if (got == gotend) {
+                LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " got==gotend");
                 break;
+            }
 
             if (got->domain < evg->domid) {
                 got++;
@@ -745,12 +760,17 @@ static void domain_death_xswatch_callback(libxl__egc *egc,
libxl__ev_xswatch *w,
             }
 
             assert(evg->domid == got->domain);
+            LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " exists
shutdown_reported=%d"
+                       " dominf.flags=%x",
+                       evg->shutdown_reported, got->flags);
 
             if (!evg->shutdown_reported &&
                 (got->flags & XEN_DOMINF_shutdown)) {
                 libxl_event *ev = NEW_EVENT(egc, DOMAIN_SHUTDOWN,
got->domain);
                 if (!ev) goto out;
                 
+                LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " shutdown
reporting");
+
                 ev->u.domain_shutdown.shutdown_reason                     
(got->flags >> XEN_DOMINF_shutdownshift) &
                     XEN_DOMINF_shutdownmask;
@@ -767,6 +787,8 @@ static void domain_death_xswatch_callback(libxl__egc *egc,
libxl__ev_xswatch *w,
  all_reported:
  out:
 
+    LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "domain death search done");
+
     CTX_UNLOCK;
 }
 
-- 
1.7.2.5
Rename the DOMAIN_DESTROY event to DOMAIN_DEATH and have it trigger
when the domain goes into the state indicated by the domaininfo flag
"dying".
This fixes a race which could leak a daemonised xl process, which
would have ignored the domain becoming "dying" and would then wait
forever to be told the domain was destroyed.
After the domain becomes "dying" we can''t generate an event
when it is
actually destroyed because xenstored will eat the relevant
VIRT_DOM_EXC virq and not generate an @releaseDomain, since xenstored
discards its own record of the domain''s existence as soon as it sees
the domain "dying" and will not trigger @releaseDomain watches for
domains it knows nothing about.  Arguably this is a bug in xenstored,
and the whole @releaseDomain machinery is rather poor, but let us not
fix that now.
Anyway, xl does not really want to know when the domain is ultimately
destroyed.  It is enough for xl to know that it is on the way out, in
the "dying" state (which leads later to destruction by Xen).
Also fix a bug where domain_death_xswatch_callback might read one
domain beyond the valid data in its domaininfos array, by correctly
ordering the checks for empty domain list, end of domain list, and our
domain being missing.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl.c         |   57 ++++++++++++++++++++++++++++--------------
 tools/libxl/libxl_types.idl |    4 +-
 tools/libxl/xl_cmdimpl.c    |    4 +-
 3 files changed, 42 insertions(+), 23 deletions(-)
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 2758d4c..b74a4d9 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -685,6 +685,29 @@ int libxl_domain_reboot(libxl_ctx *ctx, uint32_t domid)
     return ret;
 }
 
+static void domain_death_occurred(libxl__egc *egc,
+                                  libxl_evgen_domain_death **evg_upd,
+                                  const char *why) {
+    /* Removes **evg from the list and advances *evg_upd to the next
+     * entry.  Call sites in _xswatch_callback must use "continue".
*/
+    EGC_GC;
+    libxl_evgen_domain_death *const evg = *evg_upd;
+
+    LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " death %s", why);
+
+    libxl_evgen_domain_death *evg_next = LIBXL_TAILQ_NEXT(evg, entry);
+    *evg_upd = evg_next;
+
+    libxl_event *ev = NEW_EVENT(egc, DOMAIN_DEATH, evg->domid);
+    if (!ev) return;
+
+    libxl__event_occurred(egc, ev);
+
+    evg->death_reported = 1;
+    LIBXL_TAILQ_REMOVE(&CTX->death_list, evg, entry);
+    LIBXL_TAILQ_INSERT_HEAD(&CTX->death_reported, evg, entry);
+}
+
 static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch
*w,
                                         const char *wpath, const char *epath) {
     EGC_GC;
@@ -728,32 +751,23 @@ static void domain_death_xswatch_callback(libxl__egc *egc,
libxl__ev_xswatch *w,
                        evg, evg->domid, (int)(got - domaininfos),
                        got < gotend ? (long)got->domain : -1L);
 
-            if (!rc || got->domain > evg->domid) {
-                /* ie, the list doesn''t contain evg->domid any more
so
-                 * the domain has been destroyed */
-                libxl_evgen_domain_death *evg_next;
-
-                LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " destroyed");
-
-                libxl_event *ev = NEW_EVENT(egc, DOMAIN_DESTROY,
evg->domid);
-                if (!ev) goto out;
-
-                libxl__event_occurred(egc, ev);
-
-                evg->death_reported = 1;
-                evg_next = LIBXL_TAILQ_NEXT(evg, entry);
-                LIBXL_TAILQ_REMOVE(&CTX->death_list, evg, entry);
-                LIBXL_TAILQ_INSERT_HEAD(&CTX->death_reported, evg,
entry);
-                evg = evg_next;
-
+            if (!rc) {
+                domain_death_occurred(egc, &evg, "empty list");
                 continue;
             }
-            
+
             if (got == gotend) {
                 LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " got==gotend");
                 break;
             }
 
+            if (got->domain > evg->domid) {
+                /* ie, the list doesn''t contain evg->domid any more
so
+                 * the domain has been destroyed */
+                domain_death_occurred(egc, &evg, "missing from
list");
+                continue;
+            }
+
             if (got->domain < evg->domid) {
                 got++;
                 continue;
@@ -764,6 +778,11 @@ static void domain_death_xswatch_callback(libxl__egc *egc,
libxl__ev_xswatch *w,
                        " dominf.flags=%x",
                        evg->shutdown_reported, got->flags);
 
+            if (got->flags & XEN_DOMINF_dying) {
+                domain_death_occurred(egc, &evg, "dying");
+                continue;
+            }
+
             if (!evg->shutdown_reported &&
                 (got->flags & XEN_DOMINF_shutdown)) {
                 libxl_event *ev = NEW_EVENT(egc, DOMAIN_SHUTDOWN,
got->domain);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f438c9f..5492ce9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -396,7 +396,7 @@ libxl_sched_sedf = Struct("sched_sedf", [
 
 libxl_event_type = Enumeration("event_type", [
     (1, "DOMAIN_SHUTDOWN"),
-    (2, "DOMAIN_DESTROY"),
+    (2, "DOMAIN_DEATH"),
     (3, "DISK_EJECT"),
     (4, "OPERATION_COMPLETE"),
     ])
@@ -417,7 +417,7 @@ libxl_event = Struct("event",[
           [("domain_shutdown", Struct(None, [
                                              ("shutdown_reason",
uint8),
                                       ])),
-           ("domain_destroy", Struct(None, [])),
+           ("domain_death", Struct(None, [])),
            ("disk_eject", Struct(None, [
                                         ("vdev", string),
                                         ("disk", libxl_device_disk),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8832b5a..d326588 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1909,7 +1909,7 @@ start:
                 abort();
             }
 
-        case LIBXL_EVENT_TYPE_DOMAIN_DESTROY:
+        case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
             LOG("Domain %d has been destroyed.", domid);
             ret = 0;
             goto out;
@@ -2445,7 +2445,7 @@ static void shutdown_domain(const char *p, int wait)
 
             switch (event->type) {
 
-            case LIBXL_EVENT_TYPE_DOMAIN_DESTROY:
+            case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
                 LOG("Domain %d has been destroyed", domid);
                 goto done;
 
-- 
1.7.2.5
Ian Campbell
2012-Jan-30  14:50 UTC
Re: [PATCH 1/2] libxl: domain_death_xswatch_callback: add some debug logging
On Mon, 2012-01-30 at 14:01 +0000, Ian Jackson wrote:> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Campbell
2012-Jan-30  15:02 UTC
Re: [PATCH 2/2] libxl: treat "dying" domains as destroyed
On Mon, 2012-01-30 at 14:01 +0000, Ian Jackson wrote:> Rename the DOMAIN_DESTROY event to DOMAIN_DEATH and have it trigger > when the domain goes into the state indicated by the domaininfo flag > "dying". > > This fixes a race which could leak a daemonised xl process, which > would have ignored the domain becoming "dying" and would then wait > forever to be told the domain was destroyed. > > After the domain becomes "dying" we can''t generate an event when it is > actually destroyed because xenstored will eat the relevant > VIRT_DOM_EXC virq and not generate an @releaseDomain, since xenstored > discards its own record of the domain''s existence as soon as it sees > the domain "dying" and will not trigger @releaseDomain watches for > domains it knows nothing about. Arguably this is a bug in xenstored, > and the whole @releaseDomain machinery is rather poor, but let us not > fix that now. > > Anyway, xl does not really want to know when the domain is ultimately > destroyed. It is enough for xl to know that it is on the way out, in > the "dying" state (which leads later to destruction by Xen). > > Also fix a bug where domain_death_xswatch_callback might read one > domain beyond the valid data in its domaininfos array, by correctly > ordering the checks for empty domain list, end of domain list, and our > domain being missing. > > Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>This seems to do what I would expect, I tried with "xl reboot", "xl shutdown" & "xl destroy" as well as in-guest shutdown (which is no different in reality from "xl shutdown"). No leaks and in each case it rebooted, destroyed etc the domain as expected.> --- > tools/libxl/libxl.c | 57 ++++++++++++++++++++++++++++-------------- > tools/libxl/libxl_types.idl | 4 +- > tools/libxl/xl_cmdimpl.c | 4 +- > 3 files changed, 42 insertions(+), 23 deletions(-) > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 2758d4c..b74a4d9 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -685,6 +685,29 @@ int libxl_domain_reboot(libxl_ctx *ctx, uint32_t domid) > return ret; > } > > +static void domain_death_occurred(libxl__egc *egc, > + libxl_evgen_domain_death **evg_upd, > + const char *why) { > + /* Removes **evg from the list and advances *evg_upd to the next > + * entry. Call sites in _xswatch_callback must use "continue". */There''s no **evg in this context? ITYM *evg or just evg? Also it''s not clear which list this talks about since there is death_list and death_reported as well as, presumably, a list of current domain infos somewhere. Did you mean "moves evg from death_list to death_reported list and advances *evg_upd ...."?> + EGC_GC; > + libxl_evgen_domain_death *const evg = *evg_upd; > + > + LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " death %s", why);LIBXL__LOG puts a space between the boilerplate and the message so you end up with two spaces before "death" here. Also the resultant messages are: " death destroyed" " death missing" " death dying" in context that is: libxl: debug: libxl.c:696:domain_death_occurred: death dying Which could do with a noun or something. Or, maybe you meant s/death/domain/? Other than those small nits: Acked-/Tested-by: Ian Campbell <ian.campbell@citrix.com> Ian.
Ian Jackson
2012-Jan-30  15:19 UTC
Re: [PATCH 2/2] libxl: treat "dying" domains as destroyed
Ian Campbell writes ("Re: [Xen-devel] [PATCH 2/2] libxl: treat
"dying" domains as destroyed"):> This seems to do what I would expect, I tried with "xl reboot",
"xl
> shutdown" & "xl destroy" as well as in-guest shutdown
(which is no
> different in reality from "xl shutdown"). No leaks and in each
case it
> rebooted, destroyed etc the domain as expected.
Yes, great, thanks.
> > +static void domain_death_occurred(libxl__egc *egc,
> > +                                  libxl_evgen_domain_death **evg_upd,
> > +                                  const char *why) {
> > +    /* Removes **evg from the list and advances *evg_upd to the next
> > +     * entry.  Call sites in _xswatch_callback must use
"continue". */
> 
> There''s no **evg in this context? ITYM *evg or just evg?
I meant **evg_upd.  The actual struct is removed from the list, after
all.
> Also it''s not clear which list this talks about since there is
> death_list and death_reported as well as, presumably, a list of current
> domain infos somewhere. Did you mean "moves evg from death_list to
> death_reported list and advances *evg_upd ...."?
Yes.
> > +    EGC_GC;
> > +    libxl_evgen_domain_death *const evg = *evg_upd;
> > +
> > +    LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " death %s", why);
> 
> LIBXL__LOG puts a space between the boilerplate and the message so you
> end up with two spaces before "death" here.
> 
> Also the resultant messages are:
> 	" death destroyed"
> 	" death missing"
> 	" death dying"
> in context that is:
>         libxl: debug: libxl.c:696:domain_death_occurred:  death dying
> 
> Which could do with a noun or something. 
I''ll remove the word "death" (and the extra space).
Thanks,
Ian.
Ian Jackson
2012-Jan-30  15:24 UTC
Re: [PATCH 2/2] libxl: treat "dying" domains as destroyed
Ian Campbell writes ("Re: [Xen-devel] [PATCH 2/2] libxl: treat
"dying" domains as destroyed"):> Other than those small nits:
> Acked-/Tested-by: Ian Campbell <ian.campbell@citrix.com>
Thanks; having fixed those I''ve applied both.
Ian.