flight 11574 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/11574/
Failures :-/ but no regressions.
Tests which are failing intermittently (not blocking):
 test-amd64-i386-win           7 windows-install             fail pass in 11561
 test-i386-i386-win            7 windows-install             fail pass in 11561
 test-amd64-i386-xend-winxpsp3  7 windows-install   fail in 11561 pass in 11574
 test-amd64-amd64-xl-win       7 windows-install    fail in 11561 pass in 11574
 test-amd64-amd64-win          7 windows-install    fail in 11561 pass in 11574
Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-win-vcpus1    7 windows-install              fail   like 11561
Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pcipt-intel  9 guest-start                 fail never pass
 test-amd64-i386-rhel6hvm-intel  9 guest-start.2                fail never pass
 test-amd64-i386-rhel6hvm-amd  9 guest-start.2                fail   never pass
 test-amd64-amd64-xl-win7-amd64 13 guest-stop                   fail never pass
 test-amd64-i386-xend-winxpsp3 16 leak-check/check             fail  never pass
 test-amd64-amd64-xl-win      13 guest-stop                   fail   never pass
 test-i386-i386-xl-win        13 guest-stop                   fail   never pass
 test-amd64-i386-xl-win-vcpus1 13 guest-stop                   fail  never pass
 test-amd64-i386-xl-win7-amd64 13 guest-stop                   fail  never pass
 test-i386-i386-xl-winxpsp3   13 guest-stop                   fail   never pass
 test-amd64-amd64-win         16 leak-check/check             fail   never pass
 test-amd64-amd64-xl-winxpsp3 13 guest-stop                   fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop               fail never pass
 test-amd64-i386-win          16 leak-check/check      fail in 11561 never pass
 test-i386-i386-win           16 leak-check/check      fail in 11561 never pass
version targeted for testing:
 xen                  370924e204dc
baseline version:
 xen                  370924e204dc
jobs:
 build-amd64                                                  pass    
 build-i386                                                   pass    
 build-amd64-oldkern                                          pass    
 build-i386-oldkern                                           pass    
 build-amd64-pvops                                            pass    
 build-i386-pvops                                             pass    
 test-amd64-amd64-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-i386-i386-xl                                            pass    
 test-amd64-i386-rhel6hvm-amd                                 fail    
 test-amd64-amd64-xl-win7-amd64                               fail    
 test-amd64-i386-xl-win7-amd64                                fail    
 test-amd64-i386-xl-credit2                                   pass    
 test-amd64-amd64-xl-pcipt-intel                              fail    
 test-amd64-i386-rhel6hvm-intel                               fail    
 test-amd64-i386-xl-multivcpu                                 pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-i386-i386-pair                                          pass    
 test-amd64-amd64-xl-sedf-pin                                 pass    
 test-amd64-amd64-pv                                          pass    
 test-amd64-i386-pv                                           pass    
 test-i386-i386-pv                                            pass    
 test-amd64-amd64-xl-sedf                                     pass    
 test-amd64-i386-win-vcpus1                                   fail    
 test-amd64-i386-xl-win-vcpus1                                fail    
 test-amd64-i386-xl-winxpsp3-vcpus1                           fail    
 test-amd64-amd64-win                                         fail    
 test-amd64-i386-win                                          fail    
 test-i386-i386-win                                           fail    
 test-amd64-amd64-xl-win                                      fail    
 test-i386-i386-xl-win                                        fail    
 test-amd64-i386-xend-winxpsp3                                fail    
 test-amd64-amd64-xl-winxpsp3                                 fail    
 test-i386-i386-xl-winxpsp3                                   fail    
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
    http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Published tested tree is already up to date.
xen.org writes ("[xen-unstable test] 11574: tolerable FAIL"):
> Tests which are failing intermittently (not blocking):
>  test-amd64-i386-win           7 windows-install             fail pass in 11561
This is a host-specific, but deterministic, failure.  My bisector has
been working on it (the basis pass was in November so there has been a
lot of ground to cover and I had to make some new arrangements to be
able to run an ad-hoc bisection of this problem) and the results so
far are fingering changesets between 24367:537ceb11d51e (bad) and
24362:d35bedf334f2 (good).
Extract from hg log -v is below.
I have looked at the logs of one of these failures and there is
nothing of note.  The screenshot shows the Windows screensaver, so I
guess neither the guest itself nor qemu have crashed.  The likely
situation is either that the guest has somehow locked up and ceased to
make progress, or that the networking is nonfunctional.
I'm expecting the bisector to finger a specific changeset and will let
you know when it does, but this will take some more hours and maybe
only finish overnight or tomorrow.
Ian.
changeset:   24367:537ceb11d51e
user:        Andres Lagar-Cavilla <andres@lagarcavilla.org>
date:        Tue Dec 06 20:31:49 2011 +0000
files:       xen/arch/x86/hvm/hvm.c
description:
Improve handling of nested page faults
Add checks for access type. Be less reliant on implicit semantics.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset:   24366:534b2a15e669
user:        Andres Lagar-Cavilla <andres@lagarcavilla.org>
date:        Tue Dec 06 20:10:32 2011 +0000
files:       xen/arch/x86/mm/mem_sharing.c xen/arch/x86/mm/p2m.c xen/include/public/mem_event.h
description:
x86/mm: Allow dummy responses on the mem_event ring.
Ring semantics require that for every request, a response be put. This
allows consumer to place a dummy response if need be.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset:   24365:c65d1a9769b4
user:        Andres Lagar-Cavilla <andres@lagarcavilla.org>
date:        Tue Dec 06 20:10:32 2011 +0000
files:       xen/arch/x86/mm/mem_event.c xen/arch/x86/mm/mem_sharing.c xen/arch/x86/mm/p2m.c xen/include/asm-x86/mem_event.h
description:
x86/mm: Consume multiple mem event responses off the ring
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scanneel.ca>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset:   24364:0964341efd65
user:        Andres Lagar-Cavilla <andres@lagarcavilla.org>
date:        Tue Dec 06 20:10:32 2011 +0000
files:       xen/arch/x86/mm/mem_event.c
description:
x86/mm: Allow memevent responses to be signaled via the event channel
Don't require a separate domctl to notify the memevent interface that an event
has occurred.  This domctl can be taxing, particularly when you are scaling
events and paging to many domains across a single system.  Instead, we use the
existing event channel to signal when we place something in the ring (as per
normal ring operation).
Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset:   24363:1620291f0c4a
user:        Andres Lagar-Cavilla <andres@lagarcavilla.org>
date:        Tue Dec 06 20:10:32 2011 +0000
files:       xen/arch/ia64/vmx/vmx_init.c xen/arch/x86/hvm/hvm.c xen/arch/x86/mm/mem_event.c xen/common/event_channel.c xen/include/xen/event.h xen/include/xen/sched.h
description:
Create a generic callback mechanism for  Xen-bound event channels
For event channels for which Xen is the consumer, there currently is
a single action. With this patch, we allow event channel creators to
specify a generic callback (or no callback). Because the expectation
is that there will be few callbacks, they are stored in a small table.
Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Committed-by: Tim Deegan <tim@xen.org>
Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
> This is a host-specific, but deterministic, failure.  My bisector has
> been working on it (the basis pass was in November so there has been a
> lot of ground to cover and I had to make some new arrangements to be
> able to run an ad-hoc bisection of this problem) and the results so
> far are fingering changesets between 24367:537ceb11d51e (bad) and
> 24362:d35bedf334f2 (good).
The bisector is now minded to blame this changeset:
  changeset:   24367:537ceb11d51e
  user:        Andres Lagar-Cavilla <andres@lagarcavilla.org>
  date:        Tue Dec 06 20:31:49 2011 +0000
  files:       xen/arch/x86/hvm/hvm.c
  description:
  Improve handling of nested page faults
  Add checks for access type. Be less reliant on implicit semantics.
  Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
  Acked-by: Tim Deegan <tim@xen.org>
  Committed-by: Tim Deegan <tim@xen.org>
It's doing a couple more before-and-after repros to be sure, but this
looks like a plausible candidate.  Full diff below.
I'm not an expert on this area.  Perhaps someone could speculate what
Windows is doing that is triggering one of these new checks ?
Ian.
# HG changeset patch
# User Andres Lagar-Cavilla <andres@lagarcavilla.org>
# Date 1323203509 0
# Node ID 537ceb11d51ef60cd4abffd2f54de0ae0ca50008
# Parent  534b2a15e6695dfd8866c0ff626b002cbf57991a
Improve handling of nested page faults
Add checks for access type. Be less reliant on implicit semantics.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
diff -r 534b2a15e669 -r 537ceb11d51e xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Tue Dec 06 20:10:32 2011 +0000
+++ b/xen/arch/x86/hvm/hvm.c	Tue Dec 06 20:31:49 2011 +0000
@@ -1288,7 +1288,8 @@ int hvm_hap_nested_page_fault(unsigned l
      * If this GFN is emulated MMIO or marked as read-only, pass the fault
      * to the mmio handler.
      */
-    if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
+    if ( (p2mt == p2m_mmio_dm) || 
+         (access_w && (p2mt == p2m_ram_ro)) )
     {
         if ( !handle_mmio() )
             hvm_inject_exception(TRAP_gp_fault, 0, 0);
@@ -1302,7 +1303,7 @@ int hvm_hap_nested_page_fault(unsigned l
         p2m_mem_paging_populate(v->domain, gfn);
 
     /* Mem sharing: unshare the page and try again */
-    if ( p2mt == p2m_ram_shared )
+    if ( access_w && (p2mt == p2m_ram_shared) )
     {
         ASSERT(!p2m_is_nestedp2m(p2m));
         mem_sharing_unshare_page(p2m->domain, gfn, 0);
@@ -1319,14 +1320,17 @@ int hvm_hap_nested_page_fault(unsigned l
          * a large page, we do not change other pages type within that large
          * page.
          */
-        paging_mark_dirty(v->domain, mfn_x(mfn));
-        p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+        if ( access_w )
+        {
+            paging_mark_dirty(v->domain, mfn_x(mfn));
+            p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+        }
         rc = 1;
         goto out_put_gfn;
     }
 
     /* Shouldn't happen: Maybe the guest was writing to a r/o grant mapping? */
-    if ( p2mt == p2m_grant_map_ro )
+    if ( access_w && (p2mt == p2m_grant_map_ro) )
     {
         gdprintk(XENLOG_WARNING,
                  "trying to write to read-only grant mapping\n");
> Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
>> This is a host-specific, but deterministic, failure.  My bisector has
>> been working on it (the basis pass was in November so there has been a
>> lot of ground to cover and I had to make some new arrangements to be
>> able to run an ad-hoc bisection of this problem) and the results so
>> far are fingering changesets between 24367:537ceb11d51e (bad) and
>> 24362:d35bedf334f2 (good).
>
> The bisector is now minded to blame this changeset:
>
>   changeset:   24367:537ceb11d51e
>   user:        Andres Lagar-Cavilla <andres@lagarcavilla.org>
>   date:        Tue Dec 06 20:31:49 2011 +0000
>   files:       xen/arch/x86/hvm/hvm.c
>   description:
>   Improve handling of nested page faults
>
>   Add checks for access type. Be less reliant on implicit semantics.
>
>   Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>   Acked-by: Tim Deegan <tim@xen.org>
>   Committed-by: Tim Deegan <tim@xen.org>
>
> It's doing a couple more before-and-after repros to be sure, but this
> looks like a plausible candidate.  Full diff below.
Thanks Ian,
just to be sure, none of the following conditions apply on your test:
- hvm uses memory sharing
- hvm runs a backend pv driver
- hvm uses populate-on-demand.
If none of the above apply, I can quickly put together a micro-patch to
assert that the deal-breaker is
> -    if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
> +    if ( (p2mt == p2m_mmio_dm) ||
> +         (access_w && (p2mt == p2m_ram_ro)) )
>
Would you be able to test that?
Thanks again
Andres
> I'm not an expert on this area.  Perhaps someone could speculate what
> Windows is doing that is triggering one of these new checks ?
>
> Ian.
>
> # HG changeset patch
> # User Andres Lagar-Cavilla <andres@lagarcavilla.org>
> # Date 1323203509 0
> # Node ID 537ceb11d51ef60cd4abffd2f54de0ae0ca50008
> # Parent  534b2a15e6695dfd8866c0ff626b002cbf57991a
> Improve handling of nested page faults
>
> Add checks for access type. Be less reliant on implicit semantics.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Acked-by: Tim Deegan <tim@xen.org>
> Committed-by: Tim Deegan <tim@xen.org>
>
> diff -r 534b2a15e669 -r 537ceb11d51e xen/arch/x86/hvm/hvm.c
> --- a/xen/arch/x86/hvm/hvm.c Tue Dec 06 20:10:32 2011 +0000
> +++ b/xen/arch/x86/hvm/hvm.c Tue Dec 06 20:31:49 2011 +0000
> @@ -1288,7 +1288,8 @@ int hvm_hap_nested_page_fault(unsigned l
>       * If this GFN is emulated MMIO or marked as read-only, pass the fault
>       * to the mmio handler.
>       */
> -    if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
> +    if ( (p2mt == p2m_mmio_dm) ||
> +         (access_w && (p2mt == p2m_ram_ro)) )
>      {
>          if ( !handle_mmio() )
>              hvm_inject_exception(TRAP_gp_fault, 0, 0);
> @@ -1302,7 +1303,7 @@ int hvm_hap_nested_page_fault(unsigned l
>          p2m_mem_paging_populate(v->domain, gfn);
>
>      /* Mem sharing: unshare the page and try again */
> -    if ( p2mt == p2m_ram_shared )
> +    if ( access_w && (p2mt == p2m_ram_shared) )
>      {
>          ASSERT(!p2m_is_nestedp2m(p2m));
>          mem_sharing_unshare_page(p2m->domain, gfn, 0);
> @@ -1319,14 +1320,17 @@ int hvm_hap_nested_page_fault(unsigned l
>           * a large page, we do not change other pages type within that large
>           * page.
>           */
> -        paging_mark_dirty(v->domain, mfn_x(mfn));
> -        p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
> +        if ( access_w )
> +        {
> +            paging_mark_dirty(v->domain, mfn_x(mfn));
> +            p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
> +        }
>          rc = 1;
>          goto out_put_gfn;
>      }
>
>      /* Shouldn't happen: Maybe the guest was writing to a r/o grant mapping? */
> -    if ( p2mt == p2m_grant_map_ro )
> +    if ( access_w && (p2mt == p2m_grant_map_ro) )
>      {
>          gdprintk(XENLOG_WARNING,
>                   "trying to write to read-only grant mapping\n");
>
Hi,
At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
> Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
> > This is a host-specific, but deterministic, failure.
>
>   Improve handling of nested page faults
>
>   Add checks for access type. Be less reliant on implicit semantics.
>
>   Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>   Acked-by: Tim Deegan <tim@xen.org>
>   Committed-by: Tim Deegan <tim@xen.org>
Hmm, Didn't have to pull on that thread too hard to find it's not tied
to anything.  The access_* arguments to hvm_hap_nested_page_fault()
aren't plumbed in on AMD:
    ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
so gating behaviour on them is not going to work so well.  Sorry about
that - I should definitely have caught this.  (But Andres, did you test
this, or any of your mm work, on AMD?)
The attached patch ought to fix it.  Smoke-tested but I won't have
good enough access to my test machines to check Windows installs before
Thursday.
Cheers,
Tim.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
> Hi,
>
> At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
>> Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
>> > This is a host-specific, but deterministic, failure.
>>
>>   Improve handling of nested page faults
>>
>>   Add checks for access type. Be less reliant on implicit semantics.
>>
>>   Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>   Acked-by: Tim Deegan <tim@xen.org>
>>   Committed-by: Tim Deegan <tim@xen.org>
>
> Hmm, Didn't have to pull on that thread too hard to find it's not tied
> to anything.  The access_* arguments to hvm_hap_nested_page_fault()
> aren't plumbed in on AMD:
>
>     ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
>
> so gating behaviour on them is not going to work so well.  Sorry about
> that - I should definitely have caught this.  (But Andres, did you test
> this, or any of your mm work, on AMD?)
Phew, thanks Tim. FWIW:
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
And, no, we haven't tested on AMD, because we depend critically on the
support that has been built into EPT for paging/mem_access.
Andres
>
> The attached patch ought to fix it.  Smoke-tested but I won't have
> good enough access to my test machines to check Windows installs before
> Thursday.
>
> Cheers,
>
> Tim.
>
Tim Deegan writes ("Re: [Xen-devel] [xen-unstable test] 11574: tolerable FAIL"):
> At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
> Hmm, Didn't have to pull on that thread too hard to find it's not tied
> to anything.  The access_* arguments to hvm_hap_nested_page_fault()
> aren't plumbed in on AMD:
> 
>     ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
> 
> so gating behaviour on them is not going to work so well.  Sorry about
> that - I should definitely have caught this.  (But Andres, did you test
> this, or any of your mm work, on AMD?)
In general it's probably unrealistic to ask every submitter to test
every patch on two different systems...
> The attached patch ought to fix it.  Smoke-tested but I won't have
> good enough access to my test machines to check Windows installs before
> Thursday.
I'd be quite happy if this patch went into -unstable right away.  We
should be able to tell from the automatic tests whether it is awful
:-) and the current situation is rather poor.
Also when we have got rid of this host-specific failure I can push my
new test machinery branch which is intended to (mostly) prevent
host-specific failures being regarded as heisenbugs.
Thanks,
Ian.
At 17:44 +0000 on 24 Jan (1327427040), Ian Jackson wrote:
> Tim Deegan writes ("Re: [Xen-devel] [xen-unstable test] 11574: tolerable FAIL"):
> > At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
> > Hmm, Didn't have to pull on that thread too hard to find it's not tied
> > to anything.  The access_* arguments to hvm_hap_nested_page_fault()
> > aren't plumbed in on AMD:
> >
> >     ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
> >
> > so gating behaviour on them is not going to work so well.  Sorry about
> > that - I should definitely have caught this.  (But Andres, did you test
> > this, or any of your mm work, on AMD?)
>
> In general it's probably unrealistic to ask every submitter to test
> every patch on two different systems...
True.
> > The attached patch ought to fix it.  Smoke-tested but I won't have
> > good enough access to my test machines to check Windows installs before
> > Thursday.
>
> I'd be quite happy if this patch went into -unstable right away.  We
> should be able to tell from the automatic tests whether it is awful
> :-) and the current situation is rather poor.
Done.
Cheers,
Tim.