flight 11574 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/11574/
Failures :-/ but no regressions.
Tests which are failing intermittently (not blocking):
test-amd64-i386-win 7 windows-install fail pass in 11561
test-i386-i386-win 7 windows-install fail pass in 11561
test-amd64-i386-xend-winxpsp3 7 windows-install fail in 11561 pass in 11574
test-amd64-amd64-xl-win 7 windows-install fail in 11561 pass in 11574
test-amd64-amd64-win 7 windows-install fail in 11561 pass in 11574
Regressions which are regarded as allowable (not blocking):
test-amd64-i386-win-vcpus1 7 windows-install fail like 11561
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
test-amd64-i386-rhel6hvm-intel 9 guest-start.2 fail never pass
test-amd64-i386-rhel6hvm-amd 9 guest-start.2 fail never pass
test-amd64-amd64-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-xl-win 13 guest-stop fail never pass
test-i386-i386-xl-win 13 guest-stop fail never pass
test-amd64-i386-xl-win-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xl-win7-amd64 13 guest-stop fail never pass
test-i386-i386-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-win 16 leak-check/check fail never pass
test-amd64-amd64-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-i386-win 16 leak-check/check fail in 11561 never pass
test-i386-i386-win 16 leak-check/check fail in 11561 never pass
version targeted for testing:
xen 370924e204dc
baseline version:
xen 370924e204dc
jobs:
build-amd64 pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-amd64-i386-xl pass
test-i386-i386-xl pass
test-amd64-i386-rhel6hvm-amd fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 pass
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel fail
test-amd64-i386-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-i386-i386-pair pass
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv pass
test-i386-i386-pv pass
test-amd64-amd64-xl-sedf pass
test-amd64-i386-win-vcpus1 fail
test-amd64-i386-xl-win-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-amd64-win fail
test-amd64-i386-win fail
test-i386-i386-win fail
test-amd64-amd64-xl-win fail
test-i386-i386-xl-win fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
test-i386-i386-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Published tested tree is already up to date.
xen.org writes ("[xen-unstable test] 11574: tolerable FAIL"):
> Tests which are failing intermittently (not blocking):
> test-amd64-i386-win 7 windows-install fail pass in 11561
This is a host-specific, but deterministic, failure. My bisector has
been working on it (the basis pass was in November so there has been a
lot of ground to cover and I had to make some new arrangements to be
able to run an ad-hoc bisection of this problem) and the results so
far are fingering changesets between 24367:537ceb11d51e (bad) and
24362:d35bedf334f2 (good).
Extract from hg log -v is below.
I have looked at the logs of one of these failures and there is
nothing of note. The screenshot shows the Windows screensaver, so I
guess neither the guest itself nor qemu have crashed. The likely
situation is either that the guest has somehow locked up and ceased to
make progress, or that the networking is nonfunctional.
I'm expecting the bisector to finger a specific changeset and will let
you know when it does, but this will take some more hours and maybe
only finish overnight or tomorrow.
Ian.
changeset: 24367:537ceb11d51e
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Tue Dec 06 20:31:49 2011 +0000
files: xen/arch/x86/hvm/hvm.c
description:
Improve handling of nested page faults
Add checks for access type. Be less reliant on implicit semantics.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset: 24366:534b2a15e669
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Tue Dec 06 20:10:32 2011 +0000
files: xen/arch/x86/mm/mem_sharing.c xen/arch/x86/mm/p2m.c
xen/include/public/mem_event.h
description:
x86/mm: Allow dummy responses on the mem_event ring.
Ring semantics require that for every request, a response be put. This
allows consumer to place a dummy response if need be.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset: 24365:c65d1a9769b4
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Tue Dec 06 20:10:32 2011 +0000
files: xen/arch/x86/mm/mem_event.c xen/arch/x86/mm/mem_sharing.c
xen/arch/x86/mm/p2m.c xen/include/asm-x86/mem_event.h
description:
x86/mm: Consume multiple mem event responses off the ring
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scanneel.ca>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset: 24364:0964341efd65
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Tue Dec 06 20:10:32 2011 +0000
files: xen/arch/x86/mm/mem_event.c
description:
x86/mm: Allow memevent responses to be signaled via the event channel
Don't require a separate domctl to notify the memevent interface that an event
has occured. This domctl can be taxing, particularly when you are scaling
events and paging to many domains across a single system. Instead, we use the
existing event channel to signal when we place something in the ring (as per
normal ring operation).
Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
changeset: 24363:1620291f0c4a
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Tue Dec 06 20:10:32 2011 +0000
files: xen/arch/ia64/vmx/vmx_init.c xen/arch/x86/hvm/hvm.c
xen/arch/x86/mm/mem_event.c xen/common/event_channel.c xen/include/xen/event.h
xen/include/xen/sched.h
description:
Create a generic callback mechanism for Xen-bound event channels
For event channels for which Xen is the consumer, there currently is
a single action. With this patch, we allow event channel creators to
specify a generic callback (or no callback). Because the expectation
is that there will be few callbacks, they are stored in a small table.
Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Committed-by: Tim Deegan <tim@xen.org>
Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
> This is a host-specific, but deterministic, failure. My bisector has
> been working on it (the basis pass was in November so there has been a
> lot of ground to cover and I had to make some new arrangements to be
> able to run an ad-hoc bisection of this problem) and the results so
> far are fingering changesets between 24367:537ceb11d51e (bad) and
> 24362:d35bedf334f2 (good).
The bisector is now minded to blame this changeset:
changeset: 24367:537ceb11d51e
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Tue Dec 06 20:31:49 2011 +0000
files: xen/arch/x86/hvm/hvm.c
description:
Improve handling of nested page faults
Add checks for access type. Be less reliant on implicit semantics.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
It's doing a couple more before-and-after repros to be sure, but this
looks like a plausible candidate. Full diff below.
I'm not an expert on this area. Perhaps someone could speculate what
Windows is doing that is triggering one of these new checks?
Ian.
# HG changeset patch
# User Andres Lagar-Cavilla <andres@lagarcavilla.org>
# Date 1323203509 0
# Node ID 537ceb11d51ef60cd4abffd2f54de0ae0ca50008
# Parent 534b2a15e6695dfd8866c0ff626b002cbf57991a
Improve handling of nested page faults
Add checks for access type. Be less reliant on implicit semantics.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
diff -r 534b2a15e669 -r 537ceb11d51e xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c Tue Dec 06 20:10:32 2011 +0000
+++ b/xen/arch/x86/hvm/hvm.c Tue Dec 06 20:31:49 2011 +0000
@@ -1288,7 +1288,8 @@ int hvm_hap_nested_page_fault(unsigned l
* If this GFN is emulated MMIO or marked as read-only, pass the fault
* to the mmio handler.
*/
- if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
+ if ( (p2mt == p2m_mmio_dm) ||
+ (access_w && (p2mt == p2m_ram_ro)) )
{
if ( !handle_mmio() )
hvm_inject_exception(TRAP_gp_fault, 0, 0);
@@ -1302,7 +1303,7 @@ int hvm_hap_nested_page_fault(unsigned l
p2m_mem_paging_populate(v->domain, gfn);
/* Mem sharing: unshare the page and try again */
- if ( p2mt == p2m_ram_shared )
+ if ( access_w && (p2mt == p2m_ram_shared) )
{
ASSERT(!p2m_is_nestedp2m(p2m));
mem_sharing_unshare_page(p2m->domain, gfn, 0);
@@ -1319,14 +1320,17 @@ int hvm_hap_nested_page_fault(unsigned l
* a large page, we do not change other pages type within that large
* page.
*/
- paging_mark_dirty(v->domain, mfn_x(mfn));
- p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+ if ( access_w )
+ {
+ paging_mark_dirty(v->domain, mfn_x(mfn));
+ p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+ }
rc = 1;
goto out_put_gfn;
}
/* Shouldn't happen: Maybe the guest was writing to a r/o grant mapping? */
- if ( p2mt == p2m_grant_map_ro )
+ if ( access_w && (p2mt == p2m_grant_map_ro) )
{
gdprintk(XENLOG_WARNING,
"trying to write to read-only grant mapping\n");
> Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
>> This is a host-specific, but deterministic, failure. My bisector has
>> been working on it (the basis pass was in November so there has been a
>> lot of ground to cover and I had to make some new arrangements to be
>> able to run an ad-hoc bisection of this problem) and the results so
>> far are fingering changesets between 24367:537ceb11d51e (bad) and
>> 24362:d35bedf334f2 (good).
>
> The bisector is now minded to blame this changeset:
>
> changeset: 24367:537ceb11d51e
> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> date: Tue Dec 06 20:31:49 2011 +0000
> files: xen/arch/x86/hvm/hvm.c
> description:
> Improve handling of nested page faults
>
> Add checks for access type. Be less reliant on implicit semantics.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Acked-by: Tim Deegan <tim@xen.org>
> Committed-by: Tim Deegan <tim@xen.org>
>
> It's doing a couple more before-and-after repros to be sure, but this
> looks like a plausible candidate. Full diff below.
Thanks Ian, just to be sure, none of the following conditions apply on
your test:
- hvm uses memory sharing
- hvm runs a backend pv driver
- hvm uses populate-on-demand.
If all the above apply, I can quickly put together a micro-patch to
assert that the deal-breaker is
> - if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
> + if ( (p2mt == p2m_mmio_dm) ||
> + (access_w && (p2mt == p2m_ram_ro)) )
>
Would you be able to test that?
Thanks again
Andres
> I'm not an expert on this area. Perhaps someone could speculate what
> Windows is doing that is triggering one of these new checks ?
>
> Ian.
>
> # HG changeset patch
> # User Andres Lagar-Cavilla <andres@lagarcavilla.org>
> # Date 1323203509 0
> # Node ID 537ceb11d51ef60cd4abffd2f54de0ae0ca50008
> # Parent 534b2a15e6695dfd8866c0ff626b002cbf57991a
> Improve handling of nested page faults
>
> Add checks for access type. Be less reliant on implicit semantics.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Acked-by: Tim Deegan <tim@xen.org>
> Committed-by: Tim Deegan <tim@xen.org>
>
> diff -r 534b2a15e669 -r 537ceb11d51e xen/arch/x86/hvm/hvm.c
> --- a/xen/arch/x86/hvm/hvm.c Tue Dec 06 20:10:32 2011 +0000
> +++ b/xen/arch/x86/hvm/hvm.c Tue Dec 06 20:31:49 2011 +0000
> @@ -1288,7 +1288,8 @@ int hvm_hap_nested_page_fault(unsigned l
> * If this GFN is emulated MMIO or marked as read-only, pass the fault
> * to the mmio handler.
> */
> - if ( (p2mt == p2m_mmio_dm) || (p2mt == p2m_ram_ro) )
> + if ( (p2mt == p2m_mmio_dm) ||
> + (access_w && (p2mt == p2m_ram_ro)) )
> {
> if ( !handle_mmio() )
> hvm_inject_exception(TRAP_gp_fault, 0, 0);
> @@ -1302,7 +1303,7 @@ int hvm_hap_nested_page_fault(unsigned l
> p2m_mem_paging_populate(v->domain, gfn);
>
> /* Mem sharing: unshare the page and try again */
> - if ( p2mt == p2m_ram_shared )
> + if ( access_w && (p2mt == p2m_ram_shared) )
> {
> ASSERT(!p2m_is_nestedp2m(p2m));
> mem_sharing_unshare_page(p2m->domain, gfn, 0);
> @@ -1319,14 +1320,17 @@ int hvm_hap_nested_page_fault(unsigned l
> * a large page, we do not change other pages type within that large
> * page.
> */
> - paging_mark_dirty(v->domain, mfn_x(mfn));
> - p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
> + if ( access_w )
> + {
> + paging_mark_dirty(v->domain, mfn_x(mfn));
> + p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
> + }
> rc = 1;
> goto out_put_gfn;
> }
>
> /* Shouldn't happen: Maybe the guest was writing to a r/o grant mapping? */
> - if ( p2mt == p2m_grant_map_ro )
> + if ( access_w && (p2mt == p2m_grant_map_ro) )
> {
> gdprintk(XENLOG_WARNING,
> "trying to write to read-only grant mapping\n");
>
Hi,
At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
> Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
> > This is a host-specific, but deterministic, failure.
>
> Improve handling of nested page faults
>
> Add checks for access type. Be less reliant on implicit semantics.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Acked-by: Tim Deegan <tim@xen.org>
> Committed-by: Tim Deegan <tim@xen.org>
Hmm, Didn't have to pull on that thread too hard to find it's not tied
to anything. The access_* arguments to hvm_hap_nested_page_fault()
aren't plumbed in on AMD:
ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
so gating behaviour on them is not going to work so well. Sorry about
that - I should definitely have caught this. (But Andres, did you test
this, or any of your mm work, on AMD?)
The attached patch ought to fix it. Smoke-tested but I won't have
good enough access to my test machines to check Windows installs before
Thursday.
Cheers,
Tim.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
> Hi,
>
> At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
>> Ian Jackson writes ("Re: [xen-unstable test] 11574: tolerable FAIL"):
>> > This is a host-specific, but deterministic, failure.
>>
>> Improve handling of nested page faults
>>
>> Add checks for access type. Be less reliant on implicit semantics.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>> Acked-by: Tim Deegan <tim@xen.org>
>> Committed-by: Tim Deegan <tim@xen.org>
>
> Hmm, Didn't have to pull on that thread too hard to find it's not tied
> to anything. The access_* arguments to hvm_hap_nested_page_fault()
> aren't plumbed in on AMD:
>
> ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
>
> so gating behaviour on them is not going to work so well. Sorry about
> that - I should definitely have caught this. (But Andres, did you test
> this, or any of your mm work, on AMD?)
Phew, thanks Tim. FWIW:
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
And, no, we haven't tested on AMD, because we depend critically on the
support that has been built into EPT for paging/mem_access.
Andres
>
> The attached patch ought to fix it. Smoke-tested but I won't have
> good enough access to my test machines to check Windows installs before
> Thursday.
>
> Cheers,
>
> Tim.
>
Tim Deegan writes ("Re: [Xen-devel] [xen-unstable test] 11574: tolerable FAIL"):
> At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
> Hmm, Didn't have to pull on that thread too hard to find it's not tied
> to anything. The access_* arguments to hvm_hap_nested_page_fault()
> aren't plumbed in on AMD:
>
> ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
>
> so gating behaviour on them is not going to work so well. Sorry about
> that - I should definitely have caught this. (But Andres, did you test
> this, or any of your mm work, on AMD?)
In general it's probably unrealistic to ask every submitter to test
every patch on two different systems...
> The attached patch ought to fix it. Smoke-tested but I won't have
> good enough access to my test machines to check Windows installs before
> Thursday.
I'd be quite happy if this patch went into -unstable right away. We
should be able to tell from the automatic tests whether it is awful
:-) and the current situation is rather poor.
Also when we have got rid of this host-specific failure I can push my
new test machinery branch which is intended to (mostly) prevent
host-specific failures being regarded as heisenbugs.
Thanks,
Ian.
At 17:44 +0000 on 24 Jan (1327427040), Ian Jackson wrote:
> Tim Deegan writes ("Re: [Xen-devel] [xen-unstable test] 11574: tolerable FAIL"):
> > At 16:00 +0000 on 24 Jan (1327420844), Ian Jackson wrote:
> > Hmm, Didn't have to pull on that thread too hard to find it's not tied
> > to anything. The access_* arguments to hvm_hap_nested_page_fault()
> > aren't plumbed in on AMD:
> >
> > ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0);
> >
> > so gating behaviour on them is not going to work so well. Sorry about
> > that - I should definitely have caught this. (But Andres, did you test
> > this, or any of your mm work, on AMD?)
>
> In general it's probably unrealistic to ask every submitter to test
> every patch on two different systems...
True.
> > The attached patch ought to fix it. Smoke-tested but I won't have
> > good enough access to my test machines to check Windows installs before
> > Thursday.
>
> I'd be quite happy if this patch went into -unstable right away. We
> should be able to tell from the automatic tests whether it is awful
> :-) and the current situation is rather poor.
Done.
Cheers,
Tim.