flight 19308 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
test-amd64-i386-qemut-rhel6hvm-intel 11 leak-check/check fail in 19288 REGR. vs. 19208
test-amd64-amd64-xl-winxpsp3 8 guest-saverestore fail in 19288 REGR. vs. 19208
test-amd64-i386-xl-qemut-win7-amd64 12 guest-localmigrate/x10 fail in 19288 REGR. vs. 19208
Tests which are failing intermittently (not blocking):
test-amd64-i386-qemut-rhel6hvm-intel 7 redhat-install fail pass in 19288
test-amd64-amd64-xl-winxpsp3 7 windows-install fail pass in 19288
test-amd64-i386-xl-qemut-win7-amd64 7 windows-install fail pass in 19288
test-amd64-i386-rhel6hvm-intel 7 redhat-install fail in 19288 pass in 19308
test-amd64-amd64-xl-sedf 14 guest-localmigrate/x10 fail in 19288 pass in 19308
test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail in 19288 pass in 19308
test-amd64-i386-xl-win7-amd64 11 guest-localmigrate.2 fail in 19288 pass in 19308
test-amd64-amd64-xl-win7-amd64 7 windows-install fail in 19288 pass in 19308
test-amd64-amd64-xl-qemut-winxpsp3 7 windows-install fail in 19288 pass in 19308
Tests which did not succeed, but are not blocking:
test-armhf-armhf-xl 1 xen-build-check(1) blocked n/a
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
build-armhf-pvops 4 kernel-build fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop fail never pass
version targeted for testing:
xen 593470233ff38385df9dcf5690cc58c7a4fb290d
baseline version:
xen cadbe2f9e768585fad52156be2433d49ec9feaf1
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli <dario.faggioli@citrix.com>
Fabio Fantoni <fabio.fantoni@m2r.biz>
George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Juergen Gross <juergen.gross@ts.fujitsu.com>
Keir Fraser <keir@xen.org>
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Matthew Daley <mattjd@gmail.com>
Roger Pau Monné <roger.pau@citrix.com>
Samuel Thibault <samuel.thibault@ens-lyon.org>
Tim Deegan <tim@xen.org>
Wei Liu <wei.liu2@citrix.com>
------------------------------------------------------------
jobs:
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-armhf-pvops fail
build-i386-pvops pass
test-amd64-amd64-xl pass
test-armhf-armhf-xl blocked
test-amd64-i386-xl pass
test-amd64-i386-rhel6hvm-amd fail
test-amd64-i386-qemut-rhel6hvm-amd fail
test-amd64-i386-qemuu-rhel6hvm-amd fail
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 pass
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel fail
test-amd64-i386-qemut-rhel6hvm-intel fail
test-amd64-i386-qemuu-rhel6hvm-intel fail
test-amd64-i386-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv pass
test-amd64-amd64-xl-sedf pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-i386-xend-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 368 lines long.)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> flight 19308 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208

These are due to /var/run/xen-hotplug/block getting leaked.

> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208

These are:
libxl: error: libxl_device.c:894:device_backend_callback: unable to add device with path /local/domain/0/backend/vbd/9/5632
libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices

/var/log/xen/xenhotplug.log contains:
xenstore-read: couldn't read path backend/vbd/9/5632/node

For both of these I'm suspicious of:
11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file

and to a lesser extent:
a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid

It doesn't look like the bisector is looking at this, or else I'm
reading osstest's resource plan wrongly.
On 15/09/13 14:50, Ian Campbell wrote:
> On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
[...]
>> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
>
> These are:
> libxl: error: libxl_device.c:894:device_backend_callback: unable
> to add device with path /local/domain/0/backend/vbd/9/5632
> libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
>
> /var/log/xen/xenhotplug.log contains:
> xenstore-read: couldn't read path backend/vbd/9/5632/node
>
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file

Hello,

I've tracked this down to libxl writing a wrong physical-device
xenstore node when using regular files. When using block devices,
libxl can write the physical-device node because it can be fetched
without executing the block script. With regular files this is not
true: we must first execute the block script to mount the regular file
on a loop device, and then fetch the physical-device from the loop
device on which the image has been mounted. The following patch solves
the issue for me.
8<-------------------------------------------------------------------
From e150f00565bfe291809441e73630b243e21a52b0 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon, 16 Sep 2013 09:39:05 +0200
Subject: [PATCH] libxl: don't write physical-device vbd xenstore node in libxl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

libxl used to write the physical-device xenstore node needed by the
phy backend type, because the phy backend type could only be used with
block devices. If libxl allows the backend type phy to be used with
regular files, it can no longer write physical-device because the
hotplug script has to be executed first in order to mount the regular
file into a loop device and then write the physical-device of the loop
device used.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 tools/libxl/libxl.c |   15 ---------------
 1 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0879f23..326a378 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2101,21 +2101,6 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                                       libxl__xen_script_dir_path());
                 flexarray_append_pair(back, "script", script);
 
-                /* If the user did not supply a block script then we
-                 * write the physical-device node ourselves.
-                 *
-                 * If the user did supply a script then that script is
-                 * responsible for this since the block device may not
-                 * exist yet.
-                 */
-                if (!disk->script &&
-                    disk->backend_domid == LIBXL_TOOLSTACK_DOMID) {
-                    int major, minor;
-                    libxl__device_physdisk_major_minor(dev, &major, &minor);
-                    flexarray_append_pair(back, "physical-device",
-                            libxl__sprintf(gc, "%x:%x", major, minor));
-                }
-
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_VBD);
                 break;
 
-- 
1.7.7.5 (Apple Git-26)
On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
> On 15/09/13 14:50, Ian Campbell wrote:
> > On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
[...]
> >> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> >
> > These are due to /var/run/xen-hotplug/block getting leaked
> >

The error message in XenStore shows that blkback tries to get hold of
block device 0:0, but there is no such device on the system.

[...]
> > For both of these I'm suspicious of:
> > 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>
> Hello,
>
> I've tracked this down to libxl writing a wrong physical-device
> xenstore node when using regular files. When using block devices libxl
> can write the physical-device because it can be fetched without
> requiring the execution of the block script, but with regular files it
> is not true, we must first execute the block script in order to mount
> the regular file into a loop device and then fetch the physical-device
> from the loop device to which the image has been mounted. Following
> patch solves the issue for me.
>

Yes, that's the code in question, I think. That code snippet was
introduced in:

commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
Author: Ian Campbell <Ian.Campbell@citrix.com>
Date:   Tue Aug 7 14:26:29 2012 +0100

    libxl: write physical-device node if user did not supply a block script

    This reverts one of the intentional changes from 25733:353bc0801b11.
    That change exposed an issue with the xl migration protocol, which
    although safe triggers the hotplug scripts device sharing logic.

    For 4.2 we disable this logic by writing the physical-device xenstore
    node ourselves if a user did not supply a script. If the user did
    supply a script then we continue to rely on it to write the
    physical-device node (not least because the script may create the
    device and therefore it is not available before we run the script).

    This means that to support localhost migration a block hotplug script
    needs to be robust against adding a device twice and should not
    deactivate the device until it has been removed twice.

    This should be revisited for 4.3.

    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
    Committed-by: Ian Campbell <ian.campbell@citrix.com>

And the commit message says this behavior should be revisited.

Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
things look more complicated. One interesting snippet in that commit
message is:

  - libxl should not write the "physical-device" node. This is the
    responsibility of the block script. Writing the "physical-device"
    node in libxl basically completely short-cuts the standard block
    hotplug script which uses "physical-device" to know if it has run
    already or not.

That makes me believe the following fix is the correct thing to do in
the long term.

I have to admit that I cannot fully digest the commit message of 25733
in one day, so unless you (Ian) can confirm that Roger's fix will not
cause further regressions, I would suggest reverting my change for the
moment.

Wei.

> 8<-------------------------------------------------------------------
> From e150f00565bfe291809441e73630b243e21a52b0 Mon Sep 17 00:00:00 2001
[...]
On 16/09/13 11:55, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
[...]
> That makes me believe the following fix is the correct thing to do in
> long term.
>
> I have to admit that I cannot fully consume the commit message of 25733
> in one day so unless you (Ian) can confirm Roger's fix will not cause further
> regression otherwise I would suggest reverting my change at the moment.

My fix deals with one part of the problem, but will fail on local
migrate (the block script will refuse to attach the same device
twice). This is indeed a tricky issue, and I cannot see an easy way to
deal with it. The proper fix would be to unplug the devices from the
suspended domain before creating the new domain, but I'm sure this is
not trivial (it would also imply reattaching the devices to the
original domain if migration fails).
Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 19308: regressions - FAIL"):
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>
> and to a lesser extent:
> a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid
Yes.
> It doesn't look like the bisector is looking at this, or else I'm
> reading osstest's resource plan wrongly
The log volume had filled up with git caches. I'm pruning them.
Really this should be automatic, but really there should be one cache,
not one per host.
Ian.
On Mon, 2013-09-16 at 10:55 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
[...]
> commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> Author: Ian Campbell <Ian.Campbell@citrix.com>
> Date:   Tue Aug 7 14:26:29 2012 +0100
>
>     libxl: write physical-device node if user did not supply a block script
[...]
>     This should be revisited for 4.3.
[...]
> And in the commit message it says this behavior should be revisited.

Which never happened :-(

I don't remember exactly, but I think the real fix is a reworking of
the sequencing of block device attach/detach relative to the migration
stop-and-copy phase, not a simple tweak.

> Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
>
>   - libxl should not write the "physical-device" node. This is the
>     responsibility of the block script. Writing the "physical-device"
>     node in libxl basically completely short-cuts the standard block
>     hotplug script which uses "physical-device" to know if it has run
>     already or not.
>
> That makes me believe the following fix is the correct thing to do in
> long term.
>
> I have to admit that I cannot fully consume the commit message of 25733
> in one day so unless you (Ian) can confirm Roger's fix will not cause further
> regression otherwise I would suggest reverting my change at the moment.

Can you test some lifecycle operations, in particular localhost
migrations with both phy:// and file:// devices, to see if it fixes it?
If not then we can revert.

Perhaps rather than removing that block entirely it should be
conditional on S_ISBLK?

Ian.
On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
[...]
> Can you test some lifecycle operations, in particular localhost
> migrations with both phy:// and file:// devices to see if it fixes it?
> If not then we can revert.
>

Unfortunately, with Roger's patch applied, local migration of a raw
format file disk doesn't work:

xc: detail: Save exit of domid 69 with rc=0
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0, which is mounted in a guest domain, and so cannot be mounted now.
libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
Migration failed, resuming at sender.

> Perhaps rather than removing that block entirely it should be
> conditional on S_ISBLK?
>

With the block made conditional on S_ISBLK, and the raw format file
mounted on a loop device, local migration still breaks with the above
error.

So for now please revert that change.

Wei.

> Ian.
On Mon, 2013-09-16 at 14:25 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
> [...]
> > > > I've tracked this down to libxl writing a wrong physical-device
> > > > xenstore node when using regular files. When using block devices libxl
> > > > can write the physical-device because it can be fetched without
> > > > requiring the execution of the block script, but with regular files it
> > > > is not true, we must first execute the block script in order to mount
> > > > the regular file into a loop device and then fetch the physical-device
> > > > from the loop device to which the image has been mounted. Following
> > > > patch solves the issue for me.
> > > >
> > >
> > > Yes, that's the code in question I think. That code snippet was introduced in:
> > >
> > > commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> > > Author: Ian Campbell <Ian.Campbell@citrix.com>
> > > Date: Tue Aug 7 14:26:29 2012 +0100
> > >
> > >     libxl: write physical-device node if user did not supply a block script
> > >
> > >     This reverts one of the intentional changes from 25733:353bc0801b11.
> > >     That change exposed an issue with the xl migration protocol, which
> > >     although safe triggers the hotplug scripts device sharing logic.
> > >
> > >     For 4.2 we disable this logic by writing the physical-device xenstore
> > >     node ourselves if a user did not supply a script. If the user did
> > >     supply a script then we continue to rely on it to write the
> > >     physical-device node (not least because the script may create the
> > >     device and therefore it is not available before we run the script).
> > >
> > >     This means that to support localhost migration a block hotplug script
> > >     needs to be robust against adding a device twice and should not
> > >     deactivate the device until it has been removed twice.
> > >
> > >     This should be revisited for 4.3.
> > >
> > >     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > >     Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > >     Committed-by: Ian Campbell <ian.campbell@citrix.com>
> > >
> > > And in the commit message it says this behavior should be revisited.
> >
> > Which never happened :-(
> >
> > I don't remember exactly but I think the real fix is a reworking of the
> > sequencing of block device attach/detach vs the migration stop and copy
> > phase, not a simple tweak IIRC.
> >
> > > Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> > > things look more complicated. One interesting snippet in the commit
> > > message is:
> > >
> > >   - libxl should not write the "physical-device" node. This is the
> > >     responsibility of the block script. Writing the "physical-device"
> > >     node in libxl basically completely short-cuts the standard block
> > >     hotplug script which uses "physical-device" to know if it has run
> > >     already or not.
> > >
> > > That makes me believe the following fix is the correct thing to do in
> > > long term.
> > >
> > > I have to admit that I cannot fully consume the commit message of 25733
> > > in one day so unless you (Ian) can confirm Roger's fix will not cause further
> > > regression otherwise I would suggest reverting my change at the moment.
> >
> > Can you test some lifecycle operations, in particular localhost
> > migrations with both phy:// and file:// devices to see if it fixes it?
> > If not then we can revert.
> >
>
> Unfortunately with Roger's patch applied local migration for raw format
> file disk doesn't work.
>
> xc: detail: Save exit of domid 69 with rc=0
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
> libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0,
> which is mounted in a guest domain,
> and so cannot be mounted now.
> libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
> libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
> migration target: Domain creation failed (code -3).
> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
> Migration failed, resuming at sender.
>
> > Perhaps rather than removing that block entirely it should be
> > conditional on S_ISBLK?
> >
>
> With the conditional on S_ISBLK, raw format file mounted to loopdev,
> local migration still breaks with above error.
>
> So for now please revert that change.

Done.

Ian.
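The quoted commit message states that, to support localhost migration, a block hotplug script must tolerate the same device being added twice and must not deactivate it until it has been removed twice. A minimal C model of that requirement, assuming a simple per-device user count, is sketched below. The struct and function names are hypothetical; the existing block script is a shell script and instead refuses the second add, as the log above shows.

```c
/*
 * Hypothetical model (not the Xen block script): count outstanding "add"
 * operations per backing device so that a second add is tolerated and the
 * device is only deactivated by the last matching remove.
 */
struct shared_dev {
    const char *path; /* backing file, e.g. a raw image */
    int users;        /* number of adds not yet matched by a remove */
    int active;       /* 1 while the device (e.g. a loop device) is set up */
};

/* "add": set the device up on the first add; later adds only count. */
static void dev_add(struct shared_dev *d)
{
    if (d->users++ == 0)
        d->active = 1;
}

/* "remove": tear the device down only when the last user goes away. */
static int dev_remove(struct shared_dev *d)
{
    if (d->users == 0)
        return -1; /* remove without a matching add */
    if (--d->users == 0)
        d->active = 0;
    return 0;
}
```

In the localhost migration case the receiver's add arrives before the sender's remove, so in this model the count briefly reaches two and the device stays set up until the sender side has also been torn down.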