flight 19308 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
 test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
 test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
 test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
 test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
 test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
 test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
 test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
 test-amd64-i386-qemut-rhel6hvm-intel 11 leak-check/check fail in 19288 REGR. vs. 19208
 test-amd64-amd64-xl-winxpsp3 8 guest-saverestore fail in 19288 REGR. vs. 19208
 test-amd64-i386-xl-qemut-win7-amd64 12 guest-localmigrate/x10 fail in 19288 REGR. vs. 19208

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-intel 7 redhat-install fail pass in 19288
 test-amd64-amd64-xl-winxpsp3 7 windows-install fail pass in 19288
 test-amd64-i386-xl-qemut-win7-amd64 7 windows-install fail pass in 19288
 test-amd64-i386-rhel6hvm-intel 7 redhat-install fail in 19288 pass in 19308
 test-amd64-amd64-xl-sedf 14 guest-localmigrate/x10 fail in 19288 pass in 19308
 test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail in 19288 pass in 19308
 test-amd64-i386-xl-win7-amd64 11 guest-localmigrate.2 fail in 19288 pass in 19308
 test-amd64-amd64-xl-win7-amd64 7 windows-install fail in 19288 pass in 19308
 test-amd64-amd64-xl-qemut-winxpsp3 7 windows-install fail in 19288 pass in 19308

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl 1 xen-build-check(1) blocked n/a
 test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
 build-armhf-pvops 4 kernel-build fail never pass
 test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop fail never pass

version targeted for testing:
 xen 593470233ff38385df9dcf5690cc58c7a4fb290d
baseline version:
 xen cadbe2f9e768585fad52156be2433d49ec9feaf1

------------------------------------------------------------
People who touched revisions under test:
 Andrew Cooper <andrew.cooper3@citrix.com>
 Dario Faggioli <dario.faggioli@citrix.com>
 Fabio Fantoni <fabio.fantoni@m2r.biz>
 George Dunlap <george.dunlap@eu.citrix.com>
 Ian Campbell <ian.campbell@citrix.com>
 Ian Jackson <ian.jackson@eu.citrix.com>
 Jan Beulich <jbeulich@suse.com>
 Juergen Gross <juergen.gross@ts.fujitsu.com>
 Keir Fraser <keir@xen.org>
 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 Matthew Daley <mattjd@gmail.com>
 Roger Pau Monné <roger.pau@citrix.com>
 Samuel Thibault <samuel.thibault@ens-lyon.org>
 Tim Deegan <tim@xen.org>
 Wei Liu <wei.liu2@citrix.com>
------------------------------------------------------------

jobs:
 build-amd64 pass
 build-armhf pass
 build-i386 pass
 build-amd64-oldkern pass
 build-i386-oldkern pass
 build-amd64-pvops pass
 build-armhf-pvops fail
 build-i386-pvops pass
 test-amd64-amd64-xl pass
 test-armhf-armhf-xl blocked
 test-amd64-i386-xl pass
 test-amd64-i386-rhel6hvm-amd fail
 test-amd64-i386-qemut-rhel6hvm-amd fail
 test-amd64-i386-qemuu-rhel6hvm-amd fail
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64 fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-amd64-xl-win7-amd64 fail
 test-amd64-i386-xl-win7-amd64 fail
 test-amd64-i386-xl-credit2 pass
 test-amd64-amd64-xl-pcipt-intel fail
 test-amd64-i386-rhel6hvm-intel fail
 test-amd64-i386-qemut-rhel6hvm-intel fail
 test-amd64-i386-qemuu-rhel6hvm-intel fail
 test-amd64-i386-xl-multivcpu pass
 test-amd64-amd64-pair pass
 test-amd64-i386-pair pass
 test-amd64-amd64-xl-sedf-pin pass
 test-amd64-amd64-pv pass
 test-amd64-i386-pv pass
 test-amd64-amd64-xl-sedf pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
 test-amd64-i386-xl-winxpsp3-vcpus1 fail
 test-amd64-i386-xend-qemut-winxpsp3 fail
 test-amd64-amd64-xl-qemut-winxpsp3 fail
 test-amd64-amd64-xl-qemuu-winxpsp3 fail
 test-amd64-i386-xend-winxpsp3 fail
 test-amd64-amd64-xl-winxpsp3 fail

------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
    http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 368 lines long.)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> flight 19308 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208

These are due to /var/run/xen-hotplug/block getting leaked

> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208

These are:
    libxl: error: libxl_device.c:894:device_backend_callback: unable to add device with path /local/domain/0/backend/vbd/9/5632
    libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices

/var/log/xen/xenhotplug.log contains:
    xenstore-read: couldn't read path backend/vbd/9/5632/node

For both of these I'm suspicious of:
    11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file

and to a lesser extent:
    a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid

It doesn't look like the bisector is looking at this, or else I'm
reading osstest's resource plan wrongly

On 15/09/13 14:50, Ian Campbell wrote:
> On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
>> flight 19308 xen-unstable real [real]
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>
> These are due to /var/run/xen-hotplug/block getting leaked
>
>> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
>
> These are:
> libxl: error: libxl_device.c:894:device_backend_callback: unable
> to add device with path /local/domain/0/backend/vbd/9/5632
> libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
>
> /var/log/xen/xenhotplug.log contains:
> xenstore-read: couldn't read path backend/vbd/9/5632/node
>
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file

Hello,

I've tracked this down to libxl writing a wrong physical-device
xenstore node when using regular files. With block devices, libxl can
write physical-device itself, because the value can be fetched without
running the block script. With regular files this is not true: we must
first execute the block script in order to attach the regular file to a
loop device, and only then fetch the physical-device from the loop
device to which the image has been attached. The following patch solves
the issue for me.

8<-------------------------------------------------------------------
From e150f00565bfe291809441e73630b243e21a52b0 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon, 16 Sep 2013 09:39:05 +0200
Subject: [PATCH] libxl: don't write physical-device vbd xenstore node in
 libxl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

libxl used to write the physical-device xenstore node needed by the
phy backend type, because the phy backend type could only be used with
block devices. If libxl allows the backend type phy to be used with
regular files, it can no longer write physical-device because the
hotplug script has to be executed first in order to mount the regular
file into a loop device and then write the physical-device of the loop
device used.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 tools/libxl/libxl.c |   15 ---------------
 1 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0879f23..326a378 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2101,21 +2101,6 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                                libxl__xen_script_dir_path());
             flexarray_append_pair(back, "script", script);
 
-            /* If the user did not supply a block script then we
-             * write the physical-device node ourselves.
-             *
-             * If the user did supply a script then that script is
-             * responsible for this since the block device may not
-             * exist yet.
-             */
-            if (!disk->script &&
-                disk->backend_domid == LIBXL_TOOLSTACK_DOMID) {
-                int major, minor;
-                libxl__device_physdisk_major_minor(dev, &major, &minor);
-                flexarray_append_pair(back, "physical-device",
-                        libxl__sprintf(gc, "%x:%x", major, minor));
-            }
-
             assert(device->backend_kind == LIBXL__DEVICE_KIND_VBD);
             break;
 
-- 
1.7.7.5 (Apple Git-26)

On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
> On 15/09/13 14:50, Ian Campbell wrote:
> > On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> >> flight 19308 xen-unstable real [real]
> >> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
> >>
> >> Regressions :-(
> >>
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> >
> > These are due to /var/run/xen-hotplug/block getting leaked
> >

The error message in XenStore shows that blkback tries to get hold of
block device 0:0, but there is no such device entry in the system.

> >> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> >> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> >> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
> >
> > These are:
> > libxl: error: libxl_device.c:894:device_backend_callback: unable
> > to add device with path /local/domain/0/backend/vbd/9/5632
> > libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
> >
> > /var/log/xen/xenhotplug.log contains:
> > xenstore-read: couldn't read path backend/vbd/9/5632/node
> >
> > For both of these I'm suspicious of:
> > 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>
> Hello,
>
> I've tracked this down to libxl writing a wrong physical-device
> xenstore node when using regular files. When using block devices libxl
> can write the physical-device because it can be fetched without
> requiring the execution of the block script, but with regular files it
> is not true, we must first execute the block script in order to mount
> the regular file into a loop device and then fetch the physical-device
> from the loop device to which the image has been mounted. Following
> patch solves the issue for me.
>

Yes, that's the code in question I think. That code snippet was introduced in:

  commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
  Author: Ian Campbell <Ian.Campbell@citrix.com>
  Date:   Tue Aug 7 14:26:29 2012 +0100

      libxl: write physical-device node if user did not supply a block script

      This reverts one of the intentional changes from 25733:353bc0801b11.
      That change exposed an issue with the xl migration protocol, which
      although safe triggers the hotplug scripts device sharing logic.

      For 4.2 we disable this logic by writing the physical-device xenstore
      node ourselves if a user did not supply a script. If the user did
      supply a script then we continue to rely on it to write the
      physical-device node (not least because the script may create the
      device and therefore it is not available before we run the script).

      This means that to support localhost migration a block hotplug script
      needs to be robust against adding a device twice and should not
      deactivate the device until it has been removed twice.

      This should be revisited for 4.3.

      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
      Committed-by: Ian Campbell <ian.campbell@citrix.com>

And in the commit message it says this behavior should be revisited.

Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
things look more complicated. One interesting snippet in the commit
message is:

  - libxl should not write the "physical-device" node. This is the
    responsibility of the block script. Writing the "physical-device"
    node in libxl basically completely short-cuts the standard block
    hotplug script which uses "physical-device" to know if it has run
    already or not.

That makes me believe the following fix is the correct thing to do in
the long term.

I have to admit that I cannot fully digest the commit message of 25733
in one day, so unless you (Ian) can confirm that Roger's fix will not
cause further regressions, I would suggest reverting my change for the
moment.

Wei.

> 8<-------------------------------------------------------------------
> From e150f00565bfe291809441e73630b243e21a52b0 Mon Sep 17 00:00:00 2001
> From: Roger Pau Monne <roger.pau@citrix.com>
> Date: Mon, 16 Sep 2013 09:39:05 +0200
> Subject: [PATCH] libxl: don't write physical-device vbd xenstore node in
>  libxl
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> libxl used to write the physical-device xenstore node needed by the
> phy backend type, because the phy backend type could only be used with
> block devices. If libxl allows the backend type phy to be used with
> regular files, it can no longer write physical-device because the
> hotplug script has to be executed first in order to mount the regular
> file into a loop device and then write the physical-device of the loop
> device used.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  tools/libxl/libxl.c |   15 ---------------
>  1 files changed, 0 insertions(+), 15 deletions(-)
>
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 0879f23..326a378 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -2101,21 +2101,6 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
>                                 libxl__xen_script_dir_path());
>              flexarray_append_pair(back, "script", script);
>
> -            /* If the user did not supply a block script then we
> -             * write the physical-device node ourselves.
> -             *
> -             * If the user did supply a script then that script is
> -             * responsible for this since the block device may not
> -             * exist yet.
> -             */
> -            if (!disk->script &&
> -                disk->backend_domid == LIBXL_TOOLSTACK_DOMID) {
> -                int major, minor;
> -                libxl__device_physdisk_major_minor(dev, &major, &minor);
> -                flexarray_append_pair(back, "physical-device",
> -                        libxl__sprintf(gc, "%x:%x", major, minor));
> -            }
> -
>              assert(device->backend_kind == LIBXL__DEVICE_KIND_VBD);
>              break;
>
> --
> 1.7.7.5 (Apple Git-26)
>
>

On 16/09/13 11:55, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
[...]
> That makes me believe the following fix is the correct thing to do in
> long term.
>
> I have to admit that I cannot fully consume the commit message of 25733
> in one day so unless you (Ian) can confirm Roger's fix will not cause further
> regression otherwise I would suggest reverting my change at the moment.

My fix deals with one part of the problem, but will fail on local
migrate (block script will refuse to attach the same device twice).

This is indeed a tricky issue, and I cannot see an easy way to deal with
it. The proper way to fix this would be to unplug the devices from the
suspended domain before creating the new domain, but I'm sure this is
not trivial (this would also imply reattaching the devices to the
original domain if migration fails).

Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 19308: regressions - FAIL"):
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>
> and to a lesser extent:
> a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid

Yes.

> It doesn't look like the bisector is looking at this, or else I'm
> reading osstest's resource plan wrongly

The log volume had filled up with git caches. I'm pruning them.
Really this should be automatic but really there should be one cache,
not one per host.

Ian.

On Mon, 2013-09-16 at 10:55 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
[...]
> And in the commit message it says this behavior should be revisited.

Which never happened :-(

I don't remember exactly but I think the real fix is a reworking of the
sequencing of block device attach/detach vs the migration stop and copy
phase, not a simple tweak IIRC.

> Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
>
> - libxl should not write the "physical-device" node. This is the
> responsibility of the block script. Writing the "physical-device"
> node in libxl basically completely short-cuts the standard block
> hotplug script which uses "physical-device" to know if it has run
> already or not.
>
> That makes me believe the following fix is the correct thing to do in
> long term.
>
> I have to admit that I cannot fully consume the commit message of 25733
> in one day so unless you (Ian) can confirm Roger's fix will not cause further
> regression otherwise I would suggest reverting my change at the moment.

Can you test some lifecycle operations, in particular localhost
migrations with both phy:// and file:// devices to see if it fixes it?
If not then we can revert.

Perhaps rather than removing that block entirely it should be
conditional on S_ISBLK?

Ian.
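
The S_ISBLK condition suggested above can be sketched as follows. This is not libxl code: `physdev_from_path` is a hypothetical helper, and the sketch assumes Linux with glibc (`<sys/sysmacros.h>` for `major()` and `minor()`). It produces the same "%x:%x" string as the block removed by the patch, but only when the path is a block device; for anything else it reports that the node should be left to the block script. It only illustrates the check itself and makes no claim about whether the check is enough for local migration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor(): glibc/Linux assumption */
#include <sys/types.h>

/* Hypothetical helper, not part of libxl.
 * Returns  1 and fills buf with "major:minor" in hex if path is a block device,
 *          0 if path exists but is not a block device (e.g. a regular file),
 *         -1 on error (stat failure or buffer too small). */
static int physdev_from_path(const char *path, char *buf, size_t len)
{
    struct stat st;
    int n;

    assert(path != NULL && buf != NULL);

    if (stat(path, &st) != 0)
        return -1;

    /* Regular files have no device number of their own until a hotplug
     * script attaches them to a loop device, so nothing is written here. */
    if (!S_ISBLK(st.st_mode))
        return 0;

    n = snprintf(buf, len, "%x:%x",
                 (unsigned int)major(st.st_rdev),
                 (unsigned int)minor(st.st_rdev));
    if (n < 0 || (size_t)n >= len)
        return -1;

    return 1;
}
```

A caller would write the physical-device node only when the helper returns 1 and otherwise leave it to the hotplug script.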

On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
[...]
> Can you test some lifecycle operations, in particular localhost
> migrations with both phy:// and file:// devices to see if it fixes it?
> If not then we can revert.
>

Unfortunately, with Roger's patch applied, local migration for a raw
format file disk doesn't work.

  xc: detail: Save exit of domid 69 with rc=0
  libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
  libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0,
  which is mounted in a guest domain,
  and so cannot be mounted now.
  libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
  libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
  libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
  migration target: Domain creation failed (code -3).
  libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
  libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
  Migration failed, resuming at sender.

> Perhaps rather than removing that block entirely it should be
> conditional on S_ISBLK?
>

With that block made conditional on S_ISBLK, local migration of a raw
format file mounted on a loop device still breaks with the above error.

So for now please revert that change.

Wei.

> Ian.
On Mon, 2013-09-16 at 14:25 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
> [...]
> > > > I've tracked this down to libxl writing a wrong physical-device
> > > > xenstore node when using regular files. When using block devices libxl
> > > > can write the physical-device because it can be fetched without
> > > > requiring the execution of the block script, but with regular files it
> > > > is not true, we must first execute the block script in order to mount
> > > > the regular file into a loop device and then fetch the physical-device
> > > > from the loop device to which the image has been mounted. Following
> > > > patch solves the issue for me.
> > > >
> > >
> > > Yes, that's the code in question I think. That code snippet was introduced in:
> > >
> > > commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> > > Author: Ian Campbell <Ian.Campbell@citrix.com>
> > > Date: Tue Aug 7 14:26:29 2012 +0100
> > >
> > >     libxl: write physical-device node if user did not supply a block script
> > >
> > >     This reverts one of the intentional changes from 25733:353bc0801b11.
> > >     That change exposed an issue with the xl migration protocol, which
> > >     although safe triggers the hotplug scripts device sharing logic.
> > >
> > >     For 4.2 we disable this logic by writing the physical-device xenstore
> > >     node ourselves if a user did not supply a script. If the user did
> > >     supply a script then we continue to rely on it to write the
> > >     physical-device node (not least because the script may create the
> > >     device and therefore it is not available before we run the script).
> > >
> > >     This means that to support localhost migration a block hotplug script
> > >     needs to be robust against adding a device twice and should not
> > >     deactivate the device until it has been removed twice.
> > >
> > >     This should be revisited for 4.3.
> > >
> > >     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > >     Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > >     Committed-by: Ian Campbell <ian.campbell@citrix.com>
> > >
> > > And in the commit message it says this behavior should be revisited.
> >
> > Which never happened :-(
> >
> > I don't remember exactly but I think the real fix is a reworking of the
> > sequencing of block device attach/detach vs the migration stop and copy
> > phase, not a simple tweak IIRC.
> >
> > > Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> > > things look more complicated. One interesting snippet in the commit
> > > message is:
> > >
> > >   - libxl should not write the "physical-device" node. This is the
> > >     responsibility of the block script. Writing the "physical-device"
> > >     node in libxl basically completely short-cuts the standard block
> > >     hotplug script which uses "physical-device" to know if it has run
> > >     already or not.
> > >
> > > That makes me believe the following fix is the correct thing to do in
> > > the long term.
> > >
> > > I have to admit that I cannot fully consume the commit message of 25733
> > > in one day, so unless you (Ian) can confirm Roger's fix will not cause
> > > further regressions, I would suggest reverting my change for the moment.
> >
> > Can you test some lifecycle operations, in particular localhost
> > migrations with both phy:// and file:// devices to see if it fixes it?
> > If not then we can revert.
> >
>
> Unfortunately, with Roger's patch applied, local migration for a raw format
> file disk doesn't work.
>
> xc: detail: Save exit of domid 69 with rc=0
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
> libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0,
> which is mounted in a guest domain,
> and so cannot be mounted now.
> libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
> libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
> migration target: Domain creation failed (code -3).
> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
> Migration failed, resuming at sender.
>
> > Perhaps rather than removing that block entirely it should be
> > conditional on S_ISBLK?
> >
>
> With the conditional on S_ISBLK, and the raw format file mounted on a loop
> device, local migration still breaks with the above error.
>
> So for now please revert that change.

Done.

Ian.
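The requirement in the quoted 4.2 commit message (a hotplug script must tolerate a device being added twice and must not deactivate it until it has been removed twice) amounts to counting users of the backing device. The sketch below is a minimal model of that idea only. The struct and function names are hypothetical, and the real block scripts are shell, so this does not describe how they actually implement sharing.

```c
#include <stdbool.h>

/*
 * Hypothetical model of one backing device that is briefly shared by the
 * outgoing and incoming domain during a localhost migration.
 */
struct backing_dev {
    int users;   /* "add" calls that have not yet been matched by "remove" */
    bool active; /* whether the underlying device is currently set up */
};

/* "add": set the device up on first use, otherwise only count the user. */
static void backing_dev_add(struct backing_dev *d)
{
    if (d->users == 0)
        d->active = true; /* e.g. bind the image file to a loop device */
    d->users++;
}

/*
 * "remove": tear the device down only when the last user goes away.
 * Returns true if this call deactivated the device.
 */
static bool backing_dev_remove(struct backing_dev *d)
{
    if (d->users == 0)
        return false; /* nothing to remove */
    if (--d->users > 0)
        return false; /* still in use by the other domain */
    d->active = false;
    return true;
}
```

With this model, the second add during migration succeeds instead of failing as in the log above, and the device survives the first remove.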