flight 13461 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/13461/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-pv 18 leak-check/check fail REGR. vs. 13459
test-amd64-amd64-pv 18 leak-check/check fail REGR. vs. 13459
test-amd64-i386-xl 18 leak-check/check fail REGR. vs. 13459
test-amd64-i386-xl-multivcpu 18 leak-check/check fail REGR. vs. 13459
test-amd64-i386-xl-credit2 18 leak-check/check fail REGR. vs. 13459
test-amd64-amd64-pair 21 leak-check/check/dst_host fail REGR. vs. 13459
test-amd64-amd64-pair 20 leak-check/check/src_host fail REGR. vs. 13459
test-i386-i386-pv 18 leak-check/check fail REGR. vs. 13459
test-i386-i386-xl 18 leak-check/check fail REGR. vs. 13459
test-amd64-amd64-xl 18 leak-check/check fail REGR. vs. 13459
test-i386-i386-pair 20 leak-check/check/src_host fail REGR. vs. 13459
test-i386-i386-pair 21 leak-check/check/dst_host fail REGR. vs. 13459
test-amd64-i386-pair 20 leak-check/check/src_host fail REGR. vs. 13459
test-amd64-i386-pair 21 leak-check/check/dst_host fail REGR. vs. 13459
Regressions which are regarded as allowable (not blocking):
test-amd64-amd64-xl-sedf 18 leak-check/check fail REGR. vs. 13458
test-amd64-amd64-xl-sedf-pin 18 leak-check/check fail REGR. vs. 13459
test-i386-i386-xl-qemuu-winxpsp3 9 guest-localmigrate fail like 13459
test-amd64-amd64-xl-qemuu-winxpsp3 9 guest-localmigrate fail like 13459
test-amd64-amd64-xl-qemuu-win7-amd64 9 guest-localmigrate fail like 13459
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail never pass
test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail never pass
test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-win 16 leak-check/check fail never pass
test-amd64-i386-win-vcpus1 16 leak-check/check fail never pass
test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail never pass
test-amd64-i386-win 16 leak-check/check fail never pass
test-i386-i386-win 16 leak-check/check fail never pass
test-i386-i386-xl-win 13 guest-stop fail never pass
test-amd64-amd64-xl-win 13 guest-stop fail never pass
test-amd64-i386-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-amd64-xl-winxpsp3 13 guest-stop fail never pass
test-i386-i386-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-i386-xl-win-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-win7-amd64 13 guest-stop fail never pass
version targeted for testing:
xen 7c0351dc41a5
baseline version:
xen 42f76d536b11
------------------------------------------------------------
People who touched revisions under test:
Daniel P. Berrange <berrange@redhat.com>
Deep Debroy <ddebroy@gmail.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Keir Fraser <keir@xen.org>
Matt Wilson <msw@amazon.com>
Michael Young <m.a.young@durham.ac.uk>
Olaf Hering <olaf@aepfle.de>
Roger Pau Monné <roger.pau@citrix.com>
Stefano Stabellini <stefano.stabellini@eu.citrix.com>
W. Michael Petullo <mike@flyn.org>
Zhigang Wang <zhigang.x.wang@oracle.com>
------------------------------------------------------------
jobs:
build-amd64 pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl fail
test-amd64-i386-xl fail
test-i386-i386-xl fail
test-amd64-i386-rhel6hvm-amd fail
test-amd64-i386-qemuu-rhel6hvm-amd fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 fail
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel fail
test-amd64-i386-qemuu-rhel6hvm-intel fail
test-amd64-i386-xl-multivcpu fail
test-amd64-amd64-pair fail
test-amd64-i386-pair fail
test-i386-i386-pair fail
test-amd64-amd64-xl-sedf-pin fail
test-amd64-amd64-pv fail
test-amd64-i386-pv fail
test-i386-i386-pv fail
test-amd64-amd64-xl-sedf fail
test-amd64-i386-win-vcpus1 fail
test-amd64-i386-xl-win-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-amd64-win fail
test-amd64-i386-win fail
test-i386-i386-win fail
test-amd64-amd64-xl-win fail
test-i386-i386-xl-win fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-i386-i386-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
test-i386-i386-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
------------------------------------------------------------
changeset: 25593:7c0351dc41a5
tag: tip
user: Olaf Hering <olaf@aepfle.de>
date: Wed Jul 04 15:46:17 2012 +0100
tools/m4: add AC_LANG_SOURCE to fix autoconf warnings
I see these warnings with autoconf 2.68; add AC_LANG_SOURCE as
suggested by the upstream documentation.
...
# bash autogen.sh
configure.ac:141: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected
in body
../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:606: AS_IF is expanded from...
../../lib/autoconf/general.m4:2032: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2053: AC_CACHE_CHECK is expanded from...
m4/pthread.m4:21: AX_CHECK_PTHREAD is expanded from...
configure.ac:141: the top level
configure.ac:142: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected
in body
../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:606: AS_IF is expanded from...
../../lib/autoconf/general.m4:2032: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2053: AC_CACHE_CHECK is expanded from...
m4/ptyfuncs.m4:1: AX_CHECK_PTYFUNCS is expanded from...
configure.ac:142: the top level
configure.ac:141: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected
in body
../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:606: AS_IF is expanded from...
../../lib/autoconf/general.m4:2032: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2053: AC_CACHE_CHECK is expanded from...
m4/pthread.m4:21: AX_CHECK_PTHREAD is expanded from...
configure.ac:141: the top level
configure.ac:142: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected
in body
../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:606: AS_IF is expanded from...
../../lib/autoconf/general.m4:2032: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2053: AC_CACHE_CHECK is expanded from...
m4/ptyfuncs.m4:1: AX_CHECK_PTYFUNCS is expanded from...
configure.ac:142: the top level
Please rerun autoconf after applying this.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
changeset: 25592:197138cc9436
user: Olaf Hering <olaf@aepfle.de>
date: Wed Jul 04 15:46:16 2012 +0100
tools/configure.ac: add version check for glib2
xen-unstable has failed to build in a SLES10SP4 environment for a long
time because the included version of glib is slightly older than the
required glib version. According to the glib docs version 2.12 includes
base64 support, but SLES10 ships with glib 2.8.6:
qemu-timer-common.o: In function `init_get_clock':
/usr/src/packages/BUILD/xen-4.2.25432/non-dbg/tools/qemu-xen-dir/qemu-timer-common.c:57: undefined reference to `clock_gettime'
qga/guest-agent-commands.o: In function `qmp_guest_file_write':
qga/guest-agent-commands.c:249: undefined reference to `g_base64_decode'
qga/guest-agent-commands.o: In function `qmp_guest_file_read':
qga/guest-agent-commands.c:224: undefined reference to `g_base64_encode'
collect2: ld returned 1 exit status
make[3]: *** [qemu-ga] Error 1
make[3]: *** [qemu-ga] Error 1
Add a version check to the toplevel configure to require at least glib
2.12. This makes sure configure can detect the condition early instead
of failing later in the middle of the tools build when qemu-upstream
errors out.
Please rerun autoconf after applying this.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
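The configure hunk itself is not quoted in this report. As an
illustration only, the same requirement can be probed from a plain
shell with pkg-config (assuming the distribution ships a glib-2.0.pc):

    if ! pkg-config --atleast-version=2.12 glib-2.0; then
        echo "ERROR: glib-2.0 >= 2.12 is required (g_base64_* appeared in 2.12)" >&2
        exit 1
    fi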
changeset: 25591:328f466f9ee6
user: W. Michael Petullo <mike@flyn.org>
date: Wed Jul 04 15:46:15 2012 +0100
xl: Allow use of /dev/null with xl create to enable command-line definition
xm allows specifying /dev/null as the domain configuration argument to
its create option; add the same functionality to xl. xl treats the
configuration argument /dev/null as a special case. This allows
specifying an entire domain configuration on the command line.
Signed-off-by: W. Michael Petullo <mike@flyn.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- ported to xen-unstable from 4.1 ]
Committed-by: Ian Campbell <ian.campbell@citrix.com>
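As a usage illustration (hypothetical values; this relies on xl create
accepting key=value config settings after the config file argument):

    xl create /dev/null 'name="cmdline-vm"' 'memory=512' \
        'kernel="/boot/vmlinuz-guest"' 'root="/dev/xvda1"'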
changeset: 25590:bb250383a4f5
user: Zhigang Wang <zhigang.x.wang@oracle.com>
date: Wed Jul 04 15:46:14 2012 +0100
hotplug/Linux: use flock based locking
In the normal case of a single domain booting with one disk, the disk
hotplug script will fire once. In this case taking out the lock will
never cause a sleep because there are no other concurrent invocations.
If a domain has 4 disks configured, then the hotplug script will fire 4
times, all 4 invocations at pretty much the same time. If there is even
a little load on the system, the locking function in the shell script
will sleep for a few seconds - as many as 5 seconds, or potentially
even time out & fail completely.

If say 4 or even more domains each with 4 disks start up at once,
that's 16 invocations of the hotplug script running at once. There will
be a lot of sleeps done &, because of the very coarse 1 second
granularity, the delay can add up significantly.

The change to using flock() removes the arbitrary 1 second sleep, so
the very instant one hotplug script finishes, another can start & there
are no repeated attempts & failures to lock which would add more delay.

In addition, the current locking implementation would allow two
processes to get the lock simultaneously if one decided the other had
timed out.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
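The pattern being adopted here is the standard flock(1) idiom: open a
well-known file on a spare fd and take an exclusive lock on it, which
blocks in the kernel with no polling. A minimal sketch (the real script
creates the lock directory and manages fds per lock name):

    exec 200>>/var/run/xen-hotplug/block   # open (creating if needed)
    flock -x 200                           # blocks until the lock is free
    # ... critical section ...
    flock -u 200                           # release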
changeset: 25589:60f09d1ab1fe
user: M A Young <m.a.young@durham.ac.uk>
date: Wed Jul 04 15:46:14 2012 +0100
pygrub: cope better with big files in the guest.
Only read the first megabyte of a configuration file (grub etc.) and
read the kernel and ramdisk files from the guest in one megabyte pieces
so pygrub doesn't use a lot of memory if the files are large. With the
--not-really option, check that the chosen kernel and ramdisk files
exist. If there are problems writing the copy of the kernel or ramdisk,
delete the copied files and exit, in case they have filled the
filesystem.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Acked-by: Matt Wilson <msw@amazon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
changeset: 25588:42f76d536b11
user: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
date: Tue Jul 03 13:39:01 2012 +0100
xen: event channel remapping for emulated MSIs
Linux PV on HVM guests remap all MSIs onto event channels, including
MSIs corresponding to QEMU's emulated devices. This patch makes sure
that we correctly handle the case of emulated MSIs that have been
remapped, sending a pirq to the guest instead.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Deep Debroy <ddebroy@gmail.com>
Committed-by: Keir Fraser <keir@xen.org>
============================================================
commit fa2e8e3bc869c04d4a4d9b8f70c1cab7e53778d6
Author: Jan Beulich <jbeulich@suse.com>
Date: Fri Jun 29 17:06:25 2012 +0100
passthrough: fix off-by-one in PCI config space register index check
Register 255 (0xff) is still valid to be accessed.
Reported-by: Rolu <rolu@roce.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell
2012-Jul-05 05:49 UTC
Re: [xen-unstable test] 13461: regressions - FAIL

On Thu, 2012-07-05 at 05:23 +0100, xen.org wrote:
> flight 13461 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/13461/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-pv 18 leak-check/check fail REGR. vs. 13459
>  test-amd64-amd64-pv 18 leak-check/check fail REGR. vs. 13459

http://www.chiark.greenend.org.uk/~xensrcts/logs/13461/test-amd64-amd64-xl/18.ts-leak-check.log:

2012-07-05 01:45:13 Z LEAKED [file /var/run/xen-hotplug/iptables] file: 465271 0 -rw-r--r-- 1 root root 0 Jul 5 02:37 /var/run/xen-hotplug/iptables
2012-07-05 01:45:13 Z LEAKED [file /var/run/xen-hotplug/block] file: 465252 0 -rw-r--r-- 1 root root 0 Jul 5 02:38 /var/run/xen-hotplug/block

This is obviously as a result of the "hotplug/Linux: use flock based
locking" changes.

However it isn't clear to me that this isn't a feature of the way flock
based locking works, since it locks on (I guess) the underlying inode
not the path, such that cleaning up these files in release_lock would be
incorrect because two invocations would end up effectively taking
different locks despite using the same path:

    Thread A        Thread B        Thread C
    claim /foo
                    claim /foo
                    blocks
    release /foo
    rm /foo
                    unblock
                                    claim /foo
                                    new /foo -- doesn't block!

Both B and C have "the" lock.

Is that correct?

Ian.
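The race is easy to demonstrate outside the hotplug scripts. A
self-contained sketch (throwaway path /tmp/foo; the sleeps only order
the steps):

    #!/bin/bash
    lock=/tmp/foo
    exec 200>>"$lock"; flock -x 200     # A: claim /foo
    (
        exec 201>>"$lock"               # B: open the same inode as A
        flock -x 201                    # B: claim /foo -- blocks
        echo "B holds a lock (on the old inode)"
        sleep 2
    ) &
    sleep 1                             # let B block on the flock
    flock -u 200                        # A: release /foo; B unblocks
    rm "$lock"                          # A: rm /foo
    sleep 1
    exec 202>>"$lock"                   # C: creates a *new* inode at /foo
    flock -n -x 202 && echo "C holds a lock too -- didn't block"
    wait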
Daniel P. Berrange
2012-Jul-05 09:28 UTC
Re: [xen-unstable test] 13461: regressions - FAIL
On Thu, Jul 05, 2012 at 06:49:33AM +0100, Ian Campbell wrote:
> On Thu, 2012-07-05 at 05:23 +0100, xen.org wrote:
> > flight 13461 xen-unstable real [real]
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/13461/
> > [...]
>
> http://www.chiark.greenend.org.uk/~xensrcts/logs/13461/test-amd64-amd64-xl/18.ts-leak-check.log:
> [...]
>
> This is obviously as a result of the "hotplug/Linux: use flock based
> locking" changes.
>
> However it isn't clear to me that this isn't a feature of the way flock
> based locking works, since it locks on (I guess) the underlying inode
> not the path, such that cleaning up these files in release_lock would be
> incorrect because two invocations would end up effectively taking
> different locks despite using the same path:
>
>     Thread A        Thread B        Thread C
>     claim /foo
>                     claim /foo
>                     blocks
>     release /foo
>     rm /foo
>                     unblock
>                                     claim /foo
>                                     new /foo -- doesn't block!
>
> Both B and C have "the" lock.
>
> Is that correct?

Yes, as you say flock() operates on the inode, so if something deletes
and recreates the file, future flocks will operate differently. Ideally
you should just never rm the files at all.

If you need to 'rm' them, then to avoid this, you must do two things

 - Only 'rm /foo' while holding the lock on /foo
 - Record the inode before acquiring the lock. After acquiring the
   lock check whether the inode on disk is the same. If not,
   release the lock & repeat.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
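Taken literally, Daniel's recipe can be sketched in shell with stat(1)
(GNU coreutils' -c %i assumed; Ian Jackson's follow-up below refines
this to compare against the open fd instead of the path):

    while true; do
        before=$(stat -c %i "$lockfile" 2>/dev/null)
        exec 200>>"$lockfile"             # creates the file if absent
        flock -x 200
        after=$(stat -c %i "$lockfile" 2>/dev/null)
        if [ -n "$before" ] && [ "$before" = "$after" ]; then
            break                         # same inode throughout: lock held
        fi
        flock -u 200                      # path was replaced under us: retry
    done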
On Thu, 2012-07-05 at 10:28 +0100, Daniel P. Berrange wrote:
> Yes, as you say flock() operates on the inode, so if something deletes
> and recreates the file, future flocks will operate differently. Ideally
> you should just never rm the files at all.
>
> If you need to 'rm' them, then to avoid this, you must do two things
>
>  - Only 'rm /foo' while holding the lock on /foo
>  - Record the inode before acquiring the lock. After acquiring the
>    lock check whether the inode on disk is the same. If not,
>    release the lock & repeat.

Thanks. I'm inclined towards not deleting the lock file.

Ian.
Ian Jackson
2012-Jul-05 10:40 UTC
Re: [xen-unstable test] 13461: regressions - FAIL

Daniel P. Berrange writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> Yes, as you say flock() operates on the inode, so if something deletes
> and recreates the file, future flocks will operate differently. Ideally
> you should just never rm the files at all.
>
> If you need to 'rm' them, then to avoid this, you must do two things
>
>  - Only 'rm /foo' while holding the lock on /foo
>  - Record the inode before acquiring the lock. After acquiring the
>    lock check whether the inode on disk is the same. If not,
>    release the lock & repeat.

It seems more logical to me to check the inum of the open fd against
the file. Something like this perhaps (untested):
diff -r ad08cd8e7097 tools/hotplug/Linux/locking.sh
--- a/tools/hotplug/Linux/locking.sh Thu Jul 05 11:00:28 2012 +0100
+++ b/tools/hotplug/Linux/locking.sh Thu Jul 05 11:39:59 2012 +0100
@@ -30,6 +30,7 @@ _setlockfd()
     done
     _lockdict[$i]="$1"
     let _lockfd=200+i
+    let _lockfile="$LOCK_BASEDIR/$1"
 }


@@ -37,13 +38,32 @@ claim_lock()
 {
     mkdir -p "$LOCK_BASEDIR"
     _setlockfd $1
-    eval "exec $_lockfd>>$LOCK_BASEDIR/$1"
-    flock -x $_lockfd
+    # The locking strategy is identical to that from with-lock-ex(1)
+    # from chiark-utils, except using flock.  It has the benefit of
+    # it being possible to safely remove the lockfile when done.
+    local rightfile
+    while true; do
+        eval "exec $_lockfd>>$lockfile"
+        flock -x $_lockfd
+        # We can't just stat /dev/stdin or /proc/self/fd/$_lockfd or
+        # use bash's test -ef because those all go through what is
+        # actually a synthetic symlink in /proc and we aren't
+        # guaranteed that our stat(2) won't lose the race with an
+        # rm(1) between reading the synthetic link and traversing the
+        # file system to find the inum.  Perl is very fast so use that.
+        rightfile=$( perl -e '
+            open STDIN, "<&'$_lockfd'" or die $!;
+            my $fd_inum = (stat STDIN)[1]; die $! unless defined $fd_inum;
+            my $file_inum = (stat $ARGV[0])[1];
+            print "y\n" if $fd_inum eq $file_inum;
+        ' "$_lockfile" )
+        if [ x$rightfile = xy ]; then break; fi
+    done
 }


 release_lock()
 {
     _setlockfd $1
-    flock -u $_lockfd
+    rm "$_lockfile"
 }
Ian.
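For context, the two primitives are consumed by the hotplug scripts
along these lines (a hypothetical call site; the lock names "block" and
"iptables" are the ones visible in the leak-check log above):

    claim_lock "block"
    # ... read-modify-write shared hotplug state ...
    release_lock "block"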
Daniel P. Berrange
2012-Jul-05 10:46 UTC
Re: [xen-unstable test] 13461: regressions - FAIL
On Thu, Jul 05, 2012 at 11:40:04AM +0100, Ian Jackson wrote:
> Daniel P. Berrange writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> > Yes, as you say flock() operates on the inode, so if something deletes
> > and recreates the file, future flocks will operate differently. Ideally
> > you should just never rm the files at all.
> > [...]
>
> It seems more logical to me to check the inum of the open fd against
> the file. Something like this perhaps (untested):

Yes, using the open fd is actually what I should have said.

> diff -r ad08cd8e7097 tools/hotplug/Linux/locking.sh
> [...]
>
> release_lock()
> {
>     _setlockfd $1
> -    flock -u $_lockfd
> +    rm "$_lockfile"

I think you still want the 'flock' line here, but have it after the
'rm' line. Otherwise you leave the $_lockfd filehandle open. Yes,
I know the calling script will probably just exit, but it doesn't
hurt to be careful here.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Ian Jackson
2012-Jul-05 10:54 UTC
Re: [xen-unstable test] 13461: regressions - FAIL

Daniel P. Berrange writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> On Thu, Jul 05, 2012 at 11:40:04AM +0100, Ian Jackson wrote:
> > release_lock()
> > {
> >     _setlockfd $1
> > -    flock -u $_lockfd
> > +    rm "$_lockfile"
>
> I think you still want the 'flock' line here, but have it after the
> 'rm' line. Otherwise you leave the $_lockfd filehandle open. Yes,
> I know the calling script will probably just exit, but it doesn't
> hurt to be careful here.

flock -u won't close the fd. If that's what you want then we would
have to do so explicitly.

But, in the meantime the old lockfile exists and is pointlessly
flocked but is unlinked, which is harmless apart from being a slight
resource leak.
Note that anyone who inherits the fd from the hotplug script will
end up keeping the file in existence, so closing it here (with or
without unlocking it) may not do any good. And if there are no such
children with a copy of the fd then when the hotplug script dies or
exits the stale lockfile will be reaped by the kernel.

So I think it's fine as it is.

Ian.
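For reference, the explicit close Ian alludes to would be bash's
fd-closing redirection - a sketch only, not part of any applied patch:

    release_lock()
    {
        _setlockfd $1
        rm "$_lockfile"
        eval "exec $_lockfd>&-"   # flock -u alone would leave the fd open
    }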
Daniel P. Berrange
2012-Jul-05 10:57 UTC
Re: [xen-unstable test] 13461: regressions - FAIL
On Thu, Jul 05, 2012 at 11:54:14AM +0100, Ian Jackson wrote:
> Daniel P. Berrange writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> > I think you still want the 'flock' line here, but have it after the
> > 'rm' line. Otherwise you leave the $_lockfd filehandle open. [...]
>
> flock -u won't close the fd. If that's what you want then we would
> have to do so explicitly.
>
> But, in the meantime the old lockfile exists and is pointlessly
> flocked but is unlinked, which is harmless apart from being a slight
> resource leak.
>
> Note that anyone who inherits the fd from the hotplug script will
> end up keeping the file in existence, so closing it here (with or
> without unlocking it) may not do any good. And if there are no such
> children with a copy of the fd then when the hotplug script dies or
> exits the stale lockfile will be reaped by the kernel.
>
> So I think it's fine as it is.

Ok, agreed.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
On Thu, 2012-07-05 at 11:40 +0100, Ian Jackson wrote:
> Daniel P. Berrange writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> > Yes, as you say flock() operates on the inode, so if something deletes
> > and recreates the file, future flocks will operate differently. Ideally
> > you should just never rm the files at all.
> > [...]
>
> It seems more logical to me to check the inum of the open fd against
> the file. Something like this perhaps (untested):
>
> diff -r ad08cd8e7097 tools/hotplug/Linux/locking.sh
> [...]
> +    local rightfile
> +    while true; do
> +        eval "exec $_lockfd>>$lockfile"

you mean $_lockfile here I think.

> +        flock -x $_lockfd
> +        # We can't just stat /dev/stdin or /proc/self/fd/$_lockfd or
> +        # use bash's test -ef because those all go through what is
> +        # actually a synthetic symlink in /proc and we aren't
> +        # guaranteed that our stat(2) won't lose the race with an
> +        # rm(1) between reading the synthetic link and traversing the
> +        # file system to find the inum.  Perl is very fast so use that.
> +        rightfile=$( perl -e '

Won't this need to become $(PERL) (or @PERL@ and some seddery at install
time) for the benefit of BSD?

Does stating $_lockfile before and after the flock, in addition to
comparing both with $_lockfd, close the race you are worried about?

> release_lock()
> {
>     _setlockfd $1
> -    flock -u $_lockfd
> +    rm "$_lockfile"

I suppose there is no point in unlocking too?

> }
>
> Ian.
Roger Pau Monné
2012-Jul-05 11:00 UTC
Re: [xen-unstable test] 13461: regressions - FAIL

Ian Campbell wrote:
> On Thu, 2012-07-05 at 11:40 +0100, Ian Jackson wrote:
> > [...]
> > +        # file system to find the inum.  Perl is very fast so use that.
> > +        rightfile=$( perl -e '
>
> Won't this need to become $(PERL) (or @PERL@ and some seddery at install
> time) for the benefit of BSD?

BSD don't use these scripts, so you don't have to worry about this here.
On Thu, 2012-07-05 at 12:00 +0100, Roger Pau Monne wrote:
> Ian Campbell wrote:
> > Won't this need to become $(PERL) (or @PERL@ and some seddery at install
> > time) for the benefit of BSD?
>
> BSD don't use these scripts, so you don't have to worry about this here.

Right, obviously!

Ian.
Ian Jackson
2012-Jul-05 11:21 UTC
Re: [xen-unstable test] 13461: regressions - FAIL

Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> > +    while true; do
> > +        eval "exec $_lockfd>>$lockfile"
>
> you mean $_lockfile here I think.

Heh. Yes.

> > +        flock -x $_lockfd
> > +        # We can't just stat /dev/stdin or /proc/self/fd/$_lockfd or
> > +        # use bash's test -ef because those all go through what is
> > +        # actually a synthetic symlink in /proc and we aren't
> > +        # guaranteed that our stat(2) won't lose the race with an
> > +        # rm(1) between reading the synthetic link and traversing the
> > +        # file system to find the inum.  Perl is very fast so use that.
> > +        rightfile=$( perl -e '
>
> Won't this need to become $(PERL) (or @PERL@ and some seddery at install
> time) for the benefit of BSD?

This is in tools/hotplug/Linux. AFAICT at least some of the BSDs have
an fstat(1).

> Does stating $_lockfile before and after the flock, in addition to
> comparing both with $_lockfd, close the race you are worried about?

The correct answer to this question is to present a proof that what
I'm doing is correct. So:
* The lock is owned not by a process but by an open-file (informally
an fd). Any process with an fd onto this open-file is a
lockholder and may perform the various operations; such a process
should only do so when its co-lockholder processes expect. Ie, we
will treat all processes holding fds onto the open-file as acting
in concert and not distinguish between them.
* You are a lockholder if
- You have an fd onto an open-file which
- currently holds an exclusive flock lock on its inum
- and that inum is currently linked at the lockfile path
* The rules are:
- No-one but a lockholder may unlink the lockfile path
(or otherwise cause it to stop referring to a file it
refers to).
- Anyone may open the lockfile with O_CREAT
* The protocol for locking is:
- Open the file (O_CREAT)
- flock it
- fstat the fd you got
- stat the lockfile
- if both are equal you have the lock, otherwise try again.
* Informal proof of exclusivity:
- No two open-files can hold an fcntl lock onto the same file
at the same time
- No two files can have the same name at the same time
* Informal proof of correctness of locking protocol:
- After you call flock successfully no-one other than you
(someone with the same open-file) can stop you having
that flock lock.
- Obviously the inum you get from the fstat is fixed
- At the point where you call stat there are two
possibilities:
(i) the lockfile path referred to some other inum
in which case you have failed
(ii) the lockfile path referred to the same file
in which case you had (by definition) at that
point the file locked.
* Informal proof that no-one else can steal the lock:
- After you call flock successfully no-one other than you
can stop you having that flock lock
- No-one other than the lockholder is permitted to stop
the path referring to a particular inum. So if you
hold the lock then only you are allowed to stop the
path referring to the file whose flock you hold
Thus once you hold the lock at any instant, you will
continue to do so until you voluntarily stop doing so
(eg by unlinking the lockfile or closing the fd).
This demonstrates, I think, that the design is correct. I will do
some testing on the implementation to try to check that it implements
the design.
> I suppose there is no point in unlocking too?
See my reply to Daniel.
Ian.
On Thu, 2012-07-05 at 12:21 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> > Does stating $_lockfile before and after the flock, in addition to
> > comparing both with $_lockfd, close the race you are worried about?
>
> The correct answer to this question is to present a proof that what
> I'm doing is correct.

I was actually just speculating on ways to avoid $(PERL) rather than
questioning the algorithm but as Roger and you have pointed out there is
no such concern in this particular file.

The analysis is still useful though.

[...]

> This demonstrates, I think, that the design is correct. I will do
> some testing on the implementation to try to check that it implements
> the design.

I think so too.

Ian.
On Thu, 2012-07-05 at 12:21 +0100, Ian Jackson wrote:
> This demonstrates, I think, that the design is correct. I will do
> some testing on the implementation to try to check that it implements
> the design.

Can we get this checked in today (ideally early this afternoon) do you
think? Otherwise I think we should revert the original locking change,
to unblock testing, and re-do it along with this fix when it is ready.

Ian.
Ian Jackson
2012-Jul-05 11:43 UTC
Re: [xen-unstable test] 13461: regressions - FAIL [and 1 more messages]
Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> On Thu, 2012-07-05 at 12:21 +0100, Ian Jackson wrote:
> > Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> > > Does stating $_lockfile before and after the flock, in addition to
> > > comparing both with $_lockfd, close the race you are worried about?
> >
> > The correct answer to this question is to present a proof that what
> > I'm doing is correct.
>
> I was actually just speculating on ways to avoid $(PERL) rather than
> questioning the algorithm but as Roger and you have pointed out there is
> no such concern in this particular file.

Ah, yes. However, if you look at my proof you'll see that it's really
best to stat the fd. It might be possible to construct a more
complicated proof that statting the file at some points would be good
enough but I don't care to do so :-).

> The analysis is still useful though.

I will include it in the file.

> > This demonstrates, I think, that the design is correct. I will do
> > some testing on the implementation to try to check that it implements
> > the design.
>
> I think so too.

It works in my tests; I also tested the "lost race" condition and it
did the right thing.

Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> Can we get this checked in today (ideally early this afternoon) do you
> think? Otherwise I think we should revert the original locking change,
> to unblock testing, and re-do it along with this fix when it is ready.

I have pushed the fix. The code is nearly identical. I fixed the new
"let" which ought to have been a plain assignment.

I also attach for your interest the test script I used. For the "lost
race" test I also added a "read" to locking.sh before "flock".

Ian.
# HG changeset patch
# User Ian Jackson <Ian.Jackson@eu.citrix.com>
# Date 1341488425 -3600
# Node ID 497e2fe4945594cd2846f0302bef5e5dafa1c2c6
# Parent ad08cd8e7097ec6da6526cf8ac26f0fa72cd1e01
hotplug/Linux: do not leak lockfiles
25590:bb250383a4f5 introduced a new locking scheme. Unfortunately it
leaves behind files in /var/run/xen-hotplug. These are spotted as
regressions by the autotester.
Fix this. This involves changing the locking protocol to allow
lockfiles to be deleted (as removing lockfiles is unsafe with a naive
flock-based algorithm).
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
diff -r ad08cd8e7097 -r 497e2fe49455 tools/hotplug/Linux/locking.sh
--- a/tools/hotplug/Linux/locking.sh Thu Jul 05 11:00:28 2012 +0100
+++ b/tools/hotplug/Linux/locking.sh Thu Jul 05 12:40:25 2012 +0100
@@ -30,6 +30,7 @@ _setlockfd()
     done
     _lockdict[$i]="$1"
     let _lockfd=200+i
+    _lockfile="$LOCK_BASEDIR/$1"
 }


@@ -37,13 +38,92 @@ claim_lock()
 {
     mkdir -p "$LOCK_BASEDIR"
     _setlockfd $1
-    eval "exec $_lockfd>>$LOCK_BASEDIR/$1"
-    flock -x $_lockfd
+    # The locking strategy is identical to that from with-lock-ex(1)
+    # from chiark-utils, except using flock.  It has the benefit of
+    # it being possible to safely remove the lockfile when done.
+    # See below for a correctness proof.
+    local rightfile
+    while true; do
+        eval "exec $_lockfd>>$_lockfile"
+        flock -x $_lockfd || return $?
+        # We can't just stat /dev/stdin or /proc/self/fd/$_lockfd or
+        # use bash's test -ef because those all go through what is
+        # actually a synthetic symlink in /proc and we aren't
+        # guaranteed that our stat(2) won't lose the race with an
+        # rm(1) between reading the synthetic link and traversing the
+        # file system to find the inum.  Perl is very fast so use that.
+        rightfile=$( perl -e '
+            open STDIN, "<&'$_lockfd'" or die $!;
+            my $fd_inum = (stat STDIN)[1]; die $! unless defined $fd_inum;
+            my $file_inum = (stat $ARGV[0])[1];
+            print "y\n" if $fd_inum eq $file_inum;
+        ' "$_lockfile" )
+        if [ x$rightfile = xy ]; then break; fi
+    done
 }


 release_lock()
 {
     _setlockfd $1
-    flock -u $_lockfd
+    rm "$_lockfile"
 }
+
+# Protocol and correctness proof:
+#
+# * The lock is owned not by a process but by an open-file (informally
+# an fd). Any process with an fd onto this open-file is a
+# lockholder and may perform the various operations; such a process
+# should only do so when its co-lockholder processes expect. Ie, we
+# will treat all processes holding fds onto the open-file as acting
+# in concert and not distinguish between them.
+#
+# * You are a lockholder if
+# - You have an fd onto an open-file which
+# currently holds an exclusive flock lock on its inum
+# - and that inum is currently linked at the lockfile path
+#
+# * The rules are:
+# - No-one but a lockholder may unlink the lockfile path
+# (or otherwise cause it to stop referring to a file it
+# refers to).
+# - Anyone may open the lockfile with O_CREAT
+#
+# * The protocol for locking is:
+# - Open the file (O_CREAT)
+# - flock it
+# - fstat the fd you have open
+# - stat the lockfile path
+# - if both are equal you have the lock, otherwise try again.
+#
+# * Informal proof of exclusivity:
+# - No two open-files can hold an fcntl lock onto the same file
+# at the same time
+# - No two files can have the same name at the same time
+#
+# * Informal proof of correctness of locking protocol:
+# - After you call flock successfully no-one other than you
+# (someone with the same open-file) can stop you having
+# that flock lock.
+# - Obviously the inum you get from the fstat is fixed
+# - At the point where you call stat there are two
+# possibilities:
+# (i) the lockfile path referred to some other inum
+# in which case you have failed
+# (ii) the lockfile path referred to the same file
+# in which case at that point you were the
+# lockholder (by definition).
+#
+# * Informal proof that no-one else can steal the lock:
+# - After you call flock successfully no-one other than you
+# can stop you having that flock lock
+# - No-one other than the lockholder is permitted to stop
+# the path referring to a particular inum. So if you
+# hold the lock then only you are allowed to stop the
+# path referring to the file whose flock you hold; so
+# it will continue to refer to that file.
+# That's both the conditions for being the lockholder.
+#
+# Thus once you hold the lock at any instant, you will
+# continue to do so until you voluntarily stop doing so
+# (eg by unlinking the lockfile or closing the fd).
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
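Ian's attached test script is not reproduced in this archive. Purely as
an illustration of how such a lock can be smoke-tested (hypothetical
paths; assumes locking.sh honours LOCK_BASEDIR), N concurrent claimants
bumping a counter should never lose an increment:

    #!/bin/bash
    LOCK_BASEDIR=/tmp/locktest            # assumption, not the shipped default
    . /etc/xen/scripts/locking.sh         # install path is an assumption
    rm -f /tmp/counter
    for i in $(seq 1 20); do
        (
            claim_lock testlock
            n=$(cat /tmp/counter 2>/dev/null || echo 0)
            echo $((n + 1)) >/tmp/counter
            release_lock testlock
        ) &
    done
    wait
    cat /tmp/counter                      # expect 20 with correct exclusion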
Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 13461: regressions - FAIL"):
> > > +        rightfile=$( perl -e '
> >
> > Won't this need to become $(PERL) (or @PERL@ and some seddery at install
> > time) for the benefit of BSD?
>
> This is in tools/hotplug/Linux. AFAICT at least some of the BSDs have
> an fstat(1).

BSD don't currently have any of this locking mechanism, which makes it
vulnerable to races when launching hotplug scripts. I think libxl
would be a better place to perform all this kind of locking, instead of
the hotplug scripts, which are already polluted enough, but as far as I
know we don't have any kind of inter-process locking mechanism right now?
On Thu, 2012-07-05 at 12:53 +0100, Roger Pau Monne wrote:
> BSD don't currently have any of this locking mechanism, which makes it
> vulnerable to races when launching hotplug scripts. I think libxl
> would be a better place to perform all this kind of locking, instead of
> the hotplug scripts, which are already polluted enough, but as far as I
> know we don't have any kind of inter-process locking mechanism right now?

The advantage of doing it this way (in the scripts) is that it is much
more fine grained. e.g. the script locks the iptables lock only for as
long as it is manipulating the firewall rules and so on.

If libxl were doing this it would have to lock for the entire duration
of the script which may end up needlessly serialising things.

Ian.