Hi maintainers,

I found an issue when doing 'xm save' on a PVM guest. See below:

When I do a save then restore once, CPU(%) in xentop shows around 99%.
When I do that a second time, CPU(%) shows 199%.

top in dom0 showed:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
 4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block

If I kill the block processes, everything looks normal again.

xen and xen-tools are both built from xen-unstable.
I tried xl, but it segfaults.
I also tried OVM 3.1.1 (xen-4.1.2-18.el5.1 and xen-tools-4.1.2-18.el5.1)
and can't reproduce the problem there.

Has anybody seen a similar issue before? Is a fix already available?

thanks
zduan
On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
> Hi maintainers,
>
> I found an issue when doing 'xm save' on a PVM guest. See below:
>
> When I do a save then restore once, CPU(%) in xentop shows around 99%.
> When I do that a second time, CPU(%) shows 199%.
>
> top in dom0 showed:
>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
> 20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
>  4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block
>
> If I kill the block processes, everything looks normal again.

What is the 'block' process? If you attach 'perf' to it, do you get an
idea of what it is spinning on?

> xen and xen-tools are both built from xen-unstable.
> I tried xl, but it segfaults.

It segfaulted? When doing 'xl save' or 'xl resume'? Or just allocating
the guest?

> I also tried OVM 3.1.1 (xen-4.1.2-18.el5.1 and xen-tools-4.1.2-18.el5.1)
> and can't reproduce the problem there.

So the issue is only present with xen-unstable? Did you clear _any_
older Xen libraries/tools when you installed xen-unstable?

> Has anybody seen a similar issue before? Is a fix already available?
>
> thanks
> zduan
Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>> Hi maintainers,
>>
>> I found an issue when doing 'xm save' on a PVM guest. See below:
>>
>> When I do a save then restore once, CPU(%) in xentop shows around 99%.
>> When I do that a second time, CPU(%) shows 199%.
>>
>> top in dom0 showed:
>>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>> 20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
>>  4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block
>>
>> If I kill the block processes, everything looks normal again.
>
> What is the 'block' process? If you attach 'perf' to it, do you get an
> idea of what it is spinning on?

It's /etc/xen/scripts/block.
I added 'set -x' to /etc/xen/scripts/block and found it blocked in
claim_lock. When the domU was created the first time,
claim_lock/release_lock finished quickly; when 'xm save' was called,
claim_lock spun in its own while loop.
I can confirm that no other domU create/save/etc. happened while I was
testing.

>> xen and xen-tools are both built from xen-unstable.
>> I tried xl, but it segfaults.
>
> It segfaulted? When doing 'xl save' or 'xl resume'? Or just allocating
> the guest?

When doing 'xl create vm.cfg'.

>> I also tried OVM 3.1.1 (xen-4.1.2-18.el5.1 and xen-tools-4.1.2-18.el5.1)
>> and can't reproduce the problem there.
>
> So the issue is only present with xen-unstable?

Yes. I found that the claim_lock function in /etc/xen/scripts/locking.sh
of OVM 3.1.1 is quite different from the one in xen-unstable.
Maybe that is why OVM 3.1.1 works with save/restore.

> Did you clear _any_ older Xen libraries/tools when you installed
> xen-unstable?

No, I built xen and xen-tools on el5, then installed them to OVM 3.1.1
on another partition.

thanks
zduan
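For reference, the flock-based claim_lock under discussion looks roughly
like the sketch below. It is reconstructed from the set -x fragments quoted
later in this thread rather than copied from the upstream locking.sh, so the
names and details are approximate; the "while loop" mentioned above is the
retry loop around the inode re-check.

  claim_lock()
  {
      # The real script derives these via _setlockfd "$1"; the traces later
      # in the thread show fd 200 and /var/run/xen-hotplug/block for "block".
      _lockfd=200
      _lockfile=/var/run/xen-hotplug/$1
      mkdir -p "$(dirname "$_lockfile")"

      while true; do
          eval "exec $_lockfd>>$_lockfile"   # open/create the lockfile on fd 200
          flock -x $_lockfd                  # take an exclusive flock on that fd
          # Re-check: does the locked fd still refer to the file now on disk?
          rightfile=$( perl -e '
              open STDIN, "<&200" or die $!;
              my $fd_inum   = (stat STDIN)[1]; die $! unless defined $fd_inum;
              my $file_inum = (stat $ARGV[0])[1];
              print "y\n" if $fd_inum eq $file_inum;
          ' "$_lockfile" )
          [ x"$rightfile" = xy ] && break    # yes: lock acquired
          # no: the file was removed/recreated under us -- try again
      done
  }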
On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
> Konrad Rzeszutek Wilk wrote:
>> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>>> Hi maintainers,
>>>
>>> I found an issue when doing 'xm save' on a PVM guest. See below:
>>>
>>> When I do a save then restore once, CPU(%) in xentop shows around 99%.
>>> When I do that a second time, CPU(%) shows 199%.
>>>
>>> top in dom0 showed:
>>>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>> 20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
>>>  4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block
>>>
>>> If I kill the block processes, everything looks normal again.
>>
>> What is the 'block' process? If you attach 'perf' to it, do you get an
>> idea of what it is spinning on?
>
> It's /etc/xen/scripts/block.
> I added 'set -x' to /etc/xen/scripts/block and found it blocked in
> claim_lock. When the domU was created the first time,
> claim_lock/release_lock finished quickly; when 'xm save' was called,
> claim_lock spun in its own while loop.
> I can confirm that no other domU create/save/etc. happened while I was
> testing.

OK, so why do you have two block processes? Is it because you have two
disks attached to the guest? There are multiple claim_lock calls in the
shell script - do you know where each of the two processes is spinning?
Are they spinning on the same function?

>>> xen and xen-tools are both built from xen-unstable.
>>> I tried xl, but it segfaults.
>>
>> It segfaulted? When doing 'xl save' or 'xl resume'? Or just allocating
>> the guest?
>
> When doing 'xl create vm.cfg'.
>
>>> I also tried OVM 3.1.1 (xen-4.1.2-18.el5.1 and xen-tools-4.1.2-18.el5.1)
>>> and can't reproduce the problem there.
>>
>> So the issue is only present with xen-unstable?
>
> Yes. I found that the claim_lock function in /etc/xen/scripts/locking.sh
> of OVM 3.1.1 is quite different from the one in xen-unstable.

Could that be upstreamed? Perhaps that was a fix that was added for this
exact reason and it just never got upstreamed?

> Maybe that is why OVM 3.1.1 works with save/restore.
>
>> Did you clear _any_ older Xen libraries/tools when you installed
>> xen-unstable?
>
> No, I built xen and xen-tools on el5, then installed them to OVM 3.1.1
> on another partition.

Excellent.

> thanks
> zduan
On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
>> Konrad Rzeszutek Wilk wrote:
>>> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>>>> Hi maintainers,
>>>>
>>>> I found an issue when doing 'xm save' on a PVM guest. See below:
>>>>
>>>> When I do a save then restore once, CPU(%) in xentop shows around 99%.
>>>> When I do that a second time, CPU(%) shows 199%.
>>>>
>>>> top in dom0 showed:
>>>>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>>> 20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
>>>>  4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block
>>>>
>>>> If I kill the block processes, everything looks normal again.
>>> What is the 'block' process? If you attach 'perf' to it, do you get an
>>> idea of what it is spinning on?
>> It's /etc/xen/scripts/block.
>> I added 'set -x' to /etc/xen/scripts/block and found it blocked in
>> claim_lock. When the domU was created the first time,
>> claim_lock/release_lock finished quickly; when 'xm save' was called,
>> claim_lock spun in its own while loop.
>> I can confirm that no other domU create/save/etc. happened while I was
>> testing.
> OK, so why do you have two block processes? Is it because you have two
> disks attached to the guest? There are multiple claim_lock calls in the
> shell script - do you know where each of the two processes is spinning?
> Are they spinning on the same function?

In the above test I ran save/restore twice, so there were two block
processes. In another test I ran save/restore once and there was only
one block process.
After doing 'xm save', I see the block process spinning at line 328:

321 remove)
322   case $t in
323     phy)
324       exit 0
325       ;;
326
327     file)
328       claim_lock "block"
329       node=$(xenstore_read "$XENBUS_PATH/node")
330       losetup -d "$node"
331       release_lock "block"
332       exit 0
333       ;;

thanks
zduan
On Thu, Sep 27, 2012 at 01:58:19PM +0800, Zhenzhong Duan wrote:
> On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
>> On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
>>> Konrad Rzeszutek Wilk wrote:
>>>> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>>>>> Hi maintainers,
>>>>>
>>>>> I found an issue when doing 'xm save' on a PVM guest. See below:
>>>>>
>>>>> When I do a save then restore once, CPU(%) in xentop shows around 99%.
>>>>> When I do that a second time, CPU(%) shows 199%.
>>>>>
>>>>> top in dom0 showed:
>>>>>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>>>> 20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
>>>>>  4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block
>>>>>
>>>>> If I kill the block processes, everything looks normal again.
>>>> What is the 'block' process? If you attach 'perf' to it, do you get an
>>>> idea of what it is spinning on?
>>> It's /etc/xen/scripts/block.
>>> I added 'set -x' to /etc/xen/scripts/block and found it blocked in
>>> claim_lock. When the domU was created the first time,
>>> claim_lock/release_lock finished quickly; when 'xm save' was called,
>>> claim_lock spun in its own while loop.
>>> I can confirm that no other domU create/save/etc. happened while I was
>>> testing.
>> OK, so why do you have two block processes? Is it because you have two
>> disks attached to the guest? There are multiple claim_lock calls in the
>> shell script - do you know where each of the two processes is spinning?
>> Are they spinning on the same function?
> In the above test I ran save/restore twice, so there were two block
> processes. In another test I ran save/restore once and there was only
> one block process.
> After doing 'xm save', I see the block process spinning at line 328:
>
> 321 remove)
> 322   case $t in
> 323     phy)
> 324       exit 0
> 325       ;;
> 326
> 327     file)
> 328       claim_lock "block"
> 329       node=$(xenstore_read "$XENBUS_PATH/node")
> 330       losetup -d "$node"
> 331       release_lock "block"
> 332       exit 0
> 333       ;;

So with the patches in OVM - do they have this fixed? Can they be
upstreamed, or are they dependent on some magic OVM sauce?
On 2012-09-27 19:59, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 27, 2012 at 01:58:19PM +0800, Zhenzhong Duan wrote:
>> On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
>>>> Konrad Rzeszutek Wilk wrote:
>>>>> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>>>>>> Hi maintainers,
>>>>>>
>>>>>> I found an issue when doing 'xm save' on a PVM guest. See below:
>>>>>>
>>>>>> When I do a save then restore once, CPU(%) in xentop shows around 99%.
>>>>>> When I do that a second time, CPU(%) shows 199%.
>>>>>>
>>>>>> top in dom0 showed:
>>>>>>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>>>>> 20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
>>>>>>  4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block
>>>>>>
>>>>>> If I kill the block processes, everything looks normal again.
>>>>> What is the 'block' process? If you attach 'perf' to it, do you get an
>>>>> idea of what it is spinning on?
>>>> It's /etc/xen/scripts/block.
>>>> I added 'set -x' to /etc/xen/scripts/block and found it blocked in
>>>> claim_lock. When the domU was created the first time,
>>>> claim_lock/release_lock finished quickly; when 'xm save' was called,
>>>> claim_lock spun in its own while loop.
>>>> I can confirm that no other domU create/save/etc. happened while I was
>>>> testing.
>>> OK, so why do you have two block processes? Is it because you have two
>>> disks attached to the guest? There are multiple claim_lock calls in the
>>> shell script - do you know where each of the two processes is spinning?
>>> Are they spinning on the same function?
>> In the above test I ran save/restore twice, so there were two block
>> processes. In another test I ran save/restore once and there was only
>> one block process.
>> After doing 'xm save', I see the block process spinning at line 328:
>>
>> 321 remove)
>> 322   case $t in
>> 323     phy)
>> 324       exit 0
>> 325       ;;
>> 326
>> 327     file)
>> 328       claim_lock "block"
>> 329       node=$(xenstore_read "$XENBUS_PATH/node")
>> 330       losetup -d "$node"
>> 331       release_lock "block"
>> 332       exit 0
>> 333       ;;
> So with the patches in OVM - do they have this fixed? Can they be
> upstreamed, or are they dependent on some magic OVM sauce?

After replacing locking.sh with OVM's, it worked.
But xen-tools has since moved to flock-based locking, so we can't
simply revert.
It seems changeset 25595:497e2fe49455 introduced the issue.
Finally, I came up with a small patch that works around the issue.

diff -r d364becfb083 tools/hotplug/Linux/locking.sh
--- a/tools/hotplug/Linux/locking.sh    Thu Sep 20 13:31:19 2012 +0200
+++ b/tools/hotplug/Linux/locking.sh    Fri Sep 28 18:27:31 2012 +0800
@@ -66,6 +66,7 @@
 release_lock()
 {
     _setlockfd $1
+    flock -u $_lockfd
     rm "$_lockfile"
 }
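As background for the locking discussion, the sketch below (a standalone
demo written for this write-up, not something posted in the thread)
illustrates the flock(2) behaviour involved: the exclusive lock lives on the
open file description, so unlinking the lockfile does not release it, and a
path recreated afterwards is a different inode. That is the situation
claim_lock's inode re-check exists to detect. The lockfile path here is a
throwaway placeholder.

  #!/bin/bash
  # Throwaway lockfile; nothing here touches the Xen scripts.
  lockfile=/tmp/flock-demo.$$

  exec 200>>"$lockfile"        # first claimer: open the lockfile on fd 200
  flock -x 200                 # ...and take an exclusive lock on it
  rm "$lockfile"               # unlinking the path does NOT drop the lock

  exec 201>>"$lockfile"        # second claimer: this recreates the path
  flock -xn 201 \
    && echo "second lock granted: the path now names a different inode"

  stat -L -c 'fd 200 holds inode %i (unlinked file)' /proc/self/fd/200
  stat -L -c 'fd 201 holds inode %i (current file)'  /proc/self/fd/201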
Konrad Rzeszutek Wilk
2012-Sep-28 14:01 UTC
Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'
On Fri, Sep 28, 2012 at 06:34:51PM +0800, Zhenzhong Duan wrote:
> On 2012-09-27 19:59, Konrad Rzeszutek Wilk wrote:
>> On Thu, Sep 27, 2012 at 01:58:19PM +0800, Zhenzhong Duan wrote:
>>> On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
>>>>> Konrad Rzeszutek Wilk wrote:
>>>>>> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>>>>>>> Hi maintainers,
>>>>>>>
>>>>>>> I found an issue when doing 'xm save' on a PVM guest. See below:
>>>>>>>
>>>>>>> When I do a save then restore once, CPU(%) in xentop shows around 99%.
>>>>>>> When I do that a second time, CPU(%) shows 199%.
>>>>>>>
>>>>>>> top in dom0 showed:
>>>>>>>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>>>>>> 20946 root 18 -2 10984 1284 964 S 19.8  0.3 0:48.93 block
>>>>>>>  4939 root 18 -2 10984 1288 964 S 19.5  0.3 1:34.68 block
>>>>>>>
>>>>>>> If I kill the block processes, everything looks normal again.
>>>>>> What is the 'block' process? If you attach 'perf' to it, do you get an
>>>>>> idea of what it is spinning on?
>>>>> It's /etc/xen/scripts/block.
>>>>> I added 'set -x' to /etc/xen/scripts/block and found it blocked in
>>>>> claim_lock. When the domU was created the first time,
>>>>> claim_lock/release_lock finished quickly; when 'xm save' was called,
>>>>> claim_lock spun in its own while loop.
>>>>> I can confirm that no other domU create/save/etc. happened while I was
>>>>> testing.
>>>> OK, so why do you have two block processes? Is it because you have two
>>>> disks attached to the guest? There are multiple claim_lock calls in the
>>>> shell script - do you know where each of the two processes is spinning?
>>>> Are they spinning on the same function?
>>> In the above test I ran save/restore twice, so there were two block
>>> processes. In another test I ran save/restore once and there was only
>>> one block process.
>>> After doing 'xm save', I see the block process spinning at line 328:
>>>
>>> 321 remove)
>>> 322   case $t in
>>> 323     phy)
>>> 324       exit 0
>>> 325       ;;
>>> 326
>>> 327     file)
>>> 328       claim_lock "block"
>>> 329       node=$(xenstore_read "$XENBUS_PATH/node")
>>> 330       losetup -d "$node"
>>> 331       release_lock "block"
>>> 332       exit 0
>>> 333       ;;
>> So with the patches in OVM - do they have this fixed? Can they be
>> upstreamed, or are they dependent on some magic OVM sauce?
> After replacing locking.sh with OVM's, it worked.
> But xen-tools has since moved to flock-based locking, so we can't
> simply revert.
> It seems changeset 25595:497e2fe49455 introduced the issue.
> Finally, I came up with a small patch that works around the issue.

Ian, any thoughts on how this can be fixed properly? The locking is
quite heavy and causes the two scripts that deal with the block devices
to spin for a long time.

> diff -r d364becfb083 tools/hotplug/Linux/locking.sh
> --- a/tools/hotplug/Linux/locking.sh    Thu Sep 20 13:31:19 2012 +0200
> +++ b/tools/hotplug/Linux/locking.sh    Fri Sep 28 18:27:31 2012 +0800
> @@ -66,6 +66,7 @@
>  release_lock()
>  {
>      _setlockfd $1
> +    flock -u $_lockfd
>      rm "$_lockfile"
>  }
Ian Jackson
2012-Sep-28 14:14 UTC
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'
Konrad Rzeszutek Wilk writes ("Is: Contention in block script when doing guest saving. Was:Re: [Xen-devel] an issue with 'xm save'"):
> On Fri, Sep 28, 2012 at 06:34:51PM +0800, Zhenzhong Duan wrote:
> > After replacing locking.sh with OVM's, it worked.
> > But xen-tools has since moved to flock-based locking, so we can't
> > simply revert.
> > It seems changeset 25595:497e2fe49455 introduced the issue.
> > Finally, I came up with a small patch that works around the issue.
>
> Ian, any thoughts on how this can be fixed properly? The locking is
> quite heavy and causes the two scripts that deal with the block devices
> to spin for a long time.

The algorithm is not supposed to spin. If it spins, there is a bug, I
think.

Can you show me a transcript of the spinning happening with "set -x"
in force?

> > diff -r d364becfb083 tools/hotplug/Linux/locking.sh
> > --- a/tools/hotplug/Linux/locking.sh    Thu Sep 20 13:31:19 2012 +0200
> > +++ b/tools/hotplug/Linux/locking.sh    Fri Sep 28 18:27:31 2012 +0800
> > @@ -66,6 +66,7 @@
> >  release_lock()
> >  {
> >      _setlockfd $1
> > +    flock -u $_lockfd
> >      rm "$_lockfile"

I'm not sure I understand how this would help.

Ian.
Zhenzhong Duan
2012-Sep-28 14:46 UTC
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'
On 2012-09-28 22:14, Ian Jackson wrote:
> Konrad Rzeszutek Wilk writes ("Is: Contention in block script when doing guest saving. Was:Re: [Xen-devel] an issue with 'xm save'"):
>> On Fri, Sep 28, 2012 at 06:34:51PM +0800, Zhenzhong Duan wrote:
>>> After replacing locking.sh with OVM's, it worked.
>>> But xen-tools has since moved to flock-based locking, so we can't
>>> simply revert.
>>> It seems changeset 25595:497e2fe49455 introduced the issue.
>>> Finally, I came up with a small patch that works around the issue.
>> Ian, any thoughts on how this can be fixed properly? The locking is
>> quite heavy and causes the two scripts that deal with the block devices
>> to spin for a long time.
> The algorithm is not supposed to spin. If it spins, there is a bug, I
> think.
>
> Can you show me a transcript of the spinning happening with "set -x"
> in force?

Hi Ian,

The attachment is the part of xen-hotplug.log that shows the spin.
The original file is nearly 7 GB; I took the first 1000 lines.

These are the steps I followed:
1. delete xen-hotplug.log
2. reboot dom0
3. add "set -x" to the head of the block script
4. create the first PV guest
5. save the guest
6. take the first 1000 lines from xen-hotplug.log

thanks
zduan
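The same steps rendered as commands may be easier to follow (an approximate
sketch: the config path, domain name, save file and log location are
placeholders assuming a default Xen install, and the steps are meant to be
run one at a time since the second one reboots dom0):

  rm /var/log/xen/xen-hotplug.log                 # 1. delete the old log
  reboot                                          # 2. reboot dom0
  sed -i '1a set -x' /etc/xen/scripts/block       # 3. add "set -x" after the shebang
  xm create /path/to/vm.cfg                       # 4. create the first PV guest
  xm save pv-guest /tmp/pv-guest.save             # 5. save the guest
  head -n 1000 /var/log/xen/xen-hotplug.log > hotplug-head.log   # 6. excerpt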
Ian Jackson
2012-Oct-02 14:59 UTC
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'
Zhenzhong Duan writes ("Re: [Xen-devel] Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'"):
> The attachment is the part of xen-hotplug.log that shows the spin.
> The original file is nearly 7 GB; I took the first 1000 lines.

Thanks. This is quite mystifying to me.

Picking a representative example, and translating it to what I think
the syscalls would be:

> + eval 'exec 200>>/var/run/xen-hotplug/block'
> ++ exec

fd = open("/var/run/xen-hotplug/block", O_WRONLY|O_APPEND|O_CREAT, 0666);
dup2(fd, 200);
close(fd);

> + flock -x 200

flock(200, LOCK_EX);

> ++ perl -e '
> open STDIN, "<&200" or die $!;

dup2(200, 0);

> my $fd_inum = (stat STDIN)[1]; die $! unless defined $fd_inum;

fstat(0, &stdin_stat);

> my $file_inum = (stat $ARGV[0])[1];

stat("/var/run/xen-hotplug/block", &file_stat);

> print "y\n" if $fd_inum eq $file_inum;

if (stdin_stat.st_ino == file_stat.st_ino)
    puts("y");

> ' /var/run/xen-hotplug/block
> + rightfile

And here we see that perl didn't print "y", so the two files must be
different.

Let's try something else: can you strace it? That is, get it to the
point where it's spinning, find the pid of the relevant shell process,
and run

  strace -f -vvs500 -ooutput.strace PID

where PID is the pid in question. Let that run for a fraction of a
second and then ^C it.

The output may shed some light.

Thanks,
Ian.
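A possible companion check to the strace (a suggestion added here, not
something requested in the thread): while a block process is spinning, the
two inodes that the perl snippet compares can also be read directly from
/proc. The PID below is a placeholder for the spinning script.

  pid=12345    # PID of the spinning /etc/xen/scripts/block process (placeholder)
  stat -L -c 'locked fd 200 -> inode %i' /proc/$pid/fd/200
  stat -c    'lockfile      -> inode %i' /var/run/xen-hotplug/block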
DuanZhenzhong
2012-Oct-09 08:53 UTC
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'
Ian Jackson wrote:
> Zhenzhong Duan writes ("Re: [Xen-devel] Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'"):
>> The attachment is the part of xen-hotplug.log that shows the spin.
>> The original file is nearly 7 GB; I took the first 1000 lines.
>
> Thanks. This is quite mystifying to me.
>
> Picking a representative example, and translating it to what I think
> the syscalls would be:
>
>> + eval 'exec 200>>/var/run/xen-hotplug/block'
>> ++ exec
>
> fd = open("/var/run/xen-hotplug/block", O_WRONLY|O_APPEND|O_CREAT, 0666);
> dup2(fd, 200);
> close(fd);
>
>> + flock -x 200
>
> flock(200, LOCK_EX);
>
>> ++ perl -e '
>> open STDIN, "<&200" or die $!;
>
> dup2(200, 0);
>
>> my $fd_inum = (stat STDIN)[1]; die $! unless defined $fd_inum;
>
> fstat(0, &stdin_stat);
>
>> my $file_inum = (stat $ARGV[0])[1];
>
> stat("/var/run/xen-hotplug/block", &file_stat);
>
>> print "y\n" if $fd_inum eq $file_inum;
>
> if (stdin_stat.st_ino == file_stat.st_ino)
>     puts("y");
>
>> ' /var/run/xen-hotplug/block
>> + rightfile
>
> And here we see that perl didn't print "y", so the two files must be
> different.
>
> Let's try something else: can you strace it? That is, get it to the
> point where it's spinning, find the pid of the relevant shell process,
> and run
>
>   strace -f -vvs500 -ooutput.strace PID
>
> where PID is the pid in question. Let that run for a fraction of a
> second and then ^C it.
>
> The output may shed some light.
>
> Thanks,
> Ian.

Hi Ian,

Sorry for the late reply; I just got back from vacation today.
The strace output you requested is attached. If you need any more
information, let me know.

--
Regards
zhenzhong

Oracle Building, No.24 Building, Zhongguancun Software Park
Haidian District, Beijing 100193, China