Hi,

Some additional elements. Irrespective of the SCSI error reported earlier, I have established that Solaris dom0 hangs anyway when a domU is booted from a disk image located on an emulated ZFS volume. Has this been also observed by other members of the community? Is there a known explanation to this problem? What would be the troubleshooting steps?

Thanks
Patrick
Richard Lowe
2006-Aug-03 18:42 UTC
Re: [xen-discuss] dom0 hangs when using an emulated ZFS volume
Patrick Petit wrote:
> Hi,
>
> Some additional elements. Irrespective of the SCSI error reported
> earlier, I have established that Solaris dom0 hangs anyway when a domU
> is booted from a disk image located on an emulated ZFS volume. Has this
> been also observed by other members of the community? Is there a known
> explanation to this problem? What would be the troubleshooting steps?

The hang I see isn't when booting on a zvol. I see hangs, intermittently when using a zvol for anything xen related.

The first I saw was while making the proto on a zvol, the second was while creating a domU on a zvol (not booting it, just the vbdcfg).

I've been utterly unable to get any useful information out of the machine at that point. I don't drop to the debugger, I *can't* drop to the debugger, and the machine doesn't respond in any way (even to 3 C-a's, though that may be a problem on my end).

As I've said elsewhere, I'm still trying to reproduce this in such a way I can get some kind of information about it (and failing).

I'm not sure what you could do to troubleshoot it. Do you/can you get into kmdb when this happens? Does sending 3 Control-a's to the console do anything?

-- Rich
Richard Lowe wrote:
> Patrick Petit wrote:
>> Hi,
>>
>> Some additional elements. Irrespective of the SCSI error reported
>> earlier, I have established that Solaris dom0 hangs anyway when a domU
>> is booted from a disk image located on an emulated ZFS volume. Has
>> this been also observed by other members of the community? Is there a
>> known explanation to this problem? What would be the troubleshooting
>> steps?
>
> The hang I see isn't when booting on a zvol. I see hangs,
> intermittently when using a zvol for anything xen related.
>
> The first I saw was while making the proto on a zvol, the second was
> while creating a domU on a zvol (not booting it, just the vbdcfg).
>
> I've been utterly unable to get any useful information out of the
> machine at that point. I don't drop to the debugger, I *can't* drop to
> the debugger, and the machine doesn't respond in any way (even to 3
> C-a's, though that may be a problem on my end).
>
> As I've said elsewhere, I'm still trying to reproduce this in such a way
> I can get some kind of information about it (and failing).
>
> I'm not sure what you could do to troubleshoot it.
> Do you/can you get into kmdb when this happens? Does sending 3
> Control-a's to the console do anything?
>

I just filed:

CR 6456891 "Xen dom0 wedges when making use of domU's backed by zvols"

Regarding this, if I can get more information about it, I'll ask someone to update it for me.

Sorry I didn't get to that sooner, it slipped my mind amongst other things.

-- Rich.
Darren J Moffat
2006-Aug-03 23:55 UTC
Re: [xen-discuss] dom0 hangs when using an emulated ZFS volume
Richard Lowe wrote:
> Patrick Petit wrote:
>> Hi,
>>
>> Some additional elements. Irrespective of the SCSI error reported
>> earlier, I have established that Solaris dom0 hangs anyway when a domU
>> is booted from a disk image located on an emulated ZFS volume. Has
>> this been also observed by other members of the community? Is there a
>> known explanation to this problem? What would be the troubleshooting
>> steps?
>
> The hang I see isn't when booting on a zvol. I see hangs,
> intermittently when using a zvol for anything xen related.
>
> The first I saw was while making the proto on a zvol, the second was
> while creating a domU on a zvol (not booting it, just the vbdcfg).
>
> I've been utterly unable to get any useful information out of the
> machine at that point. I don't drop to the debugger, I *can't* drop to
> the debugger, and the machine doesn't respond in any way (even to 3
> C-a's, though that may be a problem on my end).
>
> As I've said elsewhere, I'm still trying to reproduce this in such a way
> I can get some kind of information about it (and failing).
>
> I'm not sure what you could do to troubleshoot it.
> Do you/can you get into kmdb when this happens? Does sending 3
> Control-a's to the console do anything?

Very interesting given what I posted here yesterday. I was just using a ZFS file system to store the file backed VBD not a zvol. The symptoms seem very similar. Given that the Xen dom0 bits are snv_41 based I wonder if the ZFS issues are fixed in a later build.

--
Darren J Moffat
Patrick Petit
2006-Aug-04 10:23 UTC
Re: [xen-discuss] dom0 hangs when using an emulated ZFS volume
Richard Lowe wrote:
> Patrick Petit wrote:
>
>> Hi,
>>
>> Some additional elements. Irrespective of the SCSI error reported
>> earlier, I have established that Solaris dom0 hangs anyway when a
>> domU is booted from a disk image located on an emulated ZFS volume.
>> Has this been also observed by other members of the community? Is
>> there a known explanation to this problem? What would be the
>> troubleshooting steps?
>
> The hang I see isn't when booting on a zvol. I see hangs,
> intermittently when using a zvol for anything xen related.

This is quite possible. The only interaction I have with zvol is when I boot domU since I import disk images outside of the vbdcfg process.

>
> The first I saw was while making the proto on a zvol, the second was
> while creating a domU on a zvol (not booting it, just the vbdcfg).
>
> I've been utterly unable to get any useful information out of the
> machine at that point. I don't drop to the debugger, I *can't* drop
> to the debugger, and the machine doesn't respond in any way (even to 3
> C-a's, though that may be a problem on my end).
>
> As I've said elsewhere, I'm still trying to reproduce this in such a
> way I can get some kind of information about it (and failing).
>
> I'm not sure what you could do to troubleshoot it.
> Do you/can you get into kmdb when this happens? Does sending 3
> Control-a's to the console do anything?

Can switch from Xen to Dom0 back and forth but no prompt. kmdb hangs too :-)

Patrick

>
> -- Rich
Darren J Moffat wrote:
> Richard Lowe wrote:
>
>> Patrick Petit wrote:
>>
>>> Hi,
>>>
>>> Some additional elements. Irrespective of the SCSI error reported
>>> earlier, I have established that Solaris dom0 hangs anyway when a
>>> domU is booted from a disk image located on an emulated ZFS volume.
>>> Has this been also observed by other members of the community? Is
>>> there a known explanation to this problem? What would be the
>>> troubleshooting steps?
>>
>> The hang I see isn't when booting on a zvol. I see hangs,
>> intermittently when using a zvol for anything xen related.
>>
>> The first I saw was while making the proto on a zvol, the second was
>> while creating a domU on a zvol (not booting it, just the vbdcfg).
>>
>> I've been utterly unable to get any useful information out of the
>> machine at that point. I don't drop to the debugger, I *can't* drop
>> to the debugger, and the machine doesn't respond in any way (even to
>> 3 C-a's, though that may be a problem on my end).
>>
>> As I've said elsewhere, I'm still trying to reproduce this in such a
>> way I can get some kind of information about it (and failing).
>>
>> I'm not sure what you could do to troubleshoot it.
>> Do you/can you get into kmdb when this happens? Does sending 3
>> Control-a's to the console do anything?
>
> Very interesting given what I posted here yesterday. I was just
> using a ZFS file system to store the file backed VBD not a zvol.

File backed disk image on ZFS (like in disk = [ 'file:/etc/xen/images/roller.img' ]) works fine for me. So, it's not everything zfs/xen related.

# zfs list
NAME          USED   AVAIL  REFER  MOUNTPOINT
tank          27.6G  5.85G  24.5K  /tank
tank/fs1      12.4G  5.85G  12.4G  /export/xc
tank/fs2      10.2G  5.85G  10.2G  /etc/xen/images
tank/roller   5.04G  5.85G  5.04G  -

> The symptoms seem very similar. Given that the Xen dom0 bits are
> snv_41 based I wonder if the ZFS issues are fixed in a later build.
>

--
Patrick Petit
Sun Microsystems Inc.
Labs, CTO - G2 Systems Exp.
ICNC Grenoble (http://icncweb.france)
Phone: (+33)476 188 232 x38232
Fax: (+33)476 188 282
180, Avenue de l'Europe
38334 Saint-Ismier Cedex, France
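
To make the contrast in the message above concrete, here is a minimal sketch of the two kinds of `disk` entry being compared: a file-backed image (reported to work) and a zvol used as a block device (reported to hang dom0). The thread only shows the bare `file:/etc/xen/images/roller.img` form, so the frontend device (`0`), the mode (`w`), the helper name `xen_disk_entry`, and the guess that `tank/roller` is the zvol in question (exposed under the usual Solaris `/dev/zvol/dsk/` path) are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: print a Xen "disk" config entry for a given backend.
# xen_disk_entry is a hypothetical helper, not part of any Xen tool.

xen_disk_entry() {
    backend=$1    # "file" for an image file, "phy" for a block device (e.g. a zvol)
    path=$2       # path to the image file or block device
    frontend=$3   # device name or number seen by the domU (assumed value)
    mode=${4:-w}  # access mode; defaults to read-write

    case "$backend" in
        file|phy) ;;
        *) echo "unsupported backend: $backend" >&2; return 1 ;;
    esac

    printf "disk = [ '%s:%s,%s,%s' ]\n" "$backend" "$path" "$frontend" "$mode"
}

# File-backed image stored on a ZFS file system.
xen_disk_entry file /etc/xen/images/roller.img 0

# zvol-backed device (path assumed from the zfs list output above).
xen_disk_entry phy /dev/zvol/dsk/tank/roller 0
```

Swapping only the backend while keeping the same guest image is one way to check whether a hang follows the zvol or the guest.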
On 4 Aug 2006, at 11:23am, Patrick Petit wrote:
>> Do you/can you get into kmdb when this happens? Does sending 3
>> Control-a's to the console do anything?
>
> Can switch from Xen to Dom0 back and forth but no prompt. kmdb
> hangs too :-)

When you're talking to Xen (using three control-A's) you should hit 'q', which causes the dom0 to drop into kmdb (three control-A's to get back to the dom0 and hence kmdb). Does this not work?

dme.
--
David Edmondson, Solaris Engineering, http://www.dme.org
David Edmondson wrote:
>
> On 4 Aug 2006, at 11:23am, Patrick Petit wrote:
>
>>> Do you/can you get into kmdb when this happens? Does sending 3
>>> Control-a's to the console do anything?
>>
>> Can switch from Xen to Dom0 back and forth but no prompt. kmdb hangs
>> too :-)
>
> When you're talking to Xen (using three control-A's) you should hit
> 'q', which causes the dom0 to drop into kmdb (three control-A's to
> get back to the dom0 and hence kmdb). Does this not work?

I'll try this. Note that I am seeing similar problems on vanilla Nevada snv-44 that is not related to Xen at all. For instance, the launching of build-workspace (build Xen from source) hangs very badly in 'gld -m elf_i386 -r -o built_in.o amd.o centaur.o cyrix.o generic.o main.o state.o'. You can't kill any of the gmake or gld processes launched by the build! Even a reboot would not complete! To get out of this, you need to cycle the platform!

So, there seems to be a more general hanging problem on this build that dom0 may inherit. Wouldn't a similar hanging situation of the xend process produce the same effects as those described earlier? The fact that it happens when ZFS is involved is perhaps just the result of a side effect.

Patrick

>
> dme.
On 4 Aug 2006, at 1:22pm, Patrick Petit wrote:
>> When you're talking to Xen (using three control-A's) you should
>> hit 'q', which causes the dom0 to drop into kmdb (three control-
>> A's to get back to the dom0 and hence kmdb). Does this not work?
>
> I'll try this. Note that I am seeing similar problems on vanilla
> Nevada snv-44 that is not related to Xen at all. For instance, the
> launching of build-workspace (build Xen from source) hangs very
> badly in 'gld -m elf_i386 -r -o built_in.o amd.o centaur.o
> cyrix.o generic.o main.o state.o'. You can't kill any of the gmake
> or gld processes launched by the build! Even a reboot would not
> complete! To get out of this, you need to cycle the platform! So,
> there seems to be a more general hanging problem on this build that
> dom0 may inherit. Wouldn't a similar hanging situation of the xend
> process produce the same effects as those described earlier? The
> fact that it happens when ZFS is involved is perhaps just the
> result of a side effect.

Ahh, this sounds like something that was mentioned internally recently. See CR 6425723 and perhaps try:

> the work around for this is to manually disable vpm_enable
>
> on a console of the machine, or direct monitor and keyboard
> as root :
>
> # mdb -kw
> # vpm_enable/D (to display what it is set at)
> # vpm_enable/W 0 (to set it to 0)
>
> reboot machine and check if it has been set to 0

(I know nothing about the background to this other than the workaround.)

dme.
--
David Edmondson, Solaris Engineering, http://www.dme.org
David Edmondson wrote:
>
> On 4 Aug 2006, at 1:22pm, Patrick Petit wrote:
>
>>> When you're talking to Xen (using three control-A's) you should
>>> hit 'q', which causes the dom0 to drop into kmdb (three control-
>>> A's to get back to the dom0 and hence kmdb). Does this not work?
>>
>> I'll try this. Note that I am seeing similar problems on vanilla
>> Nevada snv-44 that is not related to Xen at all. For instance, the
>> launching of build-workspace (build Xen from source) hangs very
>> badly in 'gld -m elf_i386 -r -o built_in.o amd.o centaur.o cyrix.o
>> generic.o main.o state.o'. You can't kill any of the gmake or gld
>> processes launched by the build! Even a reboot would not complete!
>> To get out of this, you need to cycle the platform! So, there seems
>> to be a more general hanging problem on this build that dom0 may
>> inherit. Wouldn't a similar hanging situation of the xend process
>> produce the same effects as those described earlier? The fact that
>> it happens when ZFS is involved is perhaps just the result of a side
>> effect.
>
> Ahh, this sounds like something that was mentioned internally
> recently. See CR 6425723 and perhaps try:
>
>> the work around for this is to manually disable vpm_enable
>>
>> on a console of the machine, or direct monitor and keyboard
>> as root :
>>
>> # mdb -kw
>> # vpm_enable/D (to display what it is set at)
>> # vpm_enable/W 0 (to set it to 0)
>>
>> reboot machine and check if it has been set to 0
>

I have tried this and echo "set vpm_cache_enable=0" >> /etc/system. Doesn't work. But it has solved the build process hang described above. That's better than nothing. Thanks for the suggestion anyway.

Patrick

> (I know nothing about the background to this than the workaround.)
>
> dme.
Patrick Petit wrote:
> David Edmondson wrote:
>
>> On 4 Aug 2006, at 1:22pm, Patrick Petit wrote:
>>
>>>> When you're talking to Xen (using three control-A's) you should
>>>> hit 'q', which causes the dom0 to drop into kmdb (three control-
>>>> A's to get back to the dom0 and hence kmdb). Does this not work?
>>>
>>> I'll try this. Note that I am seeing similar problems on vanilla
>>> Nevada snv-44 that is not related to Xen at all. For instance, the
>>> launching of build-workspace (build Xen from source) hangs very
>>> badly in 'gld -m elf_i386 -r -o built_in.o amd.o centaur.o cyrix.o
>>> generic.o main.o state.o'. You can't kill any of the gmake or gld
>>> processes launched by the build! Even a reboot would not complete!
>>> To get out of this, you need to cycle the platform! So, there seems
>>> to be a more general hanging problem on this build that dom0 may
>>> inherit. Wouldn't a similar hanging situation of the xend process
>>> produce the same effects as those described earlier? The fact that
>>> it happens when ZFS is involved is perhaps just the result of a side
>>> effect.
>>
>> Ahh, this sounds like something that was mentioned internally
>> recently. See CR 6425723 and perhaps try:
>>
>>> the work around for this is to manually disable vpm_enable
>>>
>>> on a console of the machine, or direct monitor and keyboard
>>> as root :
>>>
>>> # mdb -kw
>>> # vpm_enable/D (to display what it is set at)
>>> # vpm_enable/W 0 (to set it to 0)
>>>
>>> reboot machine and check if it has been set to 0
>>
>
> I have tried this and echo "set vpm_cache_enable=0" >> /etc/system.

What you did:

# echo "set vpm_cache_enable=0" >> /etc/system

is the correct workaround for the build problem. It's a generic OpenSolaris bug.

> Doesn't work. But it has solved the build process hang described above.
> That's better than nothing.
> Thanks for the suggestion anyway.

I know John D and Max had done some testing with zfs. I'll ping them and see if they can add some comments...

MRJ
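
Since the workaround confirmed above is a plain append to /etc/system, running it twice leaves a duplicate line behind. The sketch below shows one way to make the append idempotent. The helper name `add_system_tunable` is hypothetical, the demo writes to a scratch file rather than the live /etc/system, and the `-q`, `-x` and `-F` grep options are assumed to be available (on Solaris of that era this may mean /usr/xpg4/bin/grep). As general Solaris behaviour, a value written with `mdb -kw` only changes the running kernel, whereas /etc/system is read at boot, which may be why the /etc/system form is the one that survived the reboot here.

```shell
#!/bin/sh
# Sketch only: append "set <name>=<value>" to an /etc/system-style file
# unless that exact line is already present.
# add_system_tunable is a hypothetical helper, not a Solaris command.

add_system_tunable() {
    file=$1
    name=$2
    value=$3
    line="set ${name}=${value}"

    # -x: match the whole line, -F: fixed string, -q: no output
    if grep -qxF "$line" "$file" 2>/dev/null; then
        echo "already present: $line"
    else
        echo "$line" >> "$file"
        echo "added: $line"
    fi
}

# Demonstrate on a scratch file; point it at /etc/system only as root
# and only after taking a backup copy.
scratch=${TMPDIR:-/tmp}/system.sketch.$$
: > "$scratch"
add_system_tunable "$scratch" vpm_cache_enable 0
add_system_tunable "$scratch" vpm_cache_enable 0
cat "$scratch"
rm -f "$scratch"
```

The second call reports that the line is already present, so the scratch file ends up with a single `set vpm_cache_enable=0` entry.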
John Danielson
2006-Aug-04 18:45 UTC
[xen-discuss] dom0 hangs when using an emulated ZFS volume
>
>Patrick Petit wrote:
>> David Edmondson wrote:
>>
>>> On 4 Aug 2006, at 1:22pm, Patrick Petit wrote:
>>>
>>>>> When you're talking to Xen (using three control-A's) you should
>>>>> hit 'q', which causes the dom0 to drop into kmdb (three control-
>>>>> A's to get back to the dom0 and hence kmdb). Does this not work?
>>>>
>>>> I'll try this. Note that I am seeing similar problems on vanilla
>>>> Nevada snv-44 that is not related to Xen at all. For instance, the
>>>> launching of build-workspace (build Xen from source) hangs very
>>>> badly in 'gld -m elf_i386 -r -o built_in.o amd.o centaur.o cyrix.o
>>>> generic.o main.o state.o'. You can't kill any of the gmake or gld
>>>> processes launched by the build! Even a reboot would not complete!
>>>> To get out of this, you need to cycle the platform! So, there seems
>>>> to be a more general hanging problem on this build that dom0 may
>>>> inherit. Wouldn't a similar hanging situation of the xend process
>>>> produce the same effects as those described earlier? The fact that
>>>> it happens when ZFS is involved is perhaps just the result of a side
>>>> effect.
>>>
>>> Ahh, this sounds like something that was mentioned internally
>>> recently. See CR 6425723 and perhaps try:
>>>
>>>> the work around for this is to manually disable vpm_enable
>>>>
>>>> on a console of the machine, or direct monitor and keyboard
>>>> as root :
>>>>
>>>> # mdb -kw
>>>> # vpm_enable/D (to display what it is set at)
>>>> # vpm_enable/W 0 (to set it to 0)
>>>>
>>>> reboot machine and check if it has been set to 0
>>>
>>
>> I have tried this and echo "set vpm_cache_enable=0" >> /etc/system.
>
>What you did:
># echo "set vpm_cache_enable=0" >> /etc/system
>
>is the correct workaround for the build problem. It's a generic
>OpenSolaris bug.
>
>> Doesn't work. But it has solved the build process hang described above.
>> That's better than nothing.
>> Thanks for the suggestion anyway.
>
>I know John D and Max had done some testing with zfs. I'll ping them
>and see if they can add some comments...

We've tested the use of zvol's as device-backed VBD's but not extensively. A couple of things to point out.

When creating the zvol, set the blocksize to 512

   e.g.  zfs create -b 512 -V 5g zpool/myzdev

You might also try increasing the amount of memory given to dom0 when using zfs for backend devices. I've experienced very slow response times when using zfs with the default Solaris dom0 memory settings.

- john
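
As a rough illustration of the zvol advice above, the sketch below wraps the `zfs create -b 512 -V ...` command from the message together with a follow-up `zfs get volblocksize` to confirm the block size actually applied. The helper name `make_zvol` and the dry-run convention are assumptions made here; by default the commands are only printed, so nothing is created unless a runner such as `env` is passed as the fourth argument.

```shell
#!/bin/sh
# Sketch only: print (or run) the commands for a zvol with 512-byte blocks.
# make_zvol is a hypothetical helper, not part of the zfs tooling.

make_zvol() {
    pool=$1
    name=$2
    size=$3
    run=${4:-echo}   # default "echo" gives a dry run; pass "env" to execute

    # Create the volume with a 512-byte block size, as suggested above.
    $run zfs create -b 512 -V "$size" "$pool/$name"

    # Read the property back to confirm what the volume ended up with.
    $run zfs get volblocksize "$pool/$name"
}

# Dry run using the pool and volume names from the example above.
make_zvol zpool myzdev 5g
```

Running `make_zvol zpool myzdev 5g env` as root would execute the same two commands instead of printing them.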
John Danielson wrote:
>
>>Patrick Petit wrote:
>>
>>>David Edmondson wrote:
>>>
>>>>On 4 Aug 2006, at 1:22pm, Patrick Petit wrote:
>>>>
>>>>>>When you're talking to Xen (using three control-A's) you should
>>>>>>hit 'q', which causes the dom0 to drop into kmdb (three control-
>>>>>>A's to get back to the dom0 and hence kmdb). Does this not work?
>>>>>>
>>>>>I'll try this. Note that I am seeing similar problems on vanilla
>>>>>Nevada snv-44 that is not related to Xen at all. For instance, the
>>>>>launching of build-workspace (build Xen from source) hangs very
>>>>>badly in 'gld -m elf_i386 -r -o built_in.o amd.o centaur.o cyrix.o
>>>>>generic.o main.o state.o'. You can't kill any of the gmake or gld
>>>>>processes launched by the build! Even a reboot would not complete!
>>>>>To get out of this, you need to cycle the platform! So, there seems
>>>>>to be a more general hanging problem on this build that dom0 may
>>>>>inherit. Wouldn't a similar hanging situation of the xend process
>>>>>produce the same effects as those described earlier? The fact that
>>>>>it happens when ZFS is involved is perhaps just the result of a side
>>>>>effect.
>>>>>
>>>>Ahh, this sounds like something that was mentioned internally
>>>>recently. See CR 6425723 and perhaps try:
>>>>
>>>>>the work around for this is to manually disable vpm_enable
>>>>>
>>>>>on a console of the machine, or direct monitor and keyboard
>>>>>as root :
>>>>>
>>>>># mdb -kw
>>>>># vpm_enable/D (to display what it is set at)
>>>>># vpm_enable/W 0 (to set it to 0)
>>>>>
>>>>>reboot machine and check if it has been set to 0
>>>>>
>>>I have tried this and echo "set vpm_cache_enable=0" >> /etc/system.
>>>
>>What you did:
>># echo "set vpm_cache_enable=0" >> /etc/system
>>
>>is the correct workaround for the build problem. It's a generic
>>OpenSolaris bug.
>>
>>>Doesn't work. But it has solved the build process hang described above.
>>>That's better than nothing.
>>>Thanks for the suggestion anyway.
>>>
>>I know John D and Max had done some testing with zfs. I'll ping them
>>and see if they can add some comments...
>>
>
>We've tested the use of zvol's as device-backed VBD's but not extensively.
>A couple of things to point out.
>
>When creating the zvol, set the blocksize to 512
>
>   e.g.  zfs create -b 512 -V 5g zpool/myzdev
>
>You might also try increasing the amount of memory given to dom0 when
>using zfs for backend devices. I've experienced very slow response times
>when using zfs with the default Solaris dom0 memory settings.
>

Hi John,

I have tried this and increased dom0's memory to 1GB as well. It still hangs during domU's boot process at the moment the virtual disk (mapped by zvol) is registered in the kernel. I filed CR 6458021 <http://bt2ws.central.sun.com/CrPrint?id=6458021> yesterday.

Best regards,
- Patrick

>- john