Jérôme Petazzoni
2004-Nov-30 13:45 UTC
[Xen-devel] More Xen troubles (with xend this time)
When trying to reproduce a crash (when I do "xm restore foo.xen", the restored VM crashed instantaneously), I hit the following "bug" (I hope that the problem lies between my keyboard and my chair and that I didn't find another real bug):

I did restore the domain, then noticed it was crashed ("-----c" in xm list). I tried to destroy it, but it didn't work. So I stopped xend, restarted it ... And there it goes:

isnpro:~# xm list
(111, 'Connection refused')
Error: Error connecting to xend, is xend running?

isnpro:~# xend start

isnpro:~# xm list
(111, 'Connection refused')
Error: Error connecting to xend, is xend running?

isnpro:~# ps aux | grep x
root       644  0.0  0.0     0    0 ?  S  Nov29  0:02 [xenblkd]
root     14003  0.0  1.8  4308 1112 ?  S  14:37  0:00 xfrd

isnpro:~# lsof | grep LISTEN
portmap      970 daemon       4u  IPv4    2122  TCP *:sunrpc (LISTEN)
exim4       1113 Debian-exim  0u  IPv4    2291  TCP localhost:smtp (LISTEN)
inetd       1119 root         4u  IPv4    2304  TCP *:discard (LISTEN)
inetd       1119 root         6u  IPv4    2306  TCP *:daytime (LISTEN)
inetd       1119 root         7u  IPv4    2307  TCP *:time (LISTEN)
sshd        1129 root         3u  IPv4    2329  TCP *:ssh (LISTEN)
rpc.statd   1135 root         6u  IPv4    2369  TCP *:893 (LISTEN)
xfrd       14003 root         2u  IPv4  274713  TCP *:8002 (LISTEN)

isnpro:~# tail /var/log/xend.log
[2004-11-30 14:17:00 xend] INFO (XendRoot:91) EVENT> xend.domain.exit ['xipetotec', '22', 'crash']
[2004-11-30 14:17:00 xend] INFO (XendRoot:91) EVENT> xend.domain.destroy ['xipetotec', '22']
[2004-11-30 14:17:28 xend] INFO (SrvDaemon:607) Xend Daemon started
[2004-11-30 14:18:35 xend] INFO (SrvDaemon:607) Xend Daemon started
[2004-11-30 14:19:57 xend] INFO (SrvDaemon:607) Xend Daemon started
[2004-11-30 14:23:45 xend] INFO (SrvDaemon:607) Xend Daemon started
[2004-11-30 14:25:46 xend] INFO (SrvDaemon:607) Xend Daemon started
[2004-11-30 14:31:43 xend] INFO (SrvDaemon:607) Xend Daemon started
[2004-11-30 14:31:54 xend] INFO (SrvDaemon:607) Xend Daemon started
[2004-11-30 14:37:40 xend] INFO (SrvDaemon:607) Xend Daemon started

(many "Xend Daemon started" messages since I tried many times to restart it...)

So I thought that xfrd (xend?) was running on port 8002 instead of 8000, and I tried to set up a redir (who knows!):

isnpro:~# redir --cport 8002 --lport 8000 &
[1] 14025
isnpro:~# xm list
Traceback (most recent call last):
  File "/usr/sbin/xm", line 9, in ?
    main.main(sys.argv)
  File "/root/Xen/xen-2.0-testing.bk/dist/install/lib/python/xen/xm/main.py", line 795, in main
  File "/root/Xen/xen-2.0-testing.bk/dist/install/lib/python/xen/xm/main.py", line 106, in main
  File "/root/Xen/xen-2.0-testing.bk/dist/install/lib/python/xen/xm/main.py", line 124, in main_call
  File "/root/Xen/xen-2.0-testing.bk/dist/install/lib/python/xen/xm/main.py", line 343, in main
AttributeError: 'str' object has no attribute 'sort'

Okay, it seems it wasn't a very clever idea after all.

What should I try now? (I don't want to reboot the thing yet, since the virtual domains are still running and I can't stop them right now).

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
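The "Connection refused" errors above suggest that nothing was accepting connections on the port xm talks to. A minimal sketch of that check in Python follows. It assumes, as the message does, that xend is expected on port 8000 and that xfrd is the process on 8002; the helper name port_is_open is made up for illustration and is not part of Xen.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. ECONNREFUSED, errno 111)
        # when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("127.0.0.1", 8000) and port_is_open("127.0.0.1", 8002) would show which of the two ports accepts connections. A successful connect only proves that something is listening, not which daemon it is; the lsof output above is what ties port 8002 to xfrd.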
> When trying to reproduce a crash (when I do "xm restore foo.xen", the
> restored VM crashed instantaneously), I hit the following "bug" (I hope
> that the problem lies between my keyboard and my chair and that I didn't
> find another real bug) :
>
> I did restore the domain, then noticed it was crashed ("-----c" in xm
> list). I tried to destroy it, but it didn't work. So I stopped xend,
> restarted it ... And there it goes :

Restarting xend was a bit of a brave move. Although it's intended to be restartable, there are definitely bugs in the restart code, particularly if you have domains in unusual states at the time (e.g. crashed, awaiting reaping). Error case code is always the least tested...

As to why the domain wouldn't be destroyed, that may turn out to be a useful clue as to the resume problem at its root. If we could get a crashdump from a known kernel image I'm sure we can fix this pretty easily.

Ian
Unfortunately when a domain crashes it seems to confuse xend such that it isn't even restartable. So then you need to reboot the machine.

Clearly this needs looking into. :-)

 -- Keir
Jérôme Petazzoni
2004-Nov-30 16:45 UTC
Re: [Xen-devel] More Xen troubles (with xend this time)
>> I did restore the domain, then noticed it was crashed ("-----c" in xm
>> list). I tried to destroy it, but it didn't work. So I stopped xend,
>> restarted it ... And there it goes :
>
> Restarting xend was a bit of a brave move. Although it's intended
> to be restartable, there are definitely bugs in the restart code,
> particularly if you have domains in unusual states at the time

Okay, and once we "lost" xend, there is no way to do anything? (create, destroy, shutdown, get a console...)

> As to why the domain wouldn't be destroyed, that may turn out to
> be a useful clue as to the resume problem at its root. If we
> could get a crashdump from a known kernel image I'm sure we can
> fix this pretty easily.

I will do my best to reboot the box and try to reproduce the problem. But if the restore operation crashes xend, I will never be able to get a crash dump - unless there's another way to get the information?

If I don't use save/restore/migrate, can I expect Xen to be stable? Or are there other features that are deemed to cause problems?
Christian Limpach
2004-Nov-30 16:57 UTC
Re: [Xen-devel] More Xen troubles (with xend this time)
On Tue, Nov 30, 2004 at 05:45:29PM +0100, Jérôme Petazzoni wrote:
>> Restarting xend was a bit of a brave move. Although it's intended
>> to be restartable, there are definitely bugs in the restart code,
>> particularly if you have domains in unusual states at the time
>
> Okay, and once we "lost" xend, there is no way to do anything ? (create,
> destroy, shutdown, get a console...)

You could try removing xend's database in /var/xen/xend-db -- you will lose your domains' names but I've found that sometimes this allows restarting xend when it's in the state where it doesn't want to start anymore.

    christian
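The suggestion above removes the database outright. A more cautious variant is to rename the directory so that it can be put back if xend still refuses to start. That variant is an assumption of this sketch, not something proposed in the thread, and the helper name set_aside is hypothetical. Only the path /var/xen/xend-db comes from the message.

```python
import os
import shutil
import time


def set_aside(path):
    """Rename `path` to a timestamped sibling and return the new name."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = "%s.bak-%s" % (path.rstrip(os.sep), stamp)
    if os.path.exists(backup):
        # Refuse to overwrite an earlier backup taken in the same second.
        raise FileExistsError(backup)
    # With a destination that does not exist yet, shutil.move renames
    # the directory (or copies and deletes it across filesystems).
    shutil.move(path, backup)
    return backup
```

With xend stopped, set_aside("/var/xen/xend-db") would leave a copy such as /var/xen/xend-db.bak-20041130-181500 next to the original location. Whether restoring that copy later gives xend a usable state is untested here.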
>>> I did restore the domain, then noticed it was crashed ("-----c" in xm
>>> list). I tried to destroy it, but it didn't work. So I stopped xend,
>>> restarted it ... And there it goes :
>>
>> Restarting xend was a bit of a brave move. Although it's intended
>> to be restartable, there are definitely bugs in the restart code,
>> particularly if you have domains in unusual states at the time
>
> Okay, and once we "lost" xend, there is no way to do anything ? (create,
> destroy, shutdown, get a console...)

If you can't restart it, you're stuffed.

>> As to why the domain wouldn't be destroyed, that may turn out to
>> be a useful clue as to the resume problem at its root. If we
>> could get a crashdump from a known kernel image I'm sure we can
>> fix this pretty easily.
>
> I will do my best to reboot the box and try to reproduce the problem.
> But if the restore operation crashes xend, I will never be able to get a
> crash dump - unless there's another way to get the information ?

Put '-c' on the restore command line to connect to the console.

> If I don't use save/restore/migrate, can I expect Xen to be
> stable ?

It's stable for most people even using save/restore/migrate. It must be something about your particular setup or configuration which is provoking the bug. We've actually put a fair amount of effort into trying to reproduce the resume crash, but haven't managed it.

> Or are there other features that are deemed to cause problems ?

The list of known bugs is currently remarkably short, and mostly related to certain hardware (drivers or ioapics).

Ian
Jérôme Petazzoni
2004-Nov-30 17:10 UTC
Re: [Xen-devel] More Xen troubles (with xend this time)
>> Okay, and once we "lost" xend, there is no way to do anything ? (create,
>> destroy, shutdown, get a console...)
>
> You could try removing xend's database in /var/xen/xend-db -- you will
> lose your domain's names but I've found that sometimes this allows
> restarting xend when it's in the state where it doesn't want to start
> anymore.

Interesting! I did that, and then:

# xm list
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
Domain-0           0       59    0  r----   5915.7
Domain-16         16       63    0  -b---    328.7
Domain-17         17      127    0  -b---    699.8
Domain-21         21        0    0  ----c      0.0
Domain-22         22        0    0  ----c      0.0

I could successfully create a new domain attached to a console, and stop it; but other stuff didn't work:

# xm console 16
Error: No console information

And xm shutdown didn't do anything, it seems. Destroying crashed domains didn't work either. It keeps spitting those messages every couple of seconds in xend.log:

[2004-11-30 18:06:44 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=Domain-21 id=21
[2004-11-30 18:06:44 xend] INFO (XendDomain:564) Destroying domain: name=Domain-21
[2004-11-30 18:06:44 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=Domain-22 id=22
[2004-11-30 18:06:44 xend] INFO (XendDomain:564) Destroying domain: name=Domain-22
[2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.exit ['Domain-21', '21', 'crash']
[2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.destroy ['Domain-21', '21']
[2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.exit ['Domain-22', '22', 'crash']
[2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.destroy ['Domain-22', '22']

xend-debug.log and xm dmesg are silent.

Anything useful I can try to get more information about this before I reboot the beast? :-)
Jérôme Petazzoni
2004-Nov-30 17:14 UTC
Re: [Xen-devel] More Xen troubles (with xend this time)
>> But if the restore operation crashes xend, I will never be able to get a
>> crash dump - unless there's another way to get the information ?
>
> Put '-c' on the restore command line to connect to the console.

I tried that, but it just dumps the domain config in SXP format, and then stops (see below).

>> If I don't use save/restore/migrate, can I expect Xen to be
>> stable ?
>
> It's stable for most people even using save/restore/migrate. It
> must be something about your particular setup or configuration
> which is provoking the bug. We've actually put a fair amount of
> effort into trying to reproduce the resume crash, but haven't
> managed it.

I must have a "NOLUCK" flag :-) we tried Xen on two different P4 boxes, on two different Celeron boxes, and we had different problems each time. Would some bootflags be of any use? Do some chipsets have a bad reputation, or something like that?

Here's what happens when I attempt a restore:

isnpro:~# xm restore xipetotec.xen -c
(domain
    (id 24)
    (name xipetotec)
    (memory 0)
    (maxmem 65536)
    (state ----c)
    (cpu 0)
    (cpu_time 0.045590639)
    (up_time 3.83165788651)
    (start_time 1101834496.66)
    (console
        (status closed)
        (id 58)
        (domain 24)
        (local_port 0)
        (remote_port 0)
        (console_port 9624)
    )
    (devices)
    (config
        (vm
            (name xipetotec)
            (memory 64)
            (restart onreboot)
            (image
                (linux
                    (kernel /boot/vmlinuz-2.6.9-xenU)
                    (root '/dev/hda1 ro')
                )
            )
            (device (vbd (uname phy:isnpro/xipetotec-root) (dev hda1) (mode w)))
            (device (vif (mac aa:00:00:00:00:09) (bridge xen-br0)))
        )
    )
)
isnpro:~# xm list
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
Domain-0           0       59    0  r----   5987.9
Domain-16         16       63    0  -b---    329.2
Domain-17         17      127    0  -b---    700.0
Domain-21         21        0    0  ----c      0.0
Domain-22         22        0    0  ----c      0.0
xipetotec         24        0    0  ----c      0.0     9624
isnpro:~#
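The SXP dump printed by the restore is a nested S-expression, so fields such as the state or the console status can be pulled out mechanically. The following is a minimal sketch of a stand-alone reader, not Xen's own sxp module; the names parse_sxp and child are hypothetical, and the grammar it accepts (parentheses, bare atoms, single-quoted strings) is only what is visible in the dump above.

```python
import re

# One token is "(", ")", a single-quoted string, or a bare atom.
TOKEN_RE = re.compile(r"\(|\)|'[^']*'|[^\s()']+")


def parse_sxp(text):
    """Parse one S-expression into nested Python lists of strings."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok.strip("'")  # drop the quotes around quoted strings

    return read()


def child(node, key):
    """Return the first value of the sub-list headed by `key`, or None."""
    for item in node[1:]:
        if isinstance(item, list) and item and item[0] == key:
            return item[1] if len(item) > 1 else None
    return None
```

For the dump above, child(parse_sxp(dump), "state") would give the string "----c", the same crashed state that xm list reports. Truncated input (a missing closing parenthesis) raises IndexError in this sketch rather than a friendlier error.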
Jérôme Petazzoni wrote:
>>> Okay, and once we "lost" xend, there is no way to do anything ?
>>> (create, destroy, shutdown, get a console...)
>>
>> You could try removing xend's database in /var/xen/xend-db -- you will
>> lose your domain's names but I've found that sometimes this allows
>> restarting xend when it's in the state where it doesn't want to start
>> anymore.
>
> Interesting ! I did that, and then :
>
> # xm list
> Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> Domain-0           0       59    0  r----   5915.7
> Domain-16         16       63    0  -b---    328.7
> Domain-17         17      127    0  -b---    699.8
> Domain-21         21        0    0  ----c      0.0
> Domain-22         22        0    0  ----c      0.0
>
> I could successfully create a new domain attached to a console, and stop
> it ; but other stuff didn't work :
>
> # xm console 16
> Error: No console information

The problem is that if you remove xend-db xend no longer knows anything about the running domains (like what inter-domain ports their consoles are on, what their names are, what devices they have). So xend can no longer shut them down properly because it doesn't know what devices to release. This is why you get all the errors.

> And xm shutdown didn't do anything, it seems. Destroying crashed domains
> didn't work either. It keeps spitting those messages every couple of
> seconds in xend.log :
>
> [2004-11-30 18:06:44 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=Domain-21 id=21
> [2004-11-30 18:06:44 xend] INFO (XendDomain:564) Destroying domain: name=Domain-21
> [2004-11-30 18:06:44 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=Domain-22 id=22
> [2004-11-30 18:06:44 xend] INFO (XendDomain:564) Destroying domain: name=Domain-22
> [2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.exit ['Domain-21', '21', 'crash']
> [2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.destroy ['Domain-21', '21']
> [2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.exit ['Domain-22', '22', 'crash']
> [2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.destroy ['Domain-22', '22']

Xend is trying to get rid of the domains, but because their devices aren't freed (because you removed the info about them) the domains won't go away. So xend keeps detecting crashed domains and trying to get rid of them. Not the nicest behaviour - crashes don't happen very much so this path is not well-explored.

> xend-debug.log and xm dmesg are silent.
>
> Anything useful I can try to get more information about this before I
> reboot the beast ? :-)

At this point you're pretty stuck. Time for a reboot.

Mike
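The reap loop described above shows up in xend.log as the same EVENT lines repeating for the same domains. A minimal sketch of how those lines could be tallied per domain follows. The regular expression is inferred only from the log excerpts quoted in this thread, and the name count_events is hypothetical, so other xend versions may log in a different shape.

```python
import re
from collections import Counter

# Matches lines such as:
# [2004-11-30 18:06:44 xend] INFO (XendRoot:91) EVENT> xend.domain.exit ['Domain-21', '21', 'crash']
EVENT_RE = re.compile(
    r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} xend\] "
    r"\w+ \(\w+:\d+\) EVENT> (?P<event>[\w.]+) "
    r"\['(?P<name>[^']+)', '(?P<domid>\d+)'"
)


def count_events(lines):
    """Count EVENT log lines, keyed by (domain name, event name)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[(match.group("name"), match.group("event"))] += 1
    return counts
```

Passing the lines of /var/log/xend.log to count_events and looking at the largest counts would show which domains xend keeps trying to destroy. A domain whose xend.domain.destroy count keeps growing while it still appears in xm list matches the stuck state described above.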