On 02/20/2017 09:10 PM, Michal Privoznik wrote:
> On 17.02.2017 17:18, Yunchih Chen wrote:
>> `virsh list` hangs on my server that hosts a bunch of VMs.
>> This might be due to the Debian upgrade I did on Feb 15, which upgrades
>> `libvirt` from 2.4.0-1 to 3.0.0-2.
>> I have tried restarting libvirtd for a few times, without luck.
>>
>> Attached below are some relevant logs; let me know if you need some more
>> for debugging.
>> Thanks for your help!!
>>
>> root@vm-host:~# uname -a
>> Linux vm-host 4.6.0-1-amd64 #1 SMP Debian 4.6.4-1 (2016-07-18) x86_64 GNU/Linux
>>
>> root@vm-host:~# apt-cache policy libvirt-daemon
>> libvirt-daemon:
>>   Installed: 3.0.0-2
>>   Candidate: 3.0.0-2
>>   Version table:
>>  *** 3.0.0-2 500
>>         500 http://debian.csie.ntu.edu.tw/debian testing/main amd64 Packages
>>         100 /var/lib/dpkg/status
>>
>> root@vm-host:~# strace -o /tmp/trace -e trace=network,file,poll virsh list # hangs forever .....
>> ^C
>> root@vm-host:~# tail -10 /tmp/trace
>> access("/etc/libvirt/libvirt.conf", F_OK) = 0
>> open("/etc/libvirt/libvirt.conf", O_RDONLY) = 5
>> access("/proc/vz", F_OK) = -1 ENOENT (No such file or directory)
>> socket(AF_UNIX, SOCK_STREAM, 0) = 5
>> connect(5, {sa_family=AF_UNIX, sun_path="/var/run/libvirt/libvirt-sock"}, 110) = 0
>> getsockname(5, {sa_family=AF_UNIX}, [128->2]) = 0
>> poll([{fd=5, events=POLLOUT}, {fd=6, events=POLLIN}], 2, -1) = 1 ([{fd=5, revents=POLLOUT}])
>> poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>> --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
>> +++ killed by SIGINT +++
>>
>> root@vm-host:~# lsof /var/run/libvirt/libvirt-sock # hangs too ...
> This is very suspicious. Looks like the daemon is in some weird state
> and hence virsh is unable to get list of domains.
>
> # ps axf | grep libvirtd
> # gdb -p $(pgrep libvirtd)
> (gdb) t a a bt
>
> if you could run those commands and share the output that might shed
> more light.
>
> Michal

Unfortunately, gdb also hangs when attaching to libvirt ....

root@vm-host:~# gdb -q -p $(pgrep libvirtd)
Attaching to process 9556
[New LWP 9557]
[New LWP 9558]
[New LWP 9559]
[New LWP 9560]
[New LWP 9561]
[New LWP 9562]
[New LWP 9563]
[New LWP 9564]
[New LWP 9565]
[New LWP 9566]
[New LWP 9567]
[New LWP 9568]
[New LWP 9569]
[New LWP 9570]
[New LWP 9571]
[New LWP 9572]
^C^C^C^C^C^C^C^C^C^C^C

It must be killed with SIGKILL, and libvirtd will die with gdb.

Here[1] is the output of the following command:

strace -o /tmp/gdb-full-trace.txt -s 1024 -f gdb -q -p $(pgrep libvirtd)

Thanks for your help : )

[1] https://www.csie.ntu.edu.tw/~yunchih/s/gdb-full-trace.txt

--
Yun-Chih Chen 陳耘志
Network/Workstation Assistant
Dept. of Computer Science and Information Engineering
National Taiwan University
Tel: +886-2-33664888 ext. 217/204
Email: ta217@csie.ntu.edu.tw
Website: http://wslab.csie.ntu.edu.tw/
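When gdb itself hangs on attach, the per-thread state that the kernel exposes under /proc can still be read without ptrace. The sketch below is not from the thread: it assumes a Linux /proc layout, and `thread_states` is a hypothetical helper name. A thread reported in state `D` (uninterruptible sleep) cannot be stopped by a debugger until the kernel operation it is blocked in returns, which is one possible reason for an attach to hang like this.

```shell
#!/bin/sh
# thread_states PID [PROC_ROOT]
# Print "TID<TAB>STATE" for every thread of PID, read from the
# "State:" line of /proc/PID/task/TID/status.
# PROC_ROOT defaults to /proc; it is only a parameter so the
# function can be pointed at a copy of the tree (an assumption
# made for this sketch, not something libvirt provides).
thread_states() {
    pid=$1
    proc=${2:-/proc}
    for t in "$proc/$pid"/task/*; do
        # Skip threads that exited between the glob and the read.
        [ -r "$t/status" ] || continue
        state=$(sed -n 's/^State:[[:space:]]*//p' "$t/status")
        printf '%s\t%s\n' "$(basename "$t")" "$state"
    done
}
```

For example, `thread_states "$(pgrep -o libvirtd)"` would list one line per libvirtd thread; any line starting its state with `D` is worth a closer look.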
On 02/21/2017 12:07 PM, Yunchih Chen wrote:
> [...]
> Unfortunately, gdb also hangs when attaching to libvirt ....
> [...]
> Here[1] is the output of the following command:
>
> strace -o /tmp/gdb-full-trace.txt -s 1024 -f gdb -q -p $(pgrep libvirtd)
>
> Thanks for your help : )
>
> [1] https://www.csie.ntu.edu.tw/~yunchih/s/gdb-full-trace.txt

Any update on this? What other debug info could I provide?

I just enabled libvirtd's debug option and here[1] is the output.

[1] https://www.csie.ntu.edu.tw/~yunchih/s/libvirtd-debug.log

--
Yun-Chih Chen 陳耘志
On Tue, Feb 21, 2017 at 12:07:41PM +0800, Yunchih Chen wrote:
> [...]
> Here[1] is the output of the following command:
>
> strace -o /tmp/gdb-full-trace.txt -s 1024 -f gdb -q -p $(pgrep libvirtd)

Err, that is useless - you've just straced GDB, not libvirtd.

You need to strace libvirtd as it starts:

  strace -o libvirt.log -f -s 1000 /usr/sbin/libvirtd

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
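With `-f` and a single `-o` file, strace prefixes every line with the ID of the thread that made the call, so the last line recorded for each ID shows where that thread stopped. A minimal sketch, assuming that output layout; `last_syscalls` is a hypothetical helper name, and `libvirt.log` is simply the file name used in the strace command above.

```shell
#!/bin/sh
# last_syscalls FILE
# For an "strace -f -o FILE" log, print the final line recorded
# for each thread ID (the first field), ordered by ID.
# A last line ending in "<unfinished ...>" means the call had not
# returned when the trace stopped.
last_syscalls() {
    awk '{ last[$1] = $0 } END { for (pid in last) print last[pid] }' "$1" |
        sort -n
}
```

Running `last_syscalls libvirt.log` after the daemon has hung would give one line per thread, which is usually much shorter to read and share than the full trace.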
On Fri, Feb 24, 2017 at 11:24:16PM +0800, Yunchih Chen wrote:
> [...]
> Any update on this? What other debug info could I provide?
> [...]
> [1] https://www.csie.ntu.edu.tw/~yunchih/s/libvirtd-debug.log

So that shows many lines like:

2017-02-24 15:20:47.667+0000: 15887: debug : virStorageFileGetMetadataInternal:939 : path=/home/xxxx/install-virt.sh, buf=0x7f1d3810e580, len=1237, meta->format=-1
2017-02-24 15:20:47.667+0000: 15887: debug : virStorageFileProbeFormatFromBuf:808 : path=/home/xxxx/install-virt.sh, buf=0x7f1d3810e580, buflen=1237
2017-02-24 15:20:47.667+0000: 15887: debug : virStorageFileProbeFormatFromBuf:845 : format=1
2017-02-24 15:20:47.667+0000: 15887: debug : virFileClose:108 : Closed fd 19

It looks like you have configured a libvirt storage pool against your
home directory, which is a pretty unusual thing to do. That shouldn't
break things, though it will make the storage pool list hundreds of
irrelevant files that aren't VM disk images.

Regards,
Daniel
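Since `virsh` itself hangs here, one way to see which directories the defined pools point at is to read the pool definitions from disk. This is only a sketch, and it rests on an assumption: that persistent pool XML for the system libvirtd lives in /etc/libvirt/storage and that a directory pool stores its location in a `<path>` element under `<target>`. `pool_targets` is a hypothetical helper name.

```shell
#!/bin/sh
# pool_targets [DIR]
# Print "POOLFILE<TAB>TARGET_PATH" for each *.xml pool definition
# in DIR (default: /etc/libvirt/storage, an assumed location).
# Prints "?" when no <path> element is found in a file.
pool_targets() {
    dir=${1:-/etc/libvirt/storage}
    for xml in "$dir"/*.xml; do
        # The glob stays literal when nothing matches; skip it.
        [ -f "$xml" ] || continue
        path=$(sed -n 's:.*<path>\(.*\)</path>.*:\1:p' "$xml" | head -n 1)
        printf '%s\t%s\n' "$(basename "$xml" .xml)" "${path:-?}"
    done
}
```

A pool whose target is something like /home/xxxx would match the paths seen in the debug log and would be the one to review.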