Minami Katsumata
2014-Feb-07 08:47 UTC
[libvirt-users] libvirt crashes with Caught Segmentation violation
Hi,

I'm having problems with libvirt crashing after a couple hours when a specific domain monitoring program is running.

I have pasted below the following:
1. libvirt version
2. qemu-kvm version
3. OS version
4. Kernel version
5. libvirt status post-crash
6. libvirtd.log (info level dump around crash; too long to post everything so just the beginning and end. UTC)
7. custom.log (on what this domain monitoring program was doing around the time of the crash. JST)
8. FYI on the program being executed
9. other related server settings

Please, if anyone can look through these and give some insight as to what is causing libvirt to crash, that would be greatly appreciated.

1.) libvirt version:
# rpm -q libvirt
libvirt-0.9.10-21.el6.x86_64

2.) qemu-kvm version:
qemu-kvm-0.12.1.2-3.295.el6.10.x86_64

3.) OS version:
# cat /etc/redhat-release
CentOS release 6.3 (Final)

4.) Kernel version:
# uname -r
2.6.32-279.22.1.el6.x86_64

5.) libvirt status after crash:
# service libvirtd status
libvirtd dead but pid file exists

6.) libvirtd.log

2014-02-06 10:25:05.173+0000: 1187: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58626,uid:0
2014-02-06 10:25:05.237+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58636,uid:0
2014-02-06 10:25:05.271+0000: 1185: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58646,uid:0
2014-02-06 10:25:05.301+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58648,uid:0
2014-02-06 10:25:05.400+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58650,uid:0
Caught Segmentation violation dumping internal log buffer:

====== start of log ====

^@05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=19 w=21, f=32 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=20 w=22, f=34 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=21 w=23, f=33 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=22 w=24, f=36 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=23 w=25, f=38 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=24 w=26, f=39 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=25 w=27, f=41 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=26 w=28, f=40 e=25 d=0

(cut out due to length)

2014-02-06 10:25:05.423+00001182: debug : virEventPollDispatchHandles:488 : EVENT_POLL_DISPATCH_HANDLE: watch=2791 events=2
2014-02-06 10:25:05.423+00001182: debug : virNetMessageFree:75 : msg=0x2326a20 nfds=0 cb=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientCalculateHandleMode:137 : tls=(nil) hs=-1, rx=0x2266390 tx=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientCalculateHandleMode:167 : mode=1
2014-02-06 10:25:05.423+00001182: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=2791 events=1
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 1 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virEventPollDispatchHandles:474 : i=33 w=2793
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:506 : Cleanup 12
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:554 : Cleanup 34
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientClose:632 : client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveStop:382 : RPC_KEEPALIVE_STOP: ka=0x225bf20 client=0x22e6860
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveTimeout:293 : EVENT_POLL_REMOVE_TIMEOUT: timer=8290
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveTimeout:293 : EVENT_POLL_REMOVE_TIMEOUT: timer=8289
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : daemonRemoveAllClientStreams:493 : stream=(nil)
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveHandle:180 : EVENT_POLL_REMOVE_HANDLE: watch=2791
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveHandle:193 : mark delete 32 50
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virNetMessageFree:75 : msg=0x2266390 nfds=0 cb=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetSocketFree:722 : RPC_SOCKET_FREE: sock=0x22e66a0 refs=2
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : virEventRunDefaultImpl:244 : running default event implementation
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:506 : Cleanup 12
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:519 : EVENT_POLL_PURGE_TIMEOUT: timer=8289
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=2
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:519 : EVENT_POLL_PURGE_TIMEOUT: timer=8290
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=1
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=2
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:554 : Cleanup 34
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:567 : EVENT_POLL_PURGE_HANDLE: watch=2791
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=1
2014-02-06 10:25:05.423+00001182: debug : virConnectClose:1462 : conn=0x7f1b380c4630
2014-02-06 10:25:05.423+00001182: debug : virUnrefConnect:145 : unref connection 0x7f1b380c4630 1
2014-02-06 10:25:05.423+00001182: debug : virReleaseConnect:94 : release connection 0x7f1b380c4630

====== end of log ====

7.) custom.log

Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM Interface
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM vnet4
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domiflist i-8-114-VM
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM Interface
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM vnet4
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so

8.) FYI on this program:

This program consists of three main scripts that are run via cron. 2 run every 5 minutes, and 1 runs every minute. The two scripts that are executed every 5 minutes rely heavily on the virsh command. However, it is made so that the simultaneous number of connections to libvirt is not too large; the max number of libvirt-sock connections at any given moment does not go over 6.

As it is a domain monitoring program, it only executes the following virsh commands:
virsh list --all
virsh dominfo
virsh domblklist
virsh domblkstat
virsh domiflist
virsh domifstat

9.) other related server settings

9-1) user resource limits

[root@ ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773493
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32768
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

9-2) libvirtd.conf

The following settings have been changed:
#max_clients = 20
max_clients = 250
#max_workers = 20
max_workers = 250
#max_requests = 20
max_requests = 250

Regards,
Minami
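Because libvirtd.log is stamped in UTC and custom.log in JST, the two excerpts line up only after a nine-hour shift: the crash at 10:25:05 UTC matches the 19:25:05 entries in custom.log. A minimal Python sketch of that conversion follows, assuming JST is a fixed UTC+9 offset; the helper name `utc_to_jst` is made up for illustration and is not part of any tool mentioned in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumption: JST is treated as a fixed UTC+9 offset (no daylight saving time).
JST = timezone(timedelta(hours=9), "JST")


def utc_to_jst(stamp: str) -> str:
    """Convert a libvirtd.log style timestamp, e.g.
    '2014-02-06 10:25:05.173+0000', to JST wall-clock time."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f%z")
    return parsed.astimezone(JST).strftime("%Y-%m-%d %H:%M:%S")


# Timestamp of the first libvirtd.log entry quoted in the post.
print(utc_to_jst("2014-02-06 10:25:05.173+0000"))
```

Note that syslog-style lines such as those in custom.log carry no year or offset, so the comparison only works when the reader already knows which zone each file uses.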
Michal Privoznik
2014-Feb-07 09:30 UTC
Re: [libvirt-users] libvirt crashes with Caught Segmentation violation
On 07.02.2014 09:47, Minami Katsumata wrote:
> Hi,
>
> I'm having problems with libvirt crashing after a couple hours when a
> specific domain monitoring program is running.
>
> I have pasted below the following:
> 1. libvirt version
> 2. qemu-kvm version
> 3. OS version
> 4. Kernel version
> 5. libvirt status post-crash
> 6. libvirtd.log (info level dump around crash; too long to post
> everything so just the beginning and end. UTC)
> 7. custom.log (on what this domain monitoring program was doing around
> the time of the crash. JST)
> 8. FYI on the program being executed
> 9. other related server settings
>
> Please, if anyone can look through these and give some insight as to
> what is causing libvirt to crash, that would be greatly appreciated.
>
> 1.) libvirt version:
> # rpm -q libvirt
> libvirt-0.9.10-21.el6.x86_64

This is a rather ancient libvirt. Can you please update and see whether the issue has been fixed?

>
> 2.) qemu-kvm version:
> qemu-kvm-0.12.1.2-3.295.el6.10.x86_64
>
> 3.) OS version:
> # cat /etc/redhat-release
> CentOS release 6.3 (Final)

Ah, this explains the libvirt version. AFAIK CentOS 6.5 has been released.

>
> 4.) Kernel version:
> # uname -r
> 2.6.32-279.22.1.el6.x86_64
>
> 5.) libvirt status after crash:
> # service libvirtd status
> libvirtd dead but pid file exists
>
> 6.) libvirtd.log
>
> [...]
> Caught Segmentation violation dumping internal log buffer:
>
> ====== start of log ====
> [...]
> ====== end of log ====

These logs are pretty much useless (not your fault). On the one hand, they may help us see what libvirt was doing just before the crash. On the other hand:
a) the dump is missing the TID entirely
b) it ends just before the SIGSEGV occurs (so, for example, if the segmentation fault happens in one thread, the log may well be showing a completely unrelated thread).

Therefore I think attaching gdb to libvirtd and then reproducing the crash would yield more data, IMO.

Michal
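The gdb suggestion above could be carried out roughly as follows. This is only a sketch under stated assumptions: gdb is installed, debugging symbols for libvirt are available, the command is run as root, and the caller supplies the PID of the running libvirtd. The helper name `gdb_attach_command` and the example PID are hypothetical. The helper only builds the command line; it does not run it.

```python
import shlex
from typing import List


def gdb_attach_command(pid: int) -> List[str]:
    """Build (but do not run) a gdb invocation that attaches to a running
    process, lets it continue, and dumps all thread backtraces once the
    process stops on a signal such as SIGSEGV."""
    return [
        "gdb", "-p", str(pid), "-batch",
        # Assumption: SIGPIPE stops are noise for this investigation,
        # so gdb is told to pass them through without stopping.
        "-ex", "handle SIGPIPE nostop noprint pass",
        "-ex", "continue",
        "-ex", "thread apply all backtrace",
    ]


# 12345 is a placeholder PID, not a value taken from the logs.
print(shlex.join(gdb_attach_command(12345)))
```

Done interactively, the same steps are `gdb -p <pid>`, then `continue`, then `backtrace` at the `(gdb)` prompt after the signal arrives. Printing every thread's backtrace addresses point b) above, since the faulting thread is then identified by gdb rather than guessed from the log.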
Minami Katsumata
2014-Feb-14 04:19 UTC
Re: [libvirt-users] libvirt crashes with Caught Segmentation violation
Hi,

Thank you Michal for your advice. I went ahead and used gdb, and this is the result I got:

2014-02-14 00:54:11.109+0000: 34958: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58466,uid:0
2014-02-14 00:54:11.125+0000: 34958: error : qemudDomainInterfaceStats:7865 : ?????: ??????'Interface' ??????????????
2014-02-14 00:54:11.155+0000: 34960: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58474,uid:0
2014-02-14 00:54:11.218+0000: 34967: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58489,uid:0
2014-02-14 00:54:11.282+0000: 34961: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58506,uid:0
2014-02-14 00:54:11.392+0000: 34967: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58514,uid:0
2014-02-14 00:54:11.408+0000: 34967: error : qemudDomainInterfaceStats:7865 : ?????: ??????'Interface' ??????????????
2014-02-14 00:54:11.438+0000: 34960: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58522,uid:0
2014-02-14 00:54:11.502+0000: 34959: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58536,uid:0
2014-02-14 00:54:11.612+0000: 34961: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58573,uid:0
2014-02-14 00:54:11.707+0000: 34960: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58600,uid:0
2014-02-14 00:54:12.007+0000: 34962: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58642,uid:0
2014-02-14 00:54:12.072+0000: 34958: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58653,uid:0
2014-02-14 00:54:12.197+0000: 34960: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58666,uid:0
2014-02-14 00:54:12.264+0000: 34966: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58678,uid:0
2014-02-14 00:54:12.355+0000: 34967: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58685,uid:0
2014-02-14 00:54:12.508+0000: 34964: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58693,uid:0
2014-02-14 00:54:12.677+0000: 34961: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58707,uid:0
2014-02-14 00:54:12.744+0000: 34960: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58719,uid:0
2014-02-14 00:54:12.843+0000: 34965: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58726,uid:0

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f77cd80d700 (LWP 34958)]
0x00007f77d81b696c in free () from /lib64/libc.so.6

Here's the result of the backtrace:

(gdb) backtrace
#0  0x00007f77d81b696c in free () from /lib64/libc.so.6
#1  0x00007f77db790159 in virFree (ptrptr=0x7f77a0020088) at util/memory.c:310
#2  0x00007f77db7a1072 in virTypedParameterArrayClear (params=<value optimized out>, nparams=5) at util/virtypedparam.c:58
#3  0x0000000000425bec in remoteDispatchDomainBlockStatsFlags (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, rerr=0x7f77cd80cbc0, args=<value optimized out>, ret=0x7f77a0021730) at remote.c:1186
#4  remoteDispatchDomainBlockStatsFlagsHelper (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, rerr=0x7f77cd80cbc0, args=<value optimized out>, ret=0x7f77a0021730) at remote_dispatch.h:757
#5  0x00007f77db856e65 in virNetServerProgramDispatchCall (prog=0xf1f220, server=0xf13ce0, client=0xf1b570, msg=0xf1f3d0) at rpc/virnetserverprogram.c:416
#6  virNetServerProgramDispatch (prog=0xf1f220, server=0xf13ce0, client=0xf1b570, msg=0xf1f3d0) at rpc/virnetserverprogram.c:289
#7  0x00007f77db8580f1 in virNetServerHandleJob (jobOpaque=<value optimized out>, opaque=0xf13ce0) at rpc/virnetserver.c:164
#8  0x00007f77db79a45c in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:144
#9  0x00007f77db799d79 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#10 0x00007f77d88dc851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f77d822367d in clone () from /lib64/libc.so.6

And util/virtypedparam.c:58 looks like this for libvirt-0.9.10:

49 virTypedParameterArrayClear(virTypedParameterPtr params, int nparams)
50 {
51     int i;
52
53     if (!params)
54         return;
55
56     for (i = 0; i < nparams; i++) {
57         if (params[i].type == VIR_TYPED_PARAM_STRING)
58             VIR_FREE(params[i].value.s);
59     }
60 }
61

I'm very unfamiliar with C, so I would again appreciate any input on why this is happening and what I can do to solve this issue.

By the way, this segfault can be reproduced 100% of the time on CentOS 6.3 but not on CentOS 6.2.

Thank you and regards,
Minami
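A side observation on the logs in this thread: libvirtd reports errors from qemudDomainInterfaceStats that mention 'Interface', and the monitoring script is seen calling `virsh domifstat` with that same word as the device name. "Interface" is the first column heading of `virsh domiflist` output, so the script may be treating the header row as a device. Whether this has anything to do with the segfault is not established here. The following is a minimal Python sketch of parsing that table while skipping the header and separator rows; `parse_domiflist` is a hypothetical helper, and the sample text (bridge name, MAC address) is invented for illustration.

```python
from typing import List


def parse_domiflist(output: str) -> List[str]:
    """Return interface names from `virsh domiflist` text output,
    skipping the column-header row and the dashed separator row."""
    names = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0] == "Interface":
            continue  # header row, not a real device
        if set(line.strip()) == {"-"}:
            continue  # separator row made only of dashes
        names.append(fields[0])
    return names


# Invented sample; real output depends on the domain's configuration.
SAMPLE = """\
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet4      bridge     br0        virtio      52:54:00:00:00:01
"""

print(parse_domiflist(SAMPLE))
```

Filtering the names this way before passing them to `virsh domifstat` would at least remove the repeated 'Interface' errors from libvirtd.log, which makes the remaining entries easier to read.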