I have two identical hypervisors running the same operating system: Ubuntu 22.04.2 LTS

Recently virsh on both machines stopped talking to libvirtd. Both stopped within a few days of each other.

Currently, if I run any of:

virsh uri
virsh version
virsh list

the command just hangs:

# virsh list
..nothing just hangs

When I ran strace on these broken machines, it gets stuck at the same spot:

strace virsh list
...
access("/var/run/libvirt/virtqemud-sock", F_OK) = -1 ENOENT (No such file or directory)
access("/var/run/libvirt/libvirt-sock", F_OK) = 0
socket(AF_UNIX, SOCK_STREAM, 0) = 5
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/libvirt/libvirt-sock"}, 110) = 0
getsockname(5, {sa_family=AF_UNIX}, [128 => 2]) = 0
futex(0x7fa716a672f0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
fcntl(5, F_GETFD) = 0
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
fcntl(5, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
futex(0x7fa716a67348, FUTEX_WAKE_PRIVATE, 2147483647) = 0
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 6
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
futex(0x7fa70c001cb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7fa716a6786c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fa716a67378, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
futex(0x7fa70c001cb0, FUTEX_WAKE_PRIVATE, 1) = 1
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=5, events=POLLOUT}, {fd=6, events=POLLIN}], 2, -1) = 2 ([{fd=5, revents=POLLOUT}, {fd=6, revents=POLLIN}])
read(6, "\2\0\0\0\0\0\0\0", 16) = 8
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
futex(0x5628ce6e9710, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(5, "\0\0\0\34 \0\200\206\0\0\0\1\0\0\0B\0\0\0\0\0\0\0\0\0\0\0\0", 28) = 28
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, -1) = 1 ([{fd=6, revents=POLLIN}])
read(6, "\5\0\0\0\0\0\0\0", 16) = 8
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, -1

It gets stuck at this poll(). Note that I ran strace on an identical new install of Ubuntu 22.04 where virsh connects fine, and got an identical trace, except that after this poll() it continues on with read/write etc.

I turned on debugging for libvirtd and get no errors while virsh is trying to connect.

I am able to get a virsh# shell. The shell only hangs when I try "connect", "uri", or "version".

Another method of debugging I tried was:

LIBVIRT_DEBUG=error LIBVIRT_LOG_FILTERS="1:* " virsh uri
..
..
2023-06-06 20:51:22.312+0000: 1647: debug : doRemoteOpen:1128 : Trying authentication
2023-06-06 20:51:22.312+0000: 1647: debug : virNetMessageNew:44 : msg=0x55b996539680 tracked=0
2023-06-06 20:51:22.312+0000: 1647: debug : virNetMessageEncodePayload:383 : Encode length as 28
2023-06-06 20:51:22.312+0000: 1647: info : virNetClientSendInternal:2151 : RPC_CLIENT_MSG_TX_QUEUE: client=0x55b996538010 len=28 prog=536903814 vers=1 proc=66 type=0 status=0 serial=0
2023-06-06 20:51:22.312+0000: 1647: debug : virNetClientCallNew:2107 : New call 0x55b996535f80: msg=0x55b996539680, expectReply=1, nonBlock=0
2023-06-06 20:51:22.312+0000: 1647: debug : virNetClientIO:1920 : Outgoing message prog=536903814 version=1 serial=0 proc=66 type=0 length=28 dispatch=(nil)
2023-06-06 20:51:22.312+0000: 1647: debug : virNetClientIO:1978 : We have the buck head=0x55b996535f80 call=0x55b996535f80
2023-06-06 20:51:22.312+0000: 1647: info : virEventGLibHandleUpdate:195 : EVENT_GLIB_UPDATE_HANDLE: watch=1 events=0
2023-06-06 20:51:22.312+0000: 1647: debug : virEventGLibHandleUpdate:206 : Update handle data=0x55b996534d30 watch=1 fd=5 events=0
2023-06-06 20:51:22.312+0000: 1647: debug : virEventGLibHandleUpdate:229 : Removed old handle source=0x55b996534de0
2023-06-06 20:51:22.312+0000: 1648: debug : virEventRunDefaultImpl:341 : running default event implementation

Any help would be appreciated.

thanks
jerry
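
For reference, the 28-byte write to fd 5 in the trace above can be decoded by hand. This is only a minimal sketch: it assumes the packet is a 4-byte length followed by six big-endian 32-bit header fields in the order prog, vers, proc, type, serial, status, and the function name decode_rpc_header is made up for illustration.

```shell
#!/usr/bin/env bash
# Decode the 28-byte packet that virsh wrote to fd 5 in the strace output.
# Assumed layout: 4-byte length, then prog, vers, proc, type, serial, status,
# each a big-endian 32-bit word.

decode_rpc_header() {
    # $1: the packet as a plain hex string (56 hex digits for 28 bytes)
    local hex=$1
    local names=(len prog vers proc type serial status)
    local i
    for i in "${!names[@]}"; do
        # Take 8 hex digits per field and print them as a decimal number.
        printf '%s=%d\n' "${names[i]}" "$((16#${hex:i*8:8}))"
    done
}

# The bytes exactly as strace printed them (C-style octal escapes),
# converted to one continuous hex string.
packet_hex=$(printf '\0\0\0\34 \0\200\206\0\0\0\1\0\0\0B\0\0\0\0\0\0\0\0\0\0\0\0' \
    | od -An -v -tx1 | tr -d ' \n')

decode_rpc_header "$packet_hex"
```

The decoded len, prog, vers and proc values (28, 536903814, 1, 66) agree with the RPC_CLIENT_MSG_TX_QUEUE line in the client debug log, which suggests the request leaves virsh intact and the hang is in waiting for a reply.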
On Tue, Jun 06, 2023 at 04:56:38PM -0400, Jerry Buburuz wrote:
>I have identical two hypervisors same operating system: Ubuntu 22.04.2 LTS
>
>Recently both virsh stopped talking to the libvirtd. Both stopped within a
>few days of each other.
>
>Currently if I run:
>
>virsh uri
>virsh version
>virsh list
>
># virsh list
>..nothing just hangs
>
>When I ran strace on these broken machines it get stuck at same spot:
>

Is libvirtd running? It might be that you have socket activation with systemd and the socket this virsh is connecting to is not properly associated with the service. One thing that can happen is that, when you want to debug the service, you stop the service and a socket unit, but that will not remove it.

Before debugging this, make sure everything related in systemd is stopped. Then try running libvirtd (or virtqemud, there are two services) with debugging enabled, and run the virsh commands with debugging enabled as well.

Martin
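
The "make sure everything related in systemd is stopped" step could be sketched roughly as follows. The unit names are an assumption based on the monolithic libvirtd packaging (a virtqemud setup would use different units), and the run helper is hypothetical: by default it only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 (and run as root) to actually execute them.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Assumed unit names for the monolithic libvirtd daemon; adjust as needed.
units=(libvirtd.service libvirtd.socket libvirtd-ro.socket libvirtd-admin.socket)

stop_libvirt_units() {
    # Stop the service together with all of its socket units.
    run systemctl stop "${units[@]}"
    # List listening UNIX sockets and their owners; a libvirt-sock entry
    # that is still present here would be worth a closer look.
    run ss -xlp
}

stop_libvirt_units
```

With everything stopped, the daemon can be started by hand in one terminal (for example `LIBVIRT_DEBUG=1 libvirtd --verbose`) and the failing command repeated in another (for example `LIBVIRT_DEBUG=1 virsh uri`).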
On Tue, Jun 06, 2023 at 04:56:38PM -0400, Jerry Buburuz wrote:
> Recently both virsh stopped talking to the libvirtd. Both stopped within a
> few days of each other.

I've run into exactly the same problem. I'm running libvirt (libvirt-9.0.0-3.fc38.x86_64) on Fedora 38. On Fedora, libvirtd is configured by default to use socket activation and is run with the `--timeout 120` option.

After some recent upgrades, I'm seeing the exact same symptoms that Jerry described -- virsh commands simply get stuck at the same call to `poll()`. It looks like libvirtd is either crashing or failing to start, because when virsh is in this state the `libvirtd` process isn't running.

This makes it *sound* like a systemd problem, but I'm not seeing errors anywhere -- either from libvirtd or from systemd.

I've worked around the problem locally by re-configuring libvirtd to run persistently rather than using socket activation:

systemctl disable --now libvirtd{,-ro,-admin}.socket
cat > /etc/systemd/system/libvirtd.service.d/override.conf <<EOF
[Service]
EnvironmentFile=
EOF
systemctl restart libvirtd

Package versions in case this helps correlate something:

- libvirt-9.0.0-3.fc38.x86_64
- systemd-253.5-1.fc38.x86_64
- kernel-6.3.6-200.fc38.x86_64

Libvirt uri: qemu:///system

-- 
Lars Kellogg-Stedman <lars at redhat.com> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/ | N1LKS
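
One way to check for the state described above (virsh hanging while no libvirtd process exists) is to run virsh under a time limit and then look at the units. This is a rough sketch only: the probe helper is hypothetical, the 10-second limit is arbitrary, and the unit names assume the monolithic libvirtd packaging.

```shell
#!/usr/bin/env bash
# probe LIMIT CMD...: run CMD with a time limit and report ok / hung / failed.
probe() {
    local limit=$1 rc=0
    shift
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo hung ;;     # timeout(1) exits with 124 when the limit is hit
        *)   echo failed ;;
    esac
}

# Does a simple virsh call return within 10 seconds?
echo "virsh uri: $(probe 10 virsh -c qemu:///system uri)"

# Is the daemon process actually there while virsh is stuck?
pgrep -x libvirtd >/dev/null 2>&1 || echo "no libvirtd process found"

# What does systemd think about the socket and service units?
systemctl is-active libvirtd.socket libvirtd.service 2>/dev/null || true
```

A result of "hung" together with an active socket unit and no libvirtd process would match the symptoms in this thread; `journalctl -u libvirtd.service -u libvirtd.socket` is then the next place to look.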