Martin Kletzander
2022-Jan-24 09:35 UTC
frequent network collapse possibly due to bridging
On Fri, Jan 21, 2022 at 08:42:58AM -0600, Hakan E. Duran wrote:
>Hi,
>
>I would like some help to troubleshoot the problem I have been having
>lately with my VM host, which contains 5 VMs, one of which is for
>pi-hole, unbound services. It has been a relatively common occurrence in
>the last few weeks for me to find that the host machine has lost its
>network when I get back home from work. Restoring the VM/VMs do not fix
>the problem, the host needs to be restarted for a fix, otherwise there
>is both loss of name resolution, as well as an internet connection; I
>cannot ping even IPs such as 8.8.8.8. Since I use the pi-hole VM as the DNS
>server for my LAN, this means that my whole LAN gets disconnected from
>internet, until the host machine is rebooted. The host machine has a
>little complicated network setup: the two gigabit connections are bonded
>and bridged to the VMs; however this set up has been serving me so well
>for several years now. The problem, on the other hand, appeared a few
>weeks ago. This doesn't happen every day but often enough to be annoying
>and disruptive for my family.
>

It is always good to check what changed those weeks ago, but I understand
it is difficult to find out what you were updating and where.

>My question is, how can I troubleshoot this problem and figure out
>whether it is truly due to network bridging somehow collapsing or not? I
>tried to find some log files but all I could find were the
>/var/log/libvirt/qemu/$VM files, and the particular log file for the pi-hole
>VM reported the following lines; however, I am not sure if they are
>associated with a real crash or just due to shutting down and restarting
>the host (please excuse the word-wrapping):
>
>char device redirected to /dev/pts/2 (label charserial0)
>qxl_send_events: spice-server bug: guest stopped, ignoring
>2022-01-20T23:41:17.012445Z qemu-system-x86_64: terminating on signal 15 from pid 1 (/sbin/init)

That is probably from restarting the host, as it got SIGTERM'd by init.
Maybe it was restarted at a bad time and there is some inconsistency on
the disk? Using something like libvirt-guests, which can manage your
machines when rebooting, would be a good idea.

>2022-01-20 23:41:17.716+0000: shutting down, reason=crashed
>2022-01-20 23:42:46.059+0000: starting up libvirt version: 7.10.0, qemu
>version: 6.2.0, kernel: 5.10.89-1-MANJARO, hostname: -redacted-
>
>Please excuse my ignorance but is there a way to restart the
>networking without rebooting the host machine? This will not solve my

You can do:

  virsh net-destroy <network_name>
  virsh net-start <network_name>

but depending on what the network looks like, how it is set up etc., you
might need to restart some of the VMs or manually plug them in.

>problem since I won't be able to reach to the host remotely if the
>networking is down. The real solution would be preventing these network
>crashes and the first step in that would be effective troubleshooting in
>my opinion. Any input/guidance will be greatly appreciated.
>
>I can provide more info about my host/VM(s) if the above is not adequate.
>

I'm not sure how much more I can help, as I do not understand what the
actual setup is. What I would do is try to figure out what exactly
happens when it breaks and then go from that (setting up logging etc.),
just general tips I guess.

>Thanks,
>
>Hakan Duran
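
As one possible way to "set up logging" for the moment the network breaks, here is a minimal sketch that appends a timestamped snapshot of the host's link, address, route, bridge and bond state to a log file. The function names, the log path in the usage note and the bond name `bond0` are assumptions for illustration, not details from the thread. The ping target defaults to 8.8.8.8, the address mentioned above.

```shell
#!/bin/sh
# Sketch: record host network state so the last snapshots before an
# outage can be compared with the first ones after it.

section() {
    # Print a header, run the command if it exists, never abort the snapshot.
    echo "--- $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(command not found: $1)"
    fi
}

net_snapshot() {
    log=$1
    bond=${2:-bond0}    # hypothetical bond name, adjust to your setup
    {
        echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        section ip -br link show
        section ip -br addr show
        section ip route show
        section bridge link show
        section cat "/proc/net/bonding/$bond"
        # Set PING_TARGET to an empty string to skip the reachability check.
        if [ -n "${PING_TARGET-8.8.8.8}" ]; then
            section ping -c 1 -W 2 "${PING_TARGET-8.8.8.8}"
        fi
    } >>"$log"
}
```

Calling something like `net_snapshot /var/log/net-snapshot.log bond0` once a minute from cron or a systemd timer should leave a trail showing whether the bond, the bridge or the default route changed first. The log has to live on local disk, since the network is the thing that fails.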

On 1/24/22 4:35 AM, Martin Kletzander wrote:
> On Fri, Jan 21, 2022 at 08:42:58AM -0600, Hakan E. Duran wrote:
[...]
>> Please excuse my ignorance but is there a way to restart the
>> networking without rebooting the host machine? This will not solve my
>
> You can do:
>
>   virsh net-destroy <network_name>
>   virsh net-start <network_name>
>
> but depending on what the network looks like, how it is set up etc. you
> might need to restart some of the VMs or manually plug them in.

The connection between any guest tap device and a host bridge device
will be broken by virsh net-destroy, and not restored by virsh net-start
(because the network driver has no good way of notifying the QEMU driver
that it has restarted a network). This is something that's been on my
"list of annoying things I should fix some day" for a very long time,
but I've never been motivated enough to figure out a clean solution.

In the meantime, if you destroy/start a network, you can get all the
guest tap devices reconnected by restarting libvirtd:

  systemctl restart libvirtd.service

or if you're using split daemons:

  systemctl restart virtqemud.service

One of the things the QEMU driver does when it's initializing is to
check where each guest tap device *should* be connected, compare that to
where it *is* connected, and if those don't match then fix it.
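
The sequence described above (destroy the network, start it again, then restart the daemon so the QEMU driver reconnects the guest tap devices) could be wrapped roughly as follows. This is only a sketch: the function names and the `DRY_RUN` switch are made up for illustration, and the network name `default` in the usage comment is an assumption, not something taken from the setup discussed here.

```shell
#!/bin/sh
# Sketch: restart a libvirt network, then restart the daemon so that
# guest tap devices are reattached to the bridge.

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

restart_network() {
    net=$1
    # Pass virtqemud.service as the second argument with split daemons.
    unit=${2:-libvirtd.service}
    run virsh net-destroy "$net" &&
    run virsh net-start "$net" &&
    run systemctl restart "$unit"
}

# Usage (hypothetical network name):
#   DRY_RUN=1; restart_network default
#   restart_network default virtqemud.service
```

The steps are chained with `&&`, so a failing `virsh net-destroy` (for example because the network is already inactive) stops the sequence. Running it once with `DRY_RUN=1` shows exactly what would be executed.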