thr3ads.net - libvirt users - frequent network collapse possibly due to bridging [Jan 2022]

Martin Kletzander

2022-Jan-24 09:35 UTC

frequent network collapse possibly due to bridging

On Fri, Jan 21, 2022 at 08:42:58AM -0600, Hakan E. Duran wrote:
>Hi,
>
>I would like some help to troubleshoot the problem I have been having
>lately with my VM host, which contains 5 VMs, one of which is for
>pi-hole, unbound services. It has been a relatively common occurrence in
>the last few weeks for me to find that the host machine has lost its
>network when I get back home from work. Restoring the VM/VMs do not fix
>the problem, the host needs to be restarted for a fix, otherwise there
>is both loss of name resolution, as well as an internet connection; I
>cannot ping even IPs such as 8.8.8.8. Since I use the pi-hole VM as the DNS
>server for my LAN, this means that my whole LAN gets disconnected from
>internet, until the host machine is rebooted. The host machine has a
>little complicated network setup: the two gigabit connections are bonded
>and bridged to the VMs; however this set up has been serving me so well
>for several years now. The problem, on the other hand, appeared a few
>weeks ago. This doesn't happen every day but often enough to be annoying
>and disruptive for my family.
>
It is always good to check what changed a few weeks ago, but I understand
it is difficult to find out what you were updating and where.

>My question is, how can I troubleshoot this problem and figure out
>whether it is truly due to network bridging somehow collapsing or not? I
>tried to find some log files but all I could find were the
>/var/log/libvirt/qemu/$VM files, and the particular log file for the pi-hole
>VM reported the following lines; however, I am not sure if they are
>associated with a real crash or just due to shutting down and restarting
>the host (please excuse the word-wrapping):
>
>char device redirected to /dev/pts/2 (label charserial0)
>qxl_send_events: spice-server bug: guest stopped, ignoring
>2022-01-20T23:41:17.012445Z qemu-system-x86_64: terminating on signal 15 from pid 1 (/sbin/init)

That is probably the host restarting, as QEMU got SIGTERM'd by init.  Maybe
it was restarted at a bad time and there is some inconsistency on the disk?
Using something like libvirt-guests, which can manage your machines when
rebooting, would be a good idea.

>2022-01-20 23:41:17.716+0000: shutting down, reason=crashed
>2022-01-20 23:42:46.059+0000: starting up libvirt version: 7.10.0, qemu
>version: 6.2.0, kernel: 5.10.89-1-MANJARO, hostname: -redacted-
>
>Please excuse my ignorance but is there a way to restart the
>networking without rebooting the host machine? This will not solve my

You can do:

    virsh net-destroy <network_name>
    virsh net-start <network_name>

but depending on what the network looks like, how it is set up, etc., you
might need to restart some of the VMs or manually plug them in.

>problem since I won't be able to reach to the host remotely if the
>networking is down. The real solution would be preventing these network
>crashes and the first step in that would be effective troubleshooting in
>my opinion. Any input/guidance will be greatly appreciated.
>
>I can provide more info about my host/VM(s) if the above is not adequate.
>
I'm not sure how much more I can help, as I do not understand what the
actual setup is.  What I would do is try to figure out what exactly happens
when it breaks and then go from there (setting up logging, etc.); just
general tips, I guess.

>Thanks,
>
>Hakan Duran
>
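
As one way to start on the "setting up logging" suggestion above, here is a minimal sketch that dumps host link, bridge, and bond state to a timestamped file. The function name `snapshot_net_state` and the default directory `/var/tmp/netdiag` are made up for illustration. It assumes iproute2 (`ip`, `bridge`), the Linux bonding driver's `/proc/net/bonding/` files, and a systemd journal are present on the host.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): record the host's network state
# so that the last snapshot before an outage can be compared with a healthy one.
snapshot_net_state() {
    outdir=${1:-/var/tmp/netdiag}      # assumed location, pick any local path
    mkdir -p "$outdir" || return 1
    out="$outdir/net-$(date +%Y%m%dT%H%M%S).log"
    {
        echo "== ip -br link =="
        ip -br link 2>&1                       # link state of NICs, bond, bridge, taps
        echo "== bridge link show =="
        bridge link show 2>&1                  # which ports are attached to which bridge
        echo "== bonding state =="
        cat /proc/net/bonding/* 2>&1           # per-bond slave status, if bonding is in use
        echo "== recent kernel messages =="
        journalctl -k -n 100 --no-pager 2>&1   # last 100 kernel log lines
    } > "$out"
    echo "$out"                                # print the path of the snapshot file
}
```

Calling `snapshot_net_state` periodically (for example from cron or a systemd timer) writes to local disk, so the snapshots survive even when the host is unreachable over the network.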

Laine Stump

2022-Jan-24 22:30 UTC

frequent network collapse possibly due to bridging

On 1/24/22 4:35 AM, Martin Kletzander wrote:
> On Fri, Jan 21, 2022 at 08:42:58AM -0600, Hakan E. Duran wrote:
>> Hi,
>>
>> I would like some help to troubleshoot the problem I have been having
>> lately with my VM host, which contains 5 VMs, one of which is for
>> pi-hole, unbound services. It has been a relatively common occurrence in
>> the last few weeks for me to find that the host machine has lost its
>> network when I get back home from work. Restoring the VM/VMs do not fix
>> the problem, the host needs to be restarted for a fix, otherwise there
>> is both loss of name resolution, as well as an internet connection; I
>> cannot ping even IPs such as 8.8.8.8. Since I use the pi-hole VM as the DNS
>> server for my LAN, this means that my whole LAN gets disconnected from
>> internet, until the host machine is rebooted. The host machine has a
>> little complicated network setup: the two gigabit connections are bonded
>> and bridged to the VMs; however this set up has been serving me so well
>> for several years now. The problem, on the other hand, appeared a few
>> weeks ago. This doesn't happen every day but often enough to be annoying
>> and disruptive for my family.
>>
> 
> Always good to check what has changed those weeks ago, but I understand
> it is difficult to find out what you were updating and where.
> 
>> My question is, how can I troubleshoot this problem and figure out
>> whether it is truly due to network bridging somehow collapsing or not? I
>> tried to find some log files but all I could find were the
>> /var/log/libvirt/qemu/$VM files, and the particular log file for the pi-hole
>> VM reported the following lines; however, I am not sure if they are
>> associated with a real crash or just due to shutting down and restarting
>> the host (please excuse the word-wrapping):
>>
>> char device redirected to /dev/pts/2 (label charserial0)
>> qxl_send_events: spice-server bug: guest stopped, ignoring
>> 2022-01-20T23:41:17.012445Z qemu-system-x86_64: terminating on signal 15 from pid 1 (/sbin/init)
> 
> Probably restarting the host as it got SIGTERM'd by init.  Maybe it was
> restarted at a bad time and there is some inconsistency on the disk?
> Using something like libvirt-guests which can manage your machines when
> rebooting would be a good idea.
> 
>> 2022-01-20 23:41:17.716+0000: shutting down, reason=crashed
>> 2022-01-20 23:42:46.059+0000: starting up libvirt version: 7.10.0, qemu
>> version: 6.2.0, kernel: 5.10.89-1-MANJARO, hostname: -redacted-
>>
>> Please excuse my ignorance but is there a way to restart the
>> networking without rebooting the host machine? This will not solve my
> 
> You can do:
> 
> virsh net-destroy <network_name>
> virsh net-start <network_name>
> 
> but depending on what the network looks like, how it is set up etc. you
> might need to restart some of the VMs or manually plug them in.
The connection between any guest tap device and a host bridge device 
will be broken by virsh net-destroy, and not restored by virsh net-start 
(because the network driver has no good way of notifying the QEMU driver 
that it has restarted a network). This is something that's been on my 
"list of annoying things I should fix some day" for a very long time, 
but I've never been motivated enough to figure out a clean solution.

In the meantime, if you destroy/start a network, you can get all the 
guest tap devices reconnected by restarting libvirtd:

    systemctl restart libvirtd.service

or if you're using split daemons:

    systemctl restart virtqemud.service

One of the things the QEMU driver does when it's initializing is to 
check where each guest tap device *should* be connected, compare that to 
where it *is* connected, and if those don't match then fix it.
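
Put together, the sequence described above could be sketched as follows. This is only an illustration: the function name `restart_network` and the `DRY_RUN` switch are made up here, and it assumes the guests attach to a libvirt-managed network (not a bridge created outside libvirt). The daemon defaults to `libvirtd.service`; pass `virtqemud.service` when using split daemons.

```shell
#!/bin/sh
# Hypothetical wrapper: bounce a libvirt network, then restart the daemon so
# the QEMU driver reconnects guest tap devices during its initialization.
# With DRY_RUN set to a non-empty value, the commands are printed, not run.
restart_network() {
    net=$1                              # libvirt network name, e.g. "default"
    daemon=${2:-libvirtd.service}       # or virtqemud.service for split daemons
    ${DRY_RUN:+echo} virsh net-destroy "$net" &&
    ${DRY_RUN:+echo} virsh net-start "$net" &&
    ${DRY_RUN:+echo} systemctl restart "$daemon"
}
```

For example, `DRY_RUN=1; restart_network default` only prints the three commands, which is a safe way to check them before running the function for real as root.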
