Peter Crowther
2018-Jan-27 13:44 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
You say you can ping but not ssh. If you install tcpdump on the VM, can you see the ping packets arriving and leaving? If not, I suspect an address collision - especially if ping continues to work with the VM shut down.

If you can't ping, check the other end of your bridge. I'm more familiar with Open vSwitch, but I'm somewhat concerned that your bridge definition doesn't include a physical NIC as one of its connections.

Peter

On 27 Jan 2018 1:13 p.m., "Adrian Pascalau" <adrian27oradea@gmail.com> wrote:

Hi,

I have a strange issue in a libvirt environment, and I do not know how to solve it.

I have two centos hosts: the first one is a physical server called server1, which acts as a host for the second one, called centos1. The centos1 is a virtual machine (VM) running in server1. A linux bridge in forwarding mode is used to connect the centos1 VM network interface to the server1 network interface and to the external network. The centos1 VM and the linux bridge are managed with libvirt (well, the bridge itself in this case is created manually).

# virsh net-dumpxml br0
<network connections='1'>
  <name>br0</name>
  <uuid>5aaf72a5-023d-4b84-9d7c-d68b0918f620</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fc15b4137688       no              eno1
                                                        vnet0

Both server1 and centos1 have IP addresses in the same subnet, and both are reachable with ping from every other host in my network. In both server1 and centos1, the openssh-server configuration in /etc/ssh/sshd_config is the default one, and has not been changed.

When I ssh with Putty to the physical server server1 IP address, everything works as expected: I get a login prompt, I enter my password and I log in. However, when I use Putty to connect to the centos1 VM, I do not get a login prompt whatsoever. So I think there might be some issue in between the server1 physical interface and my centos VM.
I used openssh-server in debug mode, to see where the ssh connection hangs, and here is what I get:

[...]
debug1: Server will not fork when running in debugging mode.
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: sshd version OpenSSH_7.4, OpenSSL 1.0.2k-fips 26 Jan 2017
debug1: private host key #0: ssh-rsa SHA256:pEuFQsodwK+0PoRzbVRba1ahHLEpwp8DG2KGQmxOGJk
debug1: private host key #1: ecdsa-sha2-nistp256 SHA256:F6HrSNWZhYaU7LMweI+RBviqTCHcTYyMBGPDz5OjT4c
debug1: private host key #2: ssh-ed25519 SHA256:aG3V6jjPHXUnNeavbxT/xozqrb5q3yWDkkAmXBCdnGk
debug1: inetd sockets after dupping: 3, 3
Connection from x.x.x.181 port 49436 on x.x.x.115 port 22
debug1: Client protocol version 2.0; client software version PuTTY_Release_0.70
debug1: no match: PuTTY_Release_0.70
debug1: Local version string SSH-2.0-OpenSSH_7.4
debug1: Enabling compatibility mode for protocol 2.0
debug1: SELinux support enabled [preauth]
debug1: permanently_set_uid: 74/74 [preauth]
debug1: list_hostkey_types: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]

I tried with other windows based ssh clients (MobaXterm) and the same issue happens. I discussed this with people in the openssh mailing list, and they said this issue could most probably be caused by a path MTU/fragmentation problem...

Then I moved my centos1 qcow2 image to another physical server called server2, with exactly the same hw specs and network connections, where I have installed an all-in-one OpenStack Pike. The network would be managed with neutron in this case, however I have configured neutron exactly so that the centos1 VM interface connects through a linux bridge (managed by neutron) to the server1 physical network interface, like in the libvirt case.

# brctl show
bridge name     bridge id               STP enabled     interfaces
brqa13eec69-a4  8000.0e7faabad6d4       no              eno1
                                                        tap8cb53db0-fb
                                                        tapb24a1cc5-20

Above, tapb24a1cc5-20 is the tap interface towards my centos1 VM.
In this case, the Putty issue is gone, and I do not have any issue anymore. If I go back to the libvirt environment in server1, I get the same issue again.

So I tend to think that my ssh connection issue is caused by libvirt and the way networking is configured, however I do not know how to troubleshoot this any further. Any help is greatly appreciated.

Adrian
Adrian Pascalau
2018-Jan-27 14:35 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
On Sat, Jan 27, 2018 at 3:44 PM, Peter Crowther <peter.crowther@melandra.com> wrote:
> You say you can ping but not ssh. If you install tcpdump on the VM, can you
> see the ping packets arriving and leaving? If not, I suspect an address
> collision - especially if ping continues to work with the VM shut down. If
> you can't ping, check the other end of your bridge. I'm more familiar with
> open vSwitch, but I'm somewhat concerned that your bridge definition doesn't
> include a physical NIC as one of its connections.
>

Peter, thanks for your reply. Yes, I see the icmp request coming into the centos1 VM and the icmp reply going out. I am sure this is not an ip address collision.

The bridge in the server1 libvirt environment is created like this:

# cat /etc/sysconfig/network-scripts/ifcfg-eno1
DEVICE=eno1
BOOTPROTO=none
BRIDGE=br0
ONBOOT=YES

# cat /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=x.x.219.54
NETMASK=255.255.255.0
GATEWAY=x.x.219.1
ONBOOT=YES

The result of the above is the following:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fc15b4137688       no              eno1

Then I define the above br0 bridge in libvirt, like below:

# virsh net-dumpxml br0
<network>
  <name>br0</name>
  <uuid>5aaf72a5-023d-4b84-9d7c-d68b0918f620</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

# virsh net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 br0                  active     no            yes

As soon as I have the br0 bridge defined in libvirt, I start the centos1 VM, which has its eth0 interface connected to this br0 bridge:

# virsh dumpxml centos1
[...]
    <interface type='network'>
      <mac address='52:54:00:40:31:85'/>
      <source network='br0'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
[...]

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fc15b4137688       no              eno1
                                                        vnet0

And that is all.
With this setup I have the centos1 VM interface eth0 directly connected to the br0 bridge through the vnet0 tap interface. The br0 bridge is also connected to the eno1 physical interface in server1, so my centos1 VM should be accessible to the outside world. However, I have the ssh issue described in my initial email, while ping is working. In the openssh-server debug log, I see the ssh connection established and later hanging with the last debug message being "debug1: SSH2_MSG_KEXINIT sent [preauth]". Am I doing something wrong with my libvirt setup above?
Adrian Pascalau
2018-Jan-28 17:07 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
On Sat, Jan 27, 2018 at 3:44 PM, Peter Crowther <peter.crowther@melandra.com> wrote:
> You say you can ping but not ssh. If you install tcpdump on the VM, can you
> see the ping packets arriving and leaving? If not, I suspect an address
> collision - especially if ping continues to work with the VM shut down. If
> you can't ping, check the other end of your bridge. I'm more familiar with
> open vSwitch, but I'm somewhat concerned that your bridge definition doesn't
> include a physical NIC as one of its connections.
>

Ok, so I have investigated a bit further by doing some tcpdump and wireshark traces, as you suggested, and here is what I have found:

When an Ethernet frame that is less than 60 bytes in size goes through the network, it is padded with 0x00 bytes until it is 60 bytes long (64 with the frame check sequence). When padded frames of this kind go from the centos1 VM through the linux bridge br0 to the windows host, the IP and TCP headers in those frames wrongly count the 0x00 padding bytes as part of the user data, therefore the upper-layer protocol (SSH in my case) tries to interpret them, and this is why Putty hangs. Those 0x00 padding bytes belong to the layer 2 Ethernet frame, and should not be counted in the user data of the higher level protocols.

I have found some info about the padding bytes here: https://wiki.wireshark.org/Ethernet#Allowed_Packet_Lengths

The flow in my environment is like this:

[windows host]<---->[server1 host br0(eno1,vnet0)]<---->[eth0 centos1 VM]

All the above hosts are in the same subnet, so there are no routers in between. Server1 has the br0 linux bridge in forwarding mode, which connects the eno1 physical interface with the vnet0 tap interface. The vnet0 tap interface is connected to the centos1 VM eth0 interface.

When I (1) ssh from the windows host to server1, there is no issue. When I (2) ssh from the same windows host to the centos1 VM, so I go through the br0 bridge, I have the ssh issue I have mentioned.
So I took several tcpdump traces, and compared the working ones with the non working ones, and this is the conclusion. So at this stage, everything points to the linux bridge, since in the working scenario (1) those 0x00 padding bytes are left alone and not considered in the user data of the IP and TCP protocols.
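For readers following along, here is a minimal Python sketch of the padding behaviour described above. It is not taken from the traces in this thread: the helper names (pad_frame, ip_packet_from_frame, build_tcp_ack_frame) and the zeroed addresses are made up for illustration. It shows how a receiver can tell where the IP packet ends inside a padded frame, namely from the IPv4 Total Length field rather than from the frame length.

```python
import struct

ETH_HEADER_LEN = 14   # destination MAC + source MAC + EtherType
ETH_MIN_NO_FCS = 60   # minimum frame length, not counting the 4-byte FCS


def pad_frame(frame: bytes) -> bytes:
    """Pad a frame with 0x00 bytes up to the 60-byte minimum."""
    if len(frame) < ETH_MIN_NO_FCS:
        frame += b"\x00" * (ETH_MIN_NO_FCS - len(frame))
    return frame


def ip_packet_from_frame(frame: bytes) -> bytes:
    """Return the IPv4 packet carried in the frame, without Ethernet padding.

    The IPv4 Total Length field (bytes 2-3 of the IP header) says where the
    packet ends; anything after that point in the frame is padding.
    """
    total_length = struct.unpack_from("!H", frame, ETH_HEADER_LEN + 2)[0]
    return frame[ETH_HEADER_LEN:ETH_HEADER_LEN + total_length]


def build_tcp_ack_frame() -> bytes:
    """Hypothetical 54-byte frame: Ethernet + IPv4 + TCP, no options, no payload."""
    eth = bytes(12) + b"\x08\x00"  # zeroed MAC addresses, EtherType IPv4
    # version/IHL, TOS, total length 40, id, flags/frag, TTL, proto TCP, checksum, src, dst
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0, bytes(4), bytes(4))
    # src port, dst port, seq, ack, data offset (5 words), ACK flag, window, checksum, urgent
    tcp = struct.pack("!HHIIBBHHH", 49436, 22, 0, 0, 5 << 4, 0x10, 8192, 0, 0)
    return eth + ip + tcp


frame = build_tcp_ack_frame()
padded = pad_frame(frame)
packet = ip_packet_from_frame(padded)
print(len(frame), len(padded), len(packet))
```

The checksums are left at zero because the sketch only looks at lengths. A network element that took the payload length from the frame size instead of the Total Length field would hand the trailing zero bytes to TCP, which is the symptom described in the message above.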
Adrian Pascalau
2018-Jan-29 09:44 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
On Sun, Jan 28, 2018 at 7:07 PM, Adrian Pascalau <adrian27oradea@gmail.com> wrote:
> When an Ethernet frame that is less than 60 bytes in size goes through
> the network, it is padded with 0x00 bytes until it has 60 bytes in
> length (64 with the frame check sequence). When this kind of padded
> frames goes from centos1 VM through the linux bridge br0 to the
> windows host, the IP and TCP headers in those frames wrongly consider
> the 0x00 padded bytes as part of the user data, therefore the upstream
> protocol (SSH in my case) tries to interpret them, and this is why
> Putty hangs. Those 0x00 padded bytes are at the layer2 Ethernet frame
> level, and should not be considered in the user data of the higher
> level protocols.

Ok, so I found a workaround for this, even if I do not know what caused this issue.

Basically I noticed that I have this ssh connection issue only when the ssh client runs on a windows host. If the ssh client runs on a linux host, the ssh connection works without any problem.

So I have compared the tcpdump traces for ssh connections initiated from both windows and linux. What I have noticed is that on centos linux the TCP stack uses timestamps in the TCP Options by default, and because of this the Ethernet frames are never below 60 bytes, while on my windows host the TCP Options timestamps are not used, and therefore some Ethernet frames are less than 60 bytes.

So I enabled the TCP Options timestamps in windows as well, by running the command

netsh int tcp set global timestamps=enabled

and just like that, ssh started to work.

Still, I do not know what is causing this issue, and who to blame for this behavior... Any suggestion on how to identify which network element wrongly assigns the Ethernet padding to the TCP payload is more than welcome.
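The frame-size arithmetic behind this workaround can be sketched in Python as follows. This is only an illustration under stated assumptions: IPv4 and TCP headers without other options, and the timestamps option taking 12 bytes (the 10-byte option plus two NOP bytes, which is a common layout but not something measured in this thread). The function names are made up for the example.

```python
ETH_HEADER = 14       # Ethernet header, without the FCS
IP_HEADER = 20        # IPv4 header without options
TCP_HEADER = 20       # TCP header without options
TCP_TIMESTAMPS = 12   # assumed: 10-byte timestamps option padded with two NOPs
ETH_MIN = 60          # minimum frame length, not counting the FCS


def frame_length(payload_len: int, timestamps: bool) -> int:
    """Length of the Ethernet frame before any padding is added."""
    tcp = TCP_HEADER + (TCP_TIMESTAMPS if timestamps else 0)
    return ETH_HEADER + IP_HEADER + tcp + payload_len


def padding_needed(payload_len: int, timestamps: bool) -> int:
    """Number of 0x00 bytes needed to reach the 60-byte minimum."""
    return max(0, ETH_MIN - frame_length(payload_len, timestamps))


for timestamps in (False, True):
    for payload_len in (0, 1, 5, 6):
        print(
            f"timestamps={timestamps} payload={payload_len} "
            f"frame={frame_length(payload_len, timestamps)} "
            f"padding={padding_needed(payload_len, timestamps)}"
        )
```

Under these assumptions a bare ACK without timestamps is 54 bytes and needs 6 bytes of padding, while any segment carrying the timestamps option is at least 66 bytes and is never padded. That is consistent with the workaround hiding the problem rather than fixing whichever element mishandles the padding.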