Peter Crowther
2018-Jan-27 13:44 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
You say you can ping but not ssh. If you install tcpdump on the VM, can you
see the ping packets arriving and leaving? If not, I suspect an address
collision - especially if ping continues to work with the VM shut down. If
you can't ping, check the other end of your bridge. I'm more familiar with
open vSwitch, but I'm somewhat concerned that your bridge definition
doesn't include a physical NIC as one of its connections.
Peter
On 27 Jan 2018 1:13 p.m., "Adrian Pascalau" <adrian27oradea@gmail.com> wrote:
Hi,
I have a strange issue in a libvirt environment, and I do not know how
to solve it.
I have two centos hosts: first one is a physical server called
server1, that acts as a host for the second one, called centos1. The
centos1 is a virtual machine (VM) running in server1. A linux bridge
in forwarding mode is used to connect the centos1 VM network interface
to the server1 network interface and to the external network. The
centos1 VM and the linux bridge are managed with libvirt (well, the
bridge itself in this case is created manually).
# virsh net-dumpxml br0
<network connections='1'>
  <name>br0</name>
  <uuid>5aaf72a5-023d-4b84-9d7c-d68b0918f620</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>
# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fc15b4137688       no              eno1
                                                        vnet0
Both server1 and centos1 have IP addresses in the same subnet, and
both are reachable with ping from every other host in my network. In
both server1 and centos1, the openssh-server configuration in
/etc/ssh/sshd_config is the default one, and has not been changed.
When I ssh with Putty to the physical server server1 IP address,
everything works as expected: I get a login prompt, I enter my
password and I log in.
However, when I use Putty to connect to the centos1 VM, I do not get a
login prompt whatsoever. So I think there might be some issue in
between the server1 physical interface and my centos VM.
I used openssh-server in debug mode, to see where the ssh connection
hangs, and here is what I get:
[...]
debug1: Server will not fork when running in debugging mode.
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: sshd version OpenSSH_7.4, OpenSSL 1.0.2k-fips 26 Jan 2017
debug1: private host key #0: ssh-rsa SHA256:pEuFQsodwK+0PoRzbVRba1ahHLEpwp8DG2KGQmxOGJk
debug1: private host key #1: ecdsa-sha2-nistp256 SHA256:F6HrSNWZhYaU7LMweI+RBviqTCHcTYyMBGPDz5OjT4c
debug1: private host key #2: ssh-ed25519 SHA256:aG3V6jjPHXUnNeavbxT/xozqrb5q3yWDkkAmXBCdnGk
debug1: inetd sockets after dupping: 3, 3
Connection from x.x.x.181 port 49436 on x.x.x.115 port 22
debug1: Client protocol version 2.0; client software version PuTTY_Release_0.70
debug1: no match: PuTTY_Release_0.70
debug1: Local version string SSH-2.0-OpenSSH_7.4
debug1: Enabling compatibility mode for protocol 2.0
debug1: SELinux support enabled [preauth]
debug1: permanently_set_uid: 74/74 [preauth]
debug1: list_hostkey_types: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519 [preauth]
debug1: SSH2_MSG_KEXINIT sent [preauth]
I tried with other windows based ssh clients (MobaXterm) and the same
issue happens. I discussed this with people in the openssh mailing
list, and they said this issue could most probably be caused by a path
MTU/fragmentation problem...
Then I moved my centos1 qcow2 image to another physical server called
server2, with exactly the same hw specs and network connections, where
I have installed an all-in-one OpenStack Pike. The network would be
managed with neutron in this case, however I have configured neutron
exactly so that the centos1 VM interface connects through a linux
bridge (managed by neutron) to the server2 physical network interface,
like in the libvirt case.
# brctl show
bridge name     bridge id               STP enabled     interfaces
brqa13eec69-a4  8000.0e7faabad6d4       no              eno1
                                                        tap8cb53db0-fb
                                                        tapb24a1cc5-20
In the output above, tapb24a1cc5-20 is the tap interface towards my centos1 VM.
In this case, the Putty issue is gone, and I do not have any issue
anymore. If I go back to the libvirt environment in server1, I get the
same issue again.
So I tend to think that my ssh connection issue is caused by the
libvirt and the way networking is configured, however I do not know
how to troubleshoot this further anymore.
Any help is greatly appreciated.
Adrian
_______________________________________________
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users
Adrian Pascalau
2018-Jan-27 14:35 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
On Sat, Jan 27, 2018 at 3:44 PM, Peter Crowther <peter.crowther@melandra.com> wrote:
> You say you can ping but not ssh. If you install tcpdump on the VM, can you
> see the ping packets arriving and leaving? If not, I suspect an address
> collision - especially if ping continues to work with the VM shut down. If
> you can't ping, check the other end of your bridge. I'm more familiar with
> open vSwitch, but I'm somewhat concerned that your bridge definition doesn't
> include a physical NIC as one of its connections.

Peter, thanks for your reply. Yes, I see the icmp request coming into
the centos1 VM and the icmp reply going out. I am sure this is not an
ip address collision.

The bridge in the server1 libvirt environment is created like this:

# cat /etc/sysconfig/network-scripts/ifcfg-eno1
DEVICE=eno1
BOOTPROTO=none
BRIDGE=br0
ONBOOT=YES

# cat /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=x.x.219.54
NETMASK=255.255.255.0
GATEWAY=x.x.219.1
ONBOOT=YES

The result of the above is the following:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fc15b4137688       no              eno1

Then I define the above br0 bridge in libvirt, like below:

# virsh net-dumpxml br0
<network>
  <name>br0</name>
  <uuid>5aaf72a5-023d-4b84-9d7c-d68b0918f620</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

# virsh net-list
 Name    State     Autostart    Persistent
----------------------------------------------------------
 br0     active    no           yes

As soon as I have the br0 bridge defined in libvirt, I start the
centos1 VM, which has its eth0 interface connected to this br0 bridge:

# virsh dumpxml centos1
[...]
<interface type='network'>
  <mac address='52:54:00:40:31:85'/>
  <source network='br0'/>
  <model type='e1000'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
[...]

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.fc15b4137688       no              eno1
                                                        vnet0

And that is all.

With this setup I have the centos1 VM interface eth0 directly
connected to the br0 bridge through the vnet0 tap interface. The br0
bridge is also connected to the eno1 physical interface in server1, so
my centos1 VM should be accessible to the outside world. However, I
have the ssh issue described in my initial email, while ping is
working. In the openssh-server debug log, I see the ssh connection
established and later hanging, with the last debug message being
"debug1: SSH2_MSG_KEXINIT sent [preauth]".

Am I doing something wrong with my libvirt setup above?
Adrian Pascalau
2018-Jan-28 17:07 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
On Sat, Jan 27, 2018 at 3:44 PM, Peter Crowther <peter.crowther@melandra.com> wrote:
> You say you can ping but not ssh. If you install tcpdump on the VM, can you
> see the ping packets arriving and leaving? If not, I suspect an address
> collision - especially if ping continues to work with the VM shut down. If
> you can't ping, check the other end of your bridge. I'm more familiar with
> open vSwitch, but I'm somewhat concerned that your bridge definition doesn't
> include a physical NIC as one of its connections.

Ok, so I have investigated a bit further by doing some tcpdump and
wireshark traces, as you suggested, and here is what I have found:

When an Ethernet frame that is less than 60 bytes in size goes through
the network, it is padded with 0x00 bytes until it is 60 bytes long
(64 with the frame check sequence). When such padded frames go from
the centos1 VM through the linux bridge br0 to the windows host, the
0x00 padding bytes are wrongly treated as part of the user data
carried by IP and TCP, therefore the upper-layer protocol (SSH in my
case) tries to interpret them, and this is why Putty hangs. Those 0x00
padding bytes belong to the layer 2 Ethernet frame, and should not be
counted as user data of the higher level protocols.

About the padding bytes I have found some info here:
https://wiki.wireshark.org/Ethernet#Allowed_Packet_Lengths

The flow in my environment is like this:

[windows host]<---->[server1 host br0(eno1,vnet0)]<---->[eth0 centos1 VM]

All of the above hosts are in the same subnet, so there are no routers
in between. Server1 has the br0 linux bridge in forwarding mode that
connects the eno1 physical interface with the vnet0 tap interface. The
vnet0 tap interface is connected to the centos1 VM eth0 interface.

When I (1) ssh from the windows host to server1, there is no issue.
When I (2) ssh from the same windows host to the centos1 VM, so that I
go through the br0 bridge, I have the ssh issue I have mentioned.

So I took several tcpdump traces, and compared the working ones with
the non-working ones, and this is the conclusion. At this stage,
everything points to the linux bridge, since in the working scenario
(1) those 0x00 padding bytes are left alone and not counted as user
data of the IP and TCP protocols.
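
For anyone comparing traces in the same way, here is a minimal Python sketch of how a receiver is expected to tell padding from data: the IPv4 total length field says where the datagram ends, and anything after that in the frame is Ethernet padding. It is only an illustration, not taken from the traces above. It assumes an untagged IPv4 frame captured without the FCS; the function names and the 192.0.2.x addresses are placeholders, and the demo frame carries zero checksums.

```python
import struct

ETH_HEADER_LEN = 14      # destination MAC + source MAC + EtherType
ETH_MIN_FRAME = 60       # minimum frame length, not counting the 4-byte FCS
ETHERTYPE_IPV4 = 0x0800


def split_padding(frame: bytes):
    """Return (ip_datagram, padding) for an untagged IPv4-over-Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN + 20:
        raise ValueError("frame too short for Ethernet + IPv4 headers")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an untagged IPv4 frame")
    # Bytes 2-3 of the IPv4 header hold the total datagram length.
    (total_len,) = struct.unpack("!H", frame[16:18])
    end = ETH_HEADER_LEN + total_len
    if end > len(frame):
        raise ValueError("IPv4 total length exceeds captured frame")
    return frame[ETH_HEADER_LEN:end], frame[end:]


def build_demo_frame(payload: bytes) -> bytes:
    """Build a hypothetical Ethernet/IPv4/TCP frame, zero-padded to 60 bytes."""
    # 20-byte TCP header without options (data offset 5, flags PSH+ACK).
    tcp = struct.pack("!HHIIBBHHH", 49436, 22, 0, 0, 5 << 4, 0x18, 8192, 0, 0)
    tcp += payload
    # 20-byte IPv4 header; addresses are documentation placeholders.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64, 6, 0,
                     bytes([192, 0, 2, 181]), bytes([192, 0, 2, 115]))
    frame = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4) + ip + tcp
    return frame.ljust(ETH_MIN_FRAME, b"\x00")


if __name__ == "__main__":
    datagram, padding = split_padding(build_demo_frame(b"ab"))
    print(len(datagram), len(padding))   # 42 4
```

If a capture taken on the receiving side shows an IPv4 total length that already includes the zero bytes, then something on the path rewrote the length, which is the kind of difference to look for between the working and non-working traces.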
Adrian Pascalau
2018-Jan-29 09:44 UTC
Re: [libvirt-users] issue with openssh-server running in a libvirt based centos virtual machine
On Sun, Jan 28, 2018 at 7:07 PM, Adrian Pascalau <adrian27oradea@gmail.com> wrote:
> When an Ethernet frame that is less than 60 bytes in size goes through
> the network, it is padded with 0x00 bytes until it has 60 bytes in
> length (64 with the frame check sequence). When this kind of padded
> frames goes from centos1 VM through the linux bridge br0 to the
> windows host, the IP and TCP headers in those frames wrongly consider
> the 0x00 padded bytes as part of the user data, therefore the upstream
> protocol (SSH in my case) tries to interpret them, and this is why
> Putty hangs. Those 0x00 padded bytes are at the layer2 Ethernet frame
> level, and should not be considered in the user data of the higher
> level protocols.

Ok, so I found a workaround for this, even if I do not know what
caused this issue.

Basically I noticed that I have this ssh connection issue only when
the ssh client runs on a windows host. If the ssh client runs on a
linux host, the ssh connection works without any problem.

So I have compared the tcpdump traces for ssh connections initiated
from both windows and linux. What I have noticed is that on centos
linux the TCP stack by default uses timestamps in the TCP Options, and
because of this the Ethernet frames are never below 60 bytes, while on
my windows host the TCP Options timestamps are not used, and therefore
some Ethernet frames are less than 60 bytes.

So I enabled the TCP Options timestamps in windows as well, by running
the command

netsh int tcp set global timestamps=enabled

and just like that, ssh started to work.

Still, I do not know what is causing this issue, or what to blame for
this behavior... Any suggestion on how to identify which network
element wrongly assigns the Ethernet padding to the TCP payload is
more than welcome.
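
The arithmetic behind the timestamps observation can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: IPv4 and TCP headers without other options, no VLAN tag, and a timestamp option occupying 12 bytes (the 10-byte option plus two NOP bytes for alignment, which is the usual layout). The constant and function names are made up for the example.

```python
ETH_HEADER = 14            # Ethernet header, untagged
IPV4_HEADER = 20           # IPv4 header without options
TCP_HEADER = 20            # TCP header without options
TCP_TIMESTAMP_OPTION = 12  # assumed: 10-byte option + 2 NOPs for alignment
ETH_MIN_FRAME = 60         # minimum frame length, not counting the FCS


def frame_length(payload_len: int, timestamps: bool) -> int:
    """Length of the Ethernet frame before any padding is applied."""
    options = TCP_TIMESTAMP_OPTION if timestamps else 0
    return ETH_HEADER + IPV4_HEADER + TCP_HEADER + options + payload_len


def padding_needed(payload_len: int, timestamps: bool) -> int:
    """Number of 0x00 bytes needed to reach the 60-byte minimum."""
    return max(0, ETH_MIN_FRAME - frame_length(payload_len, timestamps))


if __name__ == "__main__":
    # A bare ACK without timestamps is 54 bytes, so it gets 6 padding bytes.
    print(frame_length(0, False), padding_needed(0, False))   # 54 6
    # The same ACK with timestamps is 66 bytes and needs no padding.
    print(frame_length(0, True), padding_needed(0, True))     # 66 0
```

Under these assumptions only segments carrying fewer than 6 bytes of TCP payload without timestamps fall below 60 bytes, which is consistent with padding never appearing in the linux client traces.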