Nathan March
2015-Apr-14 23:55 UTC
[CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
Hi All,

I was troubleshooting some odd VM network issues and discovered that we're seeing dropped packets and retransmissions across multiple domU OSes and dom0 hardware platforms.

xendev01 ~ # tshark -R "tcp.analysis.retransmission" -i vif7.0
Running as user "root" and group "root". This could be dangerous.
Capturing on vif7.0
  3.054257 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 110 [TCP Fast Retransmission] Encrypted response packet len=44
  3.061949 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368
  3.383880 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368
  3.630911 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368
  3.635964 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368

I've confirmed this is happening with Linux, Windows and pfSense (BSD) domUs. I've turned off every feature I can with ethtool on the underlying bridge on the host, on the vifs, and on the eths inside the domUs. I also see it on traffic between VMs on the same host.

The domU sees packet errors on incoming traffic, while outgoing looks fine. Dumping on the dom0 indicates that incoming packets are fine but the reply from the domU is broken.

This does not happen when running the exact same VMs on some older Xen 4.1.3 hosts.

Reproduction is easy (for me at least): any burst of traffic will do it. I've just been running "ps auxf" over ssh to a VM to trigger it.

Since I'm seeing it on the host when I sniff the vif, this feels like a bug?

- Nathan
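For anyone wanting to put a number on a capture like the one above, here is a rough sketch that tallies retransmission lines from tshark's text output. The function name `count_retrans` is made up for illustration (it is not part of tshark), and it assumes the one-line-per-packet format with a bracketed "[TCP ... Retransmission]" label, as shown in the capture above.

```shell
# Hypothetical helper (not part of tshark): tally retransmission lines
# read from stdin, grouped by the bracketed TCP analysis label.
count_retrans() {
    awk '
        # Pull out the "[TCP ... Retransmission]" label, if the line has one.
        match($0, /\[TCP [A-Za-z ]*Retransmission\]/) {
            label = substr($0, RSTART + 1, RLENGTH - 2)   # strip the brackets
            seen[label]++
            total++
        }
        END {
            # One line per label (order is not guaranteed), then the total.
            for (label in seen) printf "%s: %d\n", label, seen[label]
            printf "total: %d\n", total + 0
        }
    '
}
```

One possible way to use it is to let the capture stop on its own so the summary gets printed, e.g. `tshark -a duration:30 -R "tcp.analysis.retransmission" -i vif7.0 | count_retrans`. Newer tshark releases expect `-Y` rather than `-R` for a display filter on a live capture, so adjust for the version installed.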
Nathan March
2015-Apr-15 20:12 UTC
[CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
Hi All,

Some more data on this: I've reproduced it on another host that's a completely stock CentOS/Xen deployment with a CentOS 6.6 domU.

Since I'm seeing the retransmissions on the VIF, I don't think it's related to the network stack, but just in case: each host is connected via LACP with VLAN tagging to a pair of stacked Cisco 3750s. Host networking config is here:

http://dpaste.com/1Q6NY3Y

The VM is on br99 here.

This is easily reproducible by just generating a 250 MB random file and doing an scp while watching with tshark:

tshark -R "tcp.analysis.retransmission"

There's no visible impact to the connection the vast majority of the time, which is why I think this has gone unnoticed.

Just to confirm this wasn't related to hardware / NICs, I've reproduced this on:

- Dell PowerEdge M620 with Broadcom NICs
- Dell C6220 with Intel NICs
- Supermicro X8DTT with Intel NICs

Any ideas? =)

- Nathan
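The reproduction described above could be scripted roughly as follows. This is only a sketch: `make_test_file` is a hypothetical helper name, the output path is arbitrary, and the size defaults to the 250 MB mentioned in the post. It relies on `head -c`, which is widely available but not strictly required by POSIX.

```shell
# Hypothetical helper: write <size_mb> MiB of random data to <path>.
# Usage: make_test_file <path> [size_mb]   (size_mb defaults to 250)
make_test_file() {
    path=$1
    size_mb=${2:-250}
    # head -c reads exactly the requested number of bytes from /dev/urandom.
    head -c "$((size_mb * 1048576))" /dev/urandom > "$path"
}
```

After something like `make_test_file /tmp/random.bin`, the file can be copied to the domU with scp while `tshark -R "tcp.analysis.retransmission"` runs against the relevant vif on the dom0, as in the post. The scp target is left out here because it depends on the environment.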
Nathan March
2015-Apr-15 22:22 UTC
[CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
So I might have been misinterpreting things here and might be way off base. I think you can ignore this thread, and I'll follow up if I get anything concrete down the road =)

The retransmissions I'm seeing and reproducing are probably within normal allowances, and I can't reproduce the issue that originally led me down this path.

- Nathan