Philipp Hahn
2011-Feb-25 13:40 UTC
[Xen-users] RFH: Windows2003+GPLPV packet-receive breaks after some time (Xen 3.4.3 amd64)
Hello,

one of our Windows domU systems with the GPLPV driver regularly has problems with its network connection: after some time the VM does not receive any packets anymore. It seems to be a problem with receiving only, since sending ARP packets still works:

    tcpdump -i vif147.0 -n arp | grep -FA1 --color XXX.X.71.77

If I try to ping the domU from the dom0, I only see the request going to the domU, but no answer:

    13:49:17.106405 arp who-has XXX.X.71.77 tell XXX.X.12.47

If I try to ping some host from the domU, I see the request leaving the domU and the answer arriving for the domU, but no ICMP messages following:

    13:48:37.569618 arp who-has XXX.X.22.12 tell XXX.X.71.77
    13:48:37.570002 arp reply XXX.X.22.12 is-at 00:16:3e:aa:ed:fa

We have saved the state of the VM to a file, which, when restored, puts the domU back in the broken state.

We collected some information, but are now stuck on how best to proceed, since we don't know enough about Xen's and GPLPV's internal workings.
Can we (or someone else) diagnose why received packets are not properly handled?
Should we install the debug driver, and what should we do when the problem next occurs? (I'm not afraid of debuggers and assembler, but only on Linux and not much with Windows.)

Arch: amd64
Xen: 3.4.3
dom0: 2.6.32-17 (Debian)
domU: Windows 2003 Service Pack 2
GPLPV: 0.11.0.238

Xen network device settings:
    Check checksum on RX packets: Enabled
    Checksum Offload: Enabled
    Large Send Offload: 61440
    Locally Administrated Address: Not set
    MTU: 1500
    Rx Interrupt Moderation (beta): Disabled
    Scatter/Gather: Enabled

# xm network-list 147
Idx BE MAC Addr.         handle state evt-ch tx-/rx-ring-ref BE-path
0   0  00:16:3e:af:fa:a5 0      4     9      15732/15741     /local/domain/0/backend/vif/147/0

# xenstore-ls /local/domain/0/backend/vif/147
0 = ""
 bridge = "XXXXXX0"
 domain = "XXXX010"
 handle = "0"
 uuid = "c550619d-3a4f-edfd-a22c-4b11a84b5728"
 script = "/etc/xen/scripts/vif-bridge"
 state = "4"
 frontend = "/local/domain/147/device/vif/0"
 mac = "00:16:3e:af:fa:a5"
 online = "1"
 frontend-id = "147"
 feature-sg = "1"
 feature-gso-tcpv4 = "1"
 feature-rx-copy = "1"
 feature-rx-flip = "0"
 feature-smart-poll = "1"
 hotplug-status = "connected"

# netstat -s
Ip:
    2982289885 total packets received
    2708867 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    2931918645 incoming packets delivered
    1504949163 requests sent out
    1 outgoing packets dropped
    1 dropped because of missing route
    2683 reassemblies required
    1137 packets reassembled ok
    2630 fragments received ok
    5669 fragments created
Icmp:
    811639 ICMP messages received
    400 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 4632
        redirects: 619
        echo requests: 806046
        echo replies: 185
        timestamp request: 44
        address mask request: 67
    876674 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 70142
        echo request: 452
        echo replies: 806036
        timestamp replies: 44
IcmpMsg:
        InType0: 185
        InType3: 4632
        InType5: 619
        InType8: 806046
        InType10: 2
        InType13: 44
        InType17: 67
        InType37: 44
        OutType0: 806036
        OutType3: 70142
        OutType8: 452
        OutType14: 44
Tcp:
    537972 active connections openings
    141489 passive connection openings
    3940 failed connection attempts
    90207 connection resets received
    9 connections established
    2877624333 segments received
    1502888618 segments send out
    378040 segments retransmited
    0 bad segments received.
    703715 resets sent
Udp:
    132980 packets received
    69241 packets to unknown port received.
    0 packet receive errors
    804134 packets sent
UdpLite:
TcpExt:
    56 resets received for embryonic SYN_RECV sockets
    514936 TCP sockets finished time wait in fast timer
    35 time wait sockets recycled by time stamp
    8541829 delayed acks sent
    7168 delayed acks further delayed because of locked socket
    Quick ack mode was activated 52 times
    1019468 packets directly queued to recvmsg prequeue.
    2562618 bytes directly in process context from backlog
    623450772 bytes directly received in process context from prequeue
    1470140290 packet headers predicted
    431554 packets header predicted and directly queued to user
    6735803 acknowledgments not containing data payload received
    1939166757 predicted acknowledgments
    61400 times recovered from packet loss by selective acknowledgements
    Detected reordering 1 times using FACK
    1 congestion windows fully recovered without slow start
    5 congestion windows partially recovered using Hoe heuristic
    691 congestion windows recovered without slow start after partial ack
    302881 TCP data loss events
    TCPLostRetransmit: 6025
    719 timeouts after SACK recovery
    2 timeouts in loss state
    347805 fast retransmits
    19845 forward retransmits
    5795 retransmits in slow start
    3332 other TCP timeouts
    94 SACK retransmits failed
    77 DSACKs sent for old packets
    42 DSACKs received
    65852 connections reset due to unexpected data
    87707 connections reset due to early user close
    3 connections aborted due to timeout
    TCP ran low on memory 1 times
    TCPDSACKIgnoredOld: 40
    TCPDSACKIgnoredNoUndo: 2
    TCPSpuriousRTOs: 1
    TCPSackShifted: 1746475
    TCPSackMerged: 379798
    TCPSackShiftFallback: 115430
IpExt:
    InMcastPkts: 880
    InBcastPkts: 53279804
    InOctets: -1233700460
    OutOctets: 753104249
    InMcastOctets: 26566
    InBcastOctets: 1189547847

Sincerely
Philipp Hahn
-- 
Philipp Hahn           Open Source Software Engineer      hahn@univention.de
Univention GmbH        Linux for Your Business        fon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen                   fax: +49 421 22 232-99
                                                    http://www.univention.de/
** Visit us at CeBIT in Hannover **
** At the Univention booth D36 in Hall 2 **
** From 1 to 5 March 2011 **

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
James Harper
2011-Feb-26 00:47 UTC
RE: [Xen-users] RFH: Windows2003+GPLPV packet-receive breaks after some time (Xen 3.4.3 amd64)
> Hello,
>
> one of our domU Windows system with GPL-PV driver regularly has problems
> with its network connection: After some time the VM does not receive any
> packets anymore. It seems to be only a problem with receiving, since
> sending ARP packets still works:
>
> tcpdump -i vif147.0 -n arp | grep -FA1 --color XXX.X.71.77
>
> If I try to ping the domU from the dom0, I only see the request going to
> the domU, but no answer:
> 13:49:17.106405 arp who-has XXX.X.71.77 tell XXX.X.12.47
>
> If I try to ping some host from the domU, I see the request leaving the
> domU and the answer arriving for the domU, but no following ICMP messages:
> 13:48:37.569618 arp who-has XXX.X.22.12 tell XXX.X.71.77
> 13:48:37.570002 arp reply XXX.X.22.12 is-at 00:16:3e:aa:ed:fa
>
> We have saved the state of the VM to a file, which when restored puts the
> domU back in the broken state.
>
> We collected some information, but now are stuck on how to best proceed,
> since we don't know enough of Xen's and GPLPV's internal working.
> Can we (or someone else) diagnose, why received packets are not properly
> handled?
> Should we install the debug driver and what should we do when the problem
> next occurs. (I'm not afraid of debuggers and assembler, but only on Linux
> and not much with Windows)
>

Do you have any Linux PV domains?

If you install the debug version of the driver then you'll get info written to /var/log/xen/qemu-dm-<domUname>.log, which might show something useful.

Also, try turning off all the offload functions in the advanced properties of the network adapter under Linux.

Does your Dom0 have any GRE tunnels? I have seen problems when these are used before, but that's a Dom0 routing interaction with checksum offloading.

James
Philipp Hahn
2011-Feb-28 08:04 UTC
Re: [Xen-users] RFH: Windows2003+GPLPV packet-receive breaks after some time (Xen 3.4.3 amd64)
Hello James,

thanks for your fast answer.

On Saturday, 26 February 2011 01:47:23, James Harper wrote:
> Do you have any Linux PV domains?

I don't understand where a Linux PV domain fits in here, since the problematic domU is a Windows domain. Is this for cross-testing PV problems?

> If you install the debug version of the driver then you'll get info
> written to /var/log/xen/qemu-dm-<domUname>.log which might show
> something useful
>
> Also, try turning off all the offload functions in the advanced
> properties of the network adapter under Linux.

Will try.

> Does your Dom0 have any GRE tunnels? I have seen problems when these are
> used before, but that's a Dom0 routing interaction with checksum
> offloading.

Not that I know of. Is it possible to detect these errors?
What I find strange is that the error occurs only after some time, after everything has worked fine. The occurrence of the error might be correlated with high network traffic load, when the network backup starts.

Sincerely
Philipp Hahn
James Harper
2011-Feb-28 11:10 UTC
RE: [Xen-users] RFH: Windows2003+GPLPV packet-receive breaks after some time (Xen 3.4.3 amd64)
>
> Hello James,
>
> thanks for your fast answer.
>
> Am Samstag 26 Februar 2011 01:47:23 schrieb James Harper:
> > Do you have any Linux PV domains?
>
> I don't understand, where a Linux PV domains fits in here, since the
> problematic domU is a Windows Domain. Is this for cross-testing PV
> problems?

Yes. If the problem occurs in a Linux PV domain (or even a Linux HVM domain with PV drivers), then it rules GPLPV out as the problem.

>
> > If you install the debug version of the driver then you'll get info
> > written to /var/log/xen/qemu-dm-<domUname>.log which might show
> > something useful
> >
> > Also, try turning off all the offload functions in the advanced
> > properties of the network adapter under Linux.
>
> Will try.
>
> > Does your Dom0 have any GRE tunnels? I have seen problems when these are
> > used before, but that's a Dom0 routing interaction with checksum
> > offloading.
>
> Not that I know off. Is it possible to detect, that these errors?
> What I find strange is that the error occurs only after some time, after
> everything worked fine. The occurrence of the error might be corelated to
> some high network traffic load, when the network backup starts.
>

With offload functions enabled I have seen these problems in conjunction with GRE tunnels, but not on LAN traffic and not with offload functions disabled.

James
Philipp Hahn
2011-Mar-07 10:16 UTC
Re: [Xen-users] RFH: Windows2003+GPLPV packet-receive breaks after some time (Xen 3.4.3 amd64)
Hello James, hello List,

thanks so far for your support.

On Monday, 28 February 2011 12:10:29, James Harper wrote:
> Yes. If the problem occurs in a Linux PV domain (or even a Linux HVM
> domain with PV drivers) then it rules GPLPV out as a problem

The problem has only been observed on Windows VMs with GPLPV, never on any Linux VM or on Windows VMs without GPLPV (as far as I know). Not all Windows VMs show the described behavior, and it takes some time to occur, normally correlated with the nightly network backup. The problem seems to have existed for a long time: we have reports of problems going back as far as versions 0.9x of the GPLPV driver.

> > > If you install the debug version of the driver then you'll get info
> > > written to /var/log/xen/qemu-dm-<domUname>.log which might show
> > > something useful
> > >
> > > Also, try turning off all the offload functions in the advanced
> > > properties of the network adapter under Linux.

Okay, the debug version (GPLPV 0.10.0.238) is now installed and it shows the following messages:

# grep XenNet qemu-dm-xnts010.log
XenNet --> DriverEntry
XenNet DriverObject = 8A787778, RegistryPath = 8A822000
XenNet NdisGetVersion = 50002
XenNet ndis_wrapper_handle = 00000000
XenNet ndis_wrapper_handle = 8A814C00
XenNet NdisMInitializeWrapper succeeded
XenNet MajorNdisVersion = 5, MinorNdisVersion = 1
XenNet about to call NdisMRegisterMiniport
XenNet called NdisMRegisterMiniport
XenNet <-- DriverEntry
XenNet --> XenNet_Init
XenNet IRQL = 0
XenNet nrl_length = 40
XenNet irq_vector = 01c, irq_level = 01c, irq_mode = NdisInterruptLevelSensitive
XenNet XEN_INIT_TYPE_13
XenNet XEN_INIT_TYPE_VECTORS
XenNet XEN_INIT_TYPE_DEVICE_STATE - 8A9F8FB4
XenNet --> XenNet_D0Entry
XenNet --> XenNet_ConnectBackend
XenNet XEN_INIT_TYPE_13
XenNet XEN_INIT_TYPE_VECTORS
XenNet XEN_INIT_TYPE_DEVICE_STATE - 8A9F8FB4
XenNet XEN_INIT_TYPE_RING - tx-ring-ref = 8A6CD000
XenNet XEN_INIT_TYPE_RING - rx-ring-ref = 8A6CC000
XenNet XEN_INIT_TYPE_EVENT_CHANNEL - event-channel = 9
XenNet XEN_INIT_TYPE_READ_STRING - mac = 00:16:3e:af:fa:a5
XenNet XEN_INIT_TYPE_READ_STRING - feature-sg = 1
XenNet XEN_INIT_TYPE_READ_STRING - feature-gso-tcpv4 = 1
XenNet XEN_INIT_TYPE_17
XenNet <-- XenNet_ConnectBackend
XenNet --> XenNet_RxInit
XenNet <-- XenNet_RxInit
XenNet <-- XenNet_D0Entry
XenNet --> XenNet_PnPEventNotify
XenNet NdisDevicePnPEventPowerProfileChanged
XenNet <-- XenNet_PnPEventNotify
XenNet (BUFFER_TOO_SHORT 100 > 28)
XenNet (BUFFER_TOO_SHORT 152 > 0)
XenNet (BUFFER_TOO_SHORT 152 > 0)
XenNet cannot allocate packet
XenNet No free packets
XenNet Ran out of packets

The last three messages are repeated multiple times.

(I can send you the full log by private email, if you want to take a look.)

Since it might be related: /sys/class/net/vif205.0/ shows the following statistics; I find the tx_dropped count unsettling:

./statistics/rx_packets:242028431
./statistics/tx_packets:170064873
./statistics/rx_bytes:340462359805
./statistics/tx_bytes:19457838604
./statistics/rx_errors:0
./statistics/tx_errors:0
./statistics/rx_dropped:0
./statistics/tx_dropped:1349522
./statistics/multicast:0
./statistics/collisions:0
./statistics/rx_length_errors:0
./statistics/rx_over_errors:0
./statistics/rx_crc_errors:0
./statistics/rx_frame_errors:0
./statistics/rx_fifo_errors:0
./statistics/rx_missed_errors:0
./statistics/tx_aborted_errors:0
./statistics/tx_carrier_errors:0
./statistics/tx_fifo_errors:0
./statistics/tx_heartbeat_errors:0
./statistics/tx_window_errors:0
./statistics/rx_compressed:0
./statistics/tx_compressed:0

I also noticed the following message, which I can't put into any context:

# tail -f /var/log/xen/xend-debug.log
xc_map_foreign_range: ioctl failed: Bad address

Sincerely
Philipp Hahn
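[Editor's note] To put the tx_dropped figure above in proportion, the following is a minimal Python sketch that parses counters in the `./statistics/name:value` form quoted in the message and computes the share of dom0-to-domU packets dropped at the vif. The helper names `parse_vif_stats` and `tx_drop_fraction` are hypothetical, and the sketch assumes that `tx_packets` counts only packets that were actually handed over, so that attempted packets are `tx_packets + tx_dropped`.

```python
def parse_vif_stats(text):
    """Parse lines such as './statistics/tx_dropped:1349522' into a dict.

    Hypothetical helper: it expects one 'path:value' pair per line, in the
    form shown in the message above.
    """
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        path, _, value = line.rpartition(":")
        name = path.rsplit("/", 1)[-1]  # keep only the counter name
        stats[name] = int(value)
    return stats


def tx_drop_fraction(stats):
    """Fraction of dom0-to-domU packets dropped at the vif.

    Assumption: tx_packets excludes dropped packets, so the number of
    attempted packets is tx_packets + tx_dropped.
    """
    attempted = stats["tx_packets"] + stats["tx_dropped"]
    return stats["tx_dropped"] / attempted if attempted else 0.0


# Counters copied from the message above (subset).
SAMPLE = """\
./statistics/rx_packets:242028431
./statistics/tx_packets:170064873
./statistics/rx_dropped:0
./statistics/tx_dropped:1349522
"""

if __name__ == "__main__":
    stats = parse_vif_stats(SAMPLE)
    print(f"tx drop fraction: {tx_drop_fraction(stats):.4%}")
```

Sampling the same counters at intervals and comparing the deltas would show whether the drops accumulate only during the nightly backup window, which is the correlation suspected in the thread.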
James Harper
2011-Mar-07 10:58 UTC
RE: [Xen-users] RFH: Windows2003+GPLPV packet-receive breaks after some time (Xen 3.4.3 amd64)
> XenNet <-- XenNet_PnPEventNotify
> XenNet (BUFFER_TOO_SHORT 100 > 28)
> XenNet (BUFFER_TOO_SHORT 152 > 0)
> XenNet (BUFFER_TOO_SHORT 152 > 0)
> XenNet cannot allocate packet
> XenNet No free packets
> XenNet Ran out of packets
>
> The last three messages are repeated multiple times.
>
> (I can send you the full log per private Email, if you want to take a
> look.)
>

Probably not useful to send the full log; I think you've definitely identified a leak. Strange that I've never seen it before... I have several DomUs with several different versions of GPLPV and several different combinations of checksum and large send offload enabled and disabled, and some of them have been up for months.

Did you try with the offload features disabled?

> ./statistics/tx_dropped:1349522

The messages you are seeing above are in the rx path in DomU, which means the tx path in Dom0. Do your DomUs receive a large amount of traffic? Most of my traffic would be in the other direction, and ->DomU traffic would be mostly at WAN speeds, not LAN speeds... I'll have a look at the code and see if I've missed something.

James
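[Editor's note] Since the three "out of packets" messages are the symptom pointing at a leak, it can help to count how often each appears in the qemu-dm log. The following is a minimal Python sketch under stated assumptions: the function name `count_exhaustion_messages` is hypothetical, the marker strings are copied verbatim from the log excerpt quoted in the thread, and nothing here is part of GPLPV or Xen.

```python
from collections import Counter

# Message texts copied from the debug log quoted in the thread.
EXHAUSTION_MARKERS = (
    "cannot allocate packet",
    "No free packets",
    "Ran out of packets",
)


def count_exhaustion_messages(lines):
    """Count XenNet packet-exhaustion messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if "XenNet" not in line:
            continue  # ignore XenVbd, XenPCI and unrelated qemu-dm output
        for marker in EXHAUSTION_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "XenNet (BUFFER_TOO_SHORT 152 > 0)",
        "XenNet cannot allocate packet",
        "XenNet No free packets",
        "XenNet Ran out of packets",
        "XenNet Ran out of packets",
    ]
    for marker, n in count_exhaustion_messages(sample).items():
        print(f"{n:6d}  {marker}")
```

To run it against a real log, pass an open file object for /var/log/xen/qemu-dm-<domUname>.log instead of the sample list.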
Philipp Hahn
2011-Mar-07 13:04 UTC
Re: [Xen-users] RFH: Windows2003+GPLPV packet-receive breaks after some time (Xen 3.4.3 amd64)
Hello James,

thanks again for the prompt answer.

On Monday, 07 March 2011 11:58:50, James Harper wrote:
> > XenNet <-- XenNet_PnPEventNotify
> > XenNet (BUFFER_TOO_SHORT 100 > 28)
> > XenNet (BUFFER_TOO_SHORT 152 > 0)
> > XenNet (BUFFER_TOO_SHORT 152 > 0)
> > XenNet cannot allocate packet
> > XenNet No free packets
> > XenNet Ran out of packets
> >
> > The last three messages are repeated multiple times.
> >
> > (I can send you the full log per private Email, if you want to take a
> > look.)

There were some more messages related to networking, which my grep missed:

    XenNet XEN_INIT_TYPE_DEVICE_STATE - 8A9F8FB4
    ScatterGather = 1
    LargeSendOffload = 61440
    ChecksumOffload = 1
    ChecksumOffloadRxCheck = 1
    MTU = 1500
    RxInterruptModeration = 0
    Could not read NetworkAddress value (c0000001) or value is invalid
    XenNet --> XenNet_D0Entry
    ...
    XenNet <-- XenNet_D0Entry
    Get Unknown OID 0x10202
    Get Unknown OID 0x10203
    XenNet --> XenNet_PnPEventNotify
    XenNet NdisDevicePnPEventPowerProfileChanged
    XenNet <-- XenNet_PnPEventNotify
    Get Unknown OID 0x10201
    Get Unknown OID 0xfc010210
    Get OID_TCP_TASK_OFFLOAD
    XenNet (BUFFER_TOO_SHORT 100 > 28)
    Get OID_TCP_TASK_OFFLOAD
    config_csum enabled
    nto = 8A4141A4
    nto->Size = 24
    nto->TaskBufferLength = 16
    config_gso enabled
    nto = 8A4141C8
    nto->Size = 24
    nto->TaskBufferLength = 16
    &(nttls->IpOptions) = 8A4141E9
    Set OID_TCP_TASK_OFFLOAD
    TcpIpChecksumNdisTask
    V4Transmit.IpOptionsSupported = 0
    V4Transmit.TcpOptionsSupported = 1
    V4Transmit.TcpChecksum = 1
    V4Transmit.UdpChecksum = 0
    V4Transmit.IpChecksum = 0
    V4Receive.IpOptionsSupported = 0
    V4Receive.TcpOptionsSupported = 0
    V4Receive.TcpChecksum = 1
    V4Receive.UdpChecksum = 0
    V4Receive.IpChecksum = 0
    V6Transmit.IpOptionsSupported = 0
    V6Transmit.TcpOptionsSupported = 0
    V6Transmit.TcpChecksum = 0
    V6Transmit.UdpChecksum = 0
    V6Receive.IpOptionsSupported = 0
    V6Receive.TcpOptionsSupported = 0
    V6Receive.TcpChecksum = 0
    V6Receive.UdpChecksum = 0
    TcpLargeSendNdisTask
    MaxOffLoadSize = 61440
    MinSegmentCount = 4
    TcpOptions = 0
    IpOptions = 0
    Get OID_PNP_CAPABILITIES
    Set Unknown OID 0x10119
    Set OID_GEN_CURRENT_LOOKAHEAD 128 (8A6CE000)
    Set OID_GEN_CURRENT_PACKET_FILTER (xi = 8A6CE000)
    NDIS_PACKET_TYPE_DIRECTED
    NDIS_PACKET_TYPE_MULTICAST
    NDIS_PACKET_TYPE_BROADCAST
    Get Unknown OID 0x10203
    XenNet (BUFFER_TOO_SHORT 152 > 0)
    Get Unknown OID 0x10117
    XenVbd SCSIOP_MODE_SENSE llbaa = 0, dbd = 0, page_code = 63, allocation_length = 12
    XenPCI --> XenPci_EvtDeviceUsageNotification

> Did you try with the offload features disabled?

Oops: because of the switch to the debugging drivers, those features were re-enabled. We just disabled them again, which also unblocked the domain for now. We'll monitor those domains for some time and see if the problem reappears.

> > ./statistics/tx_dropped:1349522
>
> The messages you are seeing above are in the rx path in DomU which means
> the tx path in Dom0. Do your DomU's receive a large amount of traffic?

Both systems currently showing the problem send about 10 times more data than they receive, but that might still be above your average test case:

# ifconfig vif205.0 | tail -n 2 # Linux's point of view
RX bytes:361147609996 (336.3 GiB) TX bytes:20727355262 (19.3 GiB)
RX bytes:1292788783876 (1.1 TiB) TX bytes:170447442659 (158.7 GiB)

> Most of my traffic would be in the other direction, and ->DomU traffic
> would be mostly at WAN speeds, not LAN speeds... I'll have a look at the
> code and see if I've missed something.

Thanks again.

Sincerely
Philipp Hahn
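[Editor's note] The "10 times more data" estimate can be checked against the ifconfig byte counters quoted above. The following is a minimal Python sketch; the function name `domu_send_receive_ratio` is hypothetical. It relies on the point made in the thread that the vif counters are from dom0's (Linux's) point of view, so RX on the vif is what the domU sent and TX on the vif is what the domU received.

```python
def domu_send_receive_ratio(vif_rx_bytes, vif_tx_bytes):
    """Ratio of bytes sent by the domU to bytes received by the domU.

    The vif counters are seen from dom0: RX on the vif is domU output,
    TX on the vif is domU input.
    """
    return vif_rx_bytes / vif_tx_bytes


# (RX bytes, TX bytes) pairs copied from the ifconfig output above.
VIF_COUNTERS = [
    (361147609996, 20727355262),
    (1292788783876, 170447442659),
]

if __name__ == "__main__":
    for rx, tx in VIF_COUNTERS:
        ratio = domu_send_receive_ratio(rx, tx)
        print(f"domU sent {ratio:.1f}x what it received")
```

The two ratios come out on either side of 10, so "10 times" is a fair rough figure for the pair of systems rather than an exact value for either one.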