Hi,

I had two ssh connections idle for some time, and suddenly the connection "broke": no responses to ping. Using a 3c59x NIC.

Kernel log after calling `xm destroy ttylinux`:

Dec 12 20:30:52 zirafa device vif3.0 left promiscuous mode
Dec 12 20:30:52 zirafa xen-br0: port 2(vif3.0) entering disabled state
Dec 12 20:30:52 zirafa xen-br0: port 2(vif3.0) entering disabled state
Dec 12 20:33:23 zirafa xen-br0: port 1(eth0) entering disabled state
Dec 12 20:34:23 zirafa xen-br0: port 1(eth0) entering learning state
Dec 12 20:34:23 zirafa xen-br0: topology change detected, propagating
Dec 12 20:34:23 zirafa xen-br0: port 1(eth0) entering forwarding state

...the machine worked OK, I can see messages from cron in syslog...

Maybe this is the time I tried to press a key in the ssh connection to domain0, +/- 1 minute or so:

Dec 12 21:12:23 zirafa xen-br0: port 1(eth0) entering disabled state
Dec 12 21:13:23 zirafa xen-br0: port 1(eth0) entering learning state
Dec 12 21:13:23 zirafa xen-br0: topology change detected, propagating
Dec 12 21:13:23 zirafa xen-br0: port 1(eth0) entering forwarding state
(login from serial console to see what happens)
Dec 12 21:13:31 zirafa login(pam_unix)[7594]: session opened for user root by (uid=0)
(network connectivity restored)

I have no idea why there is a 1 minute delay, according to syslog. The network cable was *not* unplugged.

BTW, this is what happens when I unplug the cable from eth0 (unplugged at 21:27:50, re-plugged at 21:28:33):

Dec 12 21:28:23 zirafa xen-br0: port 1(eth0) entering disabled state
Dec 12 21:29:23 zirafa xen-br0: port 1(eth0) entering learning state
Dec 12 21:29:23 zirafa xen-br0: topology change detected, propagating
Dec 12 21:29:23 zirafa xen-br0: port 1(eth0) entering forwarding state

Is this normal?

j.

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
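One way to check the "1 minute delay" in the log excerpts above is to pair each eth0 "entering disabled state" line with the next "entering forwarding state" line and subtract the timestamps. What follows is only a rough Python sketch and was not posted in the thread: the helper name `outage_durations` is made up for illustration, and since syslog lines carry no year, 2004 is assumed.

```python
import re
from datetime import datetime

# eth0 lines copied from the message above.
LOG = """\
Dec 12 20:33:23 zirafa xen-br0: port 1(eth0) entering disabled state
Dec 12 20:34:23 zirafa xen-br0: port 1(eth0) entering learning state
Dec 12 20:34:23 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 12 21:12:23 zirafa xen-br0: port 1(eth0) entering disabled state
Dec 12 21:13:23 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 12 21:28:23 zirafa xen-br0: port 1(eth0) entering disabled state
Dec 12 21:29:23 zirafa xen-br0: port 1(eth0) entering forwarding state
"""

# timestamp, hostname, bridge name, then "port N(dev) entering <state> state"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ \S+: port \d+\((\S+)\) entering (\w+) state"
)

ASSUMED_YEAR = 2004  # assumption: the syslog lines do not record the year


def outage_durations(text, port="eth0"):
    """Seconds between each 'disabled' event and the next 'forwarding' event."""
    durations = []
    disabled_at = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(2) != port:
            continue  # not a bridge state line, or a different port
        when = datetime.strptime(
            f"{ASSUMED_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S"
        )
        state = m.group(3)
        if state == "disabled":
            disabled_at = when
        elif state == "forwarding" and disabled_at is not None:
            durations.append(int((when - disabled_at).total_seconds()))
            disabled_at = None
    return durations


if __name__ == "__main__":
    print(outage_durations(LOG))
```

For the three excerpts above this gives 60 seconds each time, including the case where the cable was physically out for only about 43 seconds. That hints, but does not prove, that the link state is being sampled roughly once a minute rather than reported the moment it changes.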
So, network availability drops out even without unprivileged domains. I'm using the default Xen configuration; the machines are on the same subnet (10.8.6.61 and 10.18.6.60), connected by a switch.

Does it matter that Gentoo (in the current stable version) uses `ifconfig` and `route` to set up networking while /etc/xen/scripts/network uses the iproute2 package? It shouldn't, AFAIK.

One thing that I don't understand is that `ifconfig` shows that eth0 holds its IP address even after xen-br0 has been brought up (and xen-br0 has a /32 netmask...):

(lo removed from output)

eth0      Link encap:Ethernet  HWaddr 00:10:4B:B6:BD:0E
          inet addr:10.18.6.60  Bcast:10.18.6.63  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7484 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3407 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:872358 (851.9 Kb)  TX bytes:481887 (470.5 Kb)
          Interrupt:11 Base address:0xe400

xen-br0   Link encap:Ethernet  HWaddr 00:10:4B:B6:BD:0E
          inet addr:10.18.6.60  Bcast:10.18.6.63  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4554 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2630 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:397202 (387.8 Kb)  TX bytes:389085 (379.9 Kb)

jkt

-- 
cd /local/pub && more beer > /dev/mouth
-------------------------------------------------------
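The addressing in the `ifconfig` output above can be sanity-checked with Python's standard `ipaddress` module. This is only an illustrative sketch, not part of the original thread: `same_subnet` is a made-up helper, and 10.18.6.61 is included purely as a guess at what the second address might have been meant to be.

```python
import ipaddress

# Address and netmask that `ifconfig` reports for eth0 in the message above.
eth0 = ipaddress.ip_interface("10.18.6.60/255.255.255.192")
# xen-br0 carries the same address with an all-ones (/32) netmask.
xen_br0 = ipaddress.ip_interface("10.18.6.60/255.255.255.255")


def same_subnet(iface, other):
    """True if `other` lies inside the network that `iface` is configured on."""
    return ipaddress.ip_address(other) in iface.network


if __name__ == "__main__":
    # Network and broadcast address implied by eth0's netmask.
    print(eth0.network, eth0.network.broadcast_address)
    # With a /32 mask the "network" is just the single host address.
    print(xen_br0.network)
    # 10.8.6.61 is the address as written in the message;
    # 10.18.6.61 is a hypothetical alternative, not taken from the thread.
    for peer in ("10.8.6.61", "10.18.6.61"):
        print(peer, same_subnet(eth0, peer))
```

The broadcast address this derives for eth0 matches the `Bcast:` field shown by `ifconfig`. As written, 10.8.6.61 falls outside eth0's /26 network, so it may be worth double-checking the peer's address before looking further at the bridge.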
> So, network availability drops out even without unprivileged domains.
> I'm using default xen configuration, machines are on the same subnet
> (10.8.6.61 and 10.18.6.60), connected by switch.
>
> Does it matter that Gentoo (in current stable version) uses `ifconfig`
> and `route` to set up networking and /etc/xen/scripts/network uses
> iproute2 package? It shouldn't, AFAIK.
>
> One thing that I don't understand is that `ifconfig` shows that eth0
> holds its IP address even after xen-br0 has been brought up (and xen-br0
> has /32 netmask...):
> (lo removed from output)

Are you using DHCP to set the eth0 address? I wonder if the dropouts occur when the lease expires. Have you tried getting dhclient/dhcpcd to set the address for xen-br0 rather than eth0?

Ian
-------------------------------------------------------
Ian Pratt wrote:
>>So, network availability drops out even without unprivileged domains.
>>I'm using default xen configuration, machines are on the same subnet
>>(10.8.6.61 and 10.18.6.60), connected by switch.
>>
>>Does it matter that Gentoo (in current stable version) uses `ifconfig`
>>and `route` to set up networking and /etc/xen/scripts/network uses
>>iproute2 package? It shouldn't, AFAIK.
>>
>>One thing that I don't understand is that `ifconfig` shows that eth0
>>holds its IP address even after xen-br0 has been brought up (and xen-br0
>>has /32 netmask...):
>>(lo removed from output)
>
> Are you using DHCP to set the eth0 address? I wonder if the dropouts
> occur when the lease expires. Have you tried getting
> dhclient/dhcpcd to set the address for xen-br0 rather than eth0?
>
> Ian

No, I'm setting it manually via `ifconfig` in an init script. Notice the following lines in the kernel log:

Dec 12 20:33:23 zirafa xen-br0: port 1(eth0) entering disabled state
Dec 12 20:34:23 zirafa xen-br0: port 1(eth0) entering learning state

Why is eth0 (the only physical device in xen-br0) going into the disabled state?

j.
-------------------------------------------------------
> No, I'm setting it manually via `ifconfig` in an init script. Notice the
> following lines in the kernel log:
>
> Dec 12 20:33:23 zirafa xen-br0: port 1(eth0) entering disabled state
> Dec 12 20:34:23 zirafa xen-br0: port 1(eth0) entering learning state
>
> Why is eth0 (the only physical device in xen-br0) going into the disabled
> state?

My guess would be that the bridge code is receiving carrier-change events from eth0. This causes it to put eth0 in disabled state for a while.

One way to check this would be to add some printk()'s to net/bridge/br_notify.c and see whether you are getting NETDEV_CHANGE or NETDEV_DOWN events. If so, it may be that your physical connection, or your router/switch/hub, is a bit dodgy.

None of the other paths via which the interface may get disabled seem very likely to occur, but we can look at those if it doesn't appear that you are getting NETDEV events.

 -- Keir
-------------------------------------------------------
Ian Pratt wrote:
>>One thing that I don't understand is that `ifconfig` shows that eth0
>>holds its IP address even after xen-br0 has been brought up (and xen-br0
>>has /32 netmask...):
>>(lo removed from output)
>
> Are you using DHCP to set the eth0 address? I wonder if the dropouts
> occur when the lease expires. Have you tried getting
> dhclient/dhcpcd to set the address for xen-br0 rather than eth0?
>
> Ian

The DHCP client really should be renegotiating the lease before it expires, and you can probably track that in the logs or via a trace.

thanks,
Nivedita
-------------------------------------------------------
> My guess would be that the bridge code is receiving carrier-change
> events from eth0. This causes it to put eth0 in disabled state for a
> while.
>
> One way to check this would be to add some printk()'s to
> net/bridge/br_notify.c and see whether you are getting NETDEV_CHANGE
> or NETDEV_DOWN events. If so, it may be that your physical connection,
> or your router/switch/hub, is a bit dodgy.
>
> None of the other paths via which the interface may get disabled seem
> very likely to occur, but we can look at those if it doesn't appear
> that you are getting NETDEV events.

Sorry for the delay, we had some problems with our mail server. I'll see whether I'm able to implement what you're talking about :-)

jkt
-------------------------------------------------------
Keir Fraser wrote:
>>No, I'm setting it manually via `ifconfig` in an init script. Notice the
>>following lines in the kernel log:
>>
>>Dec 12 20:33:23 zirafa xen-br0: port 1(eth0) entering disabled state
>>Dec 12 20:34:23 zirafa xen-br0: port 1(eth0) entering learning state
>>
>>Why is eth0 (the only physical device in xen-br0) going into the disabled
>>state?
>
> My guess would be that the bridge code is receiving carrier-change
> events from eth0. This causes it to put eth0 in disabled state for a
> while.
>
> One way to check this would be to add some printk()'s to
> net/bridge/br_notify.c and see whether you are getting NETDEV_CHANGE
> or NETDEV_DOWN events. If so, it may be that your physical connection,
> or your router/switch/hub, is a bit dodgy.
>
> None of the other paths via which the interface may get disabled seem
> very likely to occur, but we can look at those if it doesn't appear
> that you are getting NETDEV events.

Hi,

I finally managed to get it. Here is my patch to net/bridge/br_notify.c:

--- br_notify.c.original	2004-11-14 15:31:58.000000000 +0100
+++ br_notify.c	2004-12-17 13:57:19.000000000 +0100
@@ -45,37 +45,58 @@
 	switch (event) {
 	case NETDEV_CHANGEMTU:
 		dev_set_mtu(br->dev, br_min_mtu(br));
+		printk("NETDEV_CHANGEMTU\n");
 		break;

 	case NETDEV_CHANGEADDR:
 		br_fdb_changeaddr(p, dev->dev_addr);
 		br_stp_recalculate_bridge_id(br);
+		printk("NETDEV_CHANGEADDR\n");
 		break;

 	case NETDEV_CHANGE:	/* device is up but carrier changed */
-		if (!(br->dev->flags & IFF_UP))
+		printk("NETDEV_CHANGE ");
+		if (!(br->dev->flags & IFF_UP)) {
+			printk("- !(br->dev->flags & IFF_UP)==true\n");
 			break;
+		}

 		if (netif_carrier_ok(dev)) {
-			if (p->state == BR_STATE_DISABLED)
+			printk("- netif_carrier_ok(dev)==true ");
+			if (p->state == BR_STATE_DISABLED) {
+				printk("- calling br_stp_enable_port(p)");
 				br_stp_enable_port(p);
+			}
 		} else {
-			if (p->state != BR_STATE_DISABLED)
+			printk("- netif_carrier_ok(dev)!=true ");
+			if (p->state != BR_STATE_DISABLED) {
+				printk(" - calling br_stp_disable_port(p)");
 				br_stp_disable_port(p);
+			}
 		}
+		printk("\n");
 		break;

 	case NETDEV_DOWN:
-		if (br->dev->flags & IFF_UP)
+		printk("NETDEV_DOWN ");
+		if (br->dev->flags & IFF_UP) {
+			printk("- calling br_stp_disable_port(p)");
 			br_stp_disable_port(p);
+		}
+		printk("\n");
 		break;

 	case NETDEV_UP:
-		if (netif_carrier_ok(dev) && (br->dev->flags & IFF_UP))
+		printk("NETDEV_UP ");
+		if (netif_carrier_ok(dev) && (br->dev->flags & IFF_UP)) {
+			printk("- calling br_stp_enable_port(p)");
 			br_stp_enable_port(p);
+		}
+		printk("\n");
 		break;

 	case NETDEV_UNREGISTER:
+		printk("NETDEV_UNREGISTER\n");
 		spin_unlock_bh(&br->lock);
 		br_del_if(br, dev);
 		goto done;

and here are the messages from syslog (obviously my patch is wrong, as it's running several messages together because of a missing "\n"):

Dec 17 18:37:02 zirafa NETDEV_CHANGE - netif_carrier_ok(dev)!=true - calling br_stp_disable_port(p)<6>xen-br0: port 1(eth0) entering disabled state
Dec 17 18:37:02 zirafa
Dec 17 18:38:02 zirafa NETDEV_CHANGE - netif_carrier_ok(dev)==true - calling br_stp_enable_port(p)<6>xen-br0: port 1(eth0) entering learning state
Dec 17 18:38:02 zirafa
Dec 17 18:38:02 zirafa xen-br0: topology change detected, propagating
Dec 17 18:38:02 zirafa xen-br0: port 1(eth0) entering forwarding state

However, `ifconfig` says 0 for carrier:

eth0      Link encap:Ethernet  HWaddr 00:10:4B:B6:BD:0E
          inet addr:10.18.6.60  Bcast:10.18.6.63  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3382 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3164 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:312112 (304.7 Kb)  TX bytes:334955 (327.1 Kb)
          Interrupt:11 Base address:0xe400

xen-br0   Link encap:Ethernet  HWaddr 00:10:4B:B6:BD:0E
          inet addr:10.18.6.60  Bcast:10.18.6.63  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3186 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2971 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:246738 (240.9 Kb)  TX bytes:307288 (300.0 Kb)

The Xen box has a 3c590 card which is connected by a 1 m cat5 patch cable to a TP-Link TL-SL3210 manageable switch (which has 8*100Mbps, 1*1Gbps and 1 GBIC slot). The switch was quite expensive, so I don't think it's such a piece of s*** :-), but I might be mistaken (hope I am not).

A possible workaround could be setting up routing between domain0 and the other domains, but it won't be a simple task.

TIA,
jkt
-------------------------------------------------------
> The Xen box has a 3c590 card which is connected by a 1 m cat5 patch cable
> to a TP-Link TL-SL3210 manageable switch (which has 8*100Mbps, 1*1Gbps
> and 1 GBIC slot). The switch was quite expensive, so I don't think it's
> such a piece of s*** :-), but I might be mistaken (hope I am not).

Your syslog entries make it look like your eth0 carrier was out for one minute, but the thing to do is to observe the times at which you get outages, and see what syslog messages you get at those times -- are you getting NETDEV_CHANGE messages at those points in time?

And maybe your switch is good, but 3c590 *is* an ancient p.o.s. ;-)

 -- Keir
-------------------------------------------------------
> Your syslog entries make it look like your eth0 carrier was out for
> one minute, but the thing to do is to observe the times at which you
> get outages, and see what syslog messages you get at those times --
> are you getting NETDEV_CHANGE messages at those points in time?

Originally, I thought that I got an outage only if the network had been idle for some time, but just now I typed `dmesg` in the ssh connection, it showed the log and then froze (the network, not the machine). After a while, it recovered. According to the syslog messages, the network drops out for exactly one minute. It happens quite often (`grep forwarding /var/log/messages`):

Dec 17 15:56:17 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 18:38:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 19:35:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 19:42:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 20:01:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 20:17:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 20:24:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 20:36:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 20:43:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 20:53:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 21:01:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 21:11:02 zirafa xen-br0: port 1(eth0) entering forwarding state
Dec 17 21:19:02 zirafa xen-br0: port 1(eth0) entering forwarding state

> And maybe your switch is good, but 3c590 *is* an ancient p.o.s. ;-)

Well, comparing it to rtl-8139 based cards, I think 3com is better, IMHO ;-). All the eepro cards are in more important machines...

So, do you think that the problem is in the NIC? `ifconfig` says "carrier: 0"...
I could replace it with some Realtek-based card. Is there anything else I can do?

TIA,
jkt
-------------------------------------------------------
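The `grep forwarding` output above can also be examined for a pattern in when the recoveries happen. Below is a small Python sketch under stated assumptions, not something from the thread: the helper names `gaps_in_minutes` and `second_histogram` are invented for this example, and the year 2004 is assumed because syslog omits it.

```python
from collections import Counter
from datetime import datetime

# Timestamps of the "entering forwarding state" lines listed above.
STAMPS = [
    "Dec 17 15:56:17", "Dec 17 18:38:02", "Dec 17 19:35:02",
    "Dec 17 19:42:02", "Dec 17 20:01:02", "Dec 17 20:17:02",
    "Dec 17 20:24:02", "Dec 17 20:36:02", "Dec 17 20:43:02",
    "Dec 17 20:53:02", "Dec 17 21:01:02", "Dec 17 21:11:02",
    "Dec 17 21:19:02",
]

ASSUMED_YEAR = 2004  # assumption: syslog timestamps carry no year


def parse(stamp):
    """Turn a syslog timestamp into a datetime, using the assumed year."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def gaps_in_minutes(stamps):
    """Minutes between consecutive recoveries."""
    times = [parse(s) for s in stamps]
    return [round((b - a).total_seconds() / 60, 2)
            for a, b in zip(times, times[1:])]


def second_histogram(stamps):
    """How often each seconds-past-the-minute value occurs."""
    return Counter(parse(s).second for s in stamps)


if __name__ == "__main__":
    print(gaps_in_minutes(STAMPS))
    print(second_histogram(STAMPS))
```

On this data every recovery after the first one lands at :02 seconds past the minute, and the gaps between them are whole numbers of minutes. That regularity looks more like something running on a fixed once-a-minute schedule than like random cable noise, though that reading is only an inference from the timestamps.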
> Your syslog entries make it look like your eth0 carrier was out for
> one minute,

BTW, what does that mean? The only means of diagnosis I know of is checking that the LED on the switch is lit...

jkt
-------------------------------------------------------
> > And maybe your switch is good, but 3c590 *is* an ancient p.o.s. ;-)
>
> Well, comparing it to rtl-8139 based cards, I think 3com is better, IMHO
> ;-). All the eepro cards are in more important machines...

With the RTL8139 you are not setting a high bar. :-)

> So, do you think that the problem is in the NIC? `ifconfig` says
> "carrier: 0"...
> I could replace it with some Realtek-based card. Is there anything else
> I can do?

It's unlikely to be due to a badly-designed NIC. More likely some piece of your hardware is marginal and needs replacing. Try swapping out the NIC, the cable, and the switch, in turn. If you have no alternative switch then try a different switch port at least.

'carrier: 0' means nothing really. Working systems show that. :-)

 -- Keir
-------------------------------------------------------
> It's unlikely to be due to a badly-designed NIC. More likely some piece of
> your hardware is marginal and needs replacing. Try swapping out the
> NIC, the cable, and the switch, in turn. If you have no alternative
> switch then try a different switch port at least.
>
> 'carrier: 0' means nothing really. Working systems show that. :-)

OK, it is connected via another cable into another port now :-). If that doesn't help, I'll replace the NIC.

thanks,
jkt
-------------------------------------------------------
> OK, it is connected via another cable into another port now :-). If that
> doesn't help, I'll replace the NIC.

Still dying -> replacing the 3com with an rtl-8139 :-(

jkt