Hi,

I am having some trouble with send_fake_arp() in the netfront driver. Normally, on my domU, which has no queuing disciplines compiled in, packets are sent via dev_queue_xmit() in net/core/dev.c and enqueued using pfifo_fast_enqueue() in net/sched/sch_generic.c. However, during live migration, send_fake_arp() returns -2 and no longer reaches pfifo_fast_enqueue(). I have been able to trace it as far as this code in dev_queue_xmit():

    if (q->enqueue) {
        /* Grab device queue */
        spin_lock(&dev->queue_lock);

        rc = q->enqueue(skb, q);

        qdisc_run(dev);

        spin_unlock(&dev->queue_lock);
        rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
        goto out;
    }

I noticed that the error code returned by send_fake_arp() is not checked. Would it be a good option to check the error code and reschedule the ARP broadcast at a later time?

I have made some changes to Xen 3.0.3 regarding block device migration, so I might have messed things up. That could be the reason only a few people have reported this problem on xen-users. Obviously, the problem can also go unnoticed if a downtime of 1-2 seconds is tolerated.

Does anyone have any hints on why this might happen or how to search for more clues?

Thank you.
Cristian

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
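The idea of checking the return value and rescheduling the ARP can be sketched in userspace C. This is only an illustration under stated assumptions: `xmit_fn`, `arp_retry_state`, `arp_retry_step` and `fake_xmit` are hypothetical names, not netfront code, and in the kernel the "try again later" step would be driven by a timer or deferred work rather than a plain loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes as defined in the 2.6-era include/linux/netdevice.h. */
#define NET_XMIT_SUCCESS 0
#define NET_XMIT_CN      2

/* Hypothetical stand-in for the transmit path behind send_fake_arp(). */
typedef int (*xmit_fn)(void *ctx);

/* Hypothetical retry bookkeeping; not part of netfront. */
struct arp_retry_state {
    int attempts_left;  /* remaining transmission budget */
    int attempts_made;  /* transmissions attempted so far */
};

/*
 * Attempt one transmission. Returns true when the caller should schedule
 * another attempt later, false when finished (queued, or budget used up).
 */
static bool arp_retry_step(struct arp_retry_state *st, xmit_fn xmit, void *ctx)
{
    assert(st != NULL && xmit != NULL);

    if (st->attempts_left <= 0)
        return false;                 /* budget exhausted: give up */

    st->attempts_left--;
    st->attempts_made++;

    int rc = xmit(ctx);
    if (rc == NET_XMIT_SUCCESS)
        return false;                 /* accepted by the queue */

    /* Any other code (NET_XMIT_CN, a negative errno, ...): retry if allowed. */
    return st->attempts_left > 0;
}

/* Test double: reports congestion until the countdown in *ctx reaches 0. */
static int fake_xmit(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NET_XMIT_CN;
    }
    return NET_XMIT_SUCCESS;
}
```

A caller would loop (or re-arm a timer) while `arp_retry_step()` returns true. Note that a return of 0 only means the packet was accepted by the queue; replies in this thread report 0 being returned while the ARP still never appears on the wire, so a check like this would not be sufficient on its own.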
> I am having some trouble with the send_fake_arp in the netfront driver.

Interesting - I was just composing an almost identical note; we've been seeing some horrible network blackouts in migration that are caused by a failure to send the gratuitous ARP (blackouts vary from 0-50+ seconds when the domain is idle and just being pinged from outside).

In my case, I NEVER see the gratuitous ARP being sent (confirmed using tcpdump on peth0 in Dom0) and the return value from dev_queue_xmit is sometimes 0 and sometimes 2 (that's PLUS 2 -- congestion notification [NET_XMIT_CN]).

My next step was going to be to add instrumentation to netback but I thought I would ask if this is a known issue with 3.0.3 first...

Simon
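For readers comparing the "-2" and "PLUS 2" observations, here is a small sketch of how a dev_queue_xmit() return value could be classified. The NET_XMIT_* values are the ones defined in the 2.6-era include/linux/netdevice.h; the `xmit_outcome` enum and `classify_xmit_rc()` are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>  /* for the checks in the usage note */

/* Values as defined in the 2.6-era include/linux/netdevice.h. */
#define NET_XMIT_SUCCESS 0
#define NET_XMIT_DROP    1  /* skb dropped */
#define NET_XMIT_CN      2  /* congestion notification */
#define NET_XMIT_POLICED 3  /* skb is shot by police */
#define NET_XMIT_BYPASS  4  /* reported to callers as NET_XMIT_SUCCESS */

/* Hypothetical classification, not a kernel type. */
enum xmit_outcome {
    XMIT_QUEUED,     /* accepted by the queue */
    XMIT_CONGESTED,  /* queue signalled congestion */
    XMIT_DROPPED,    /* dropped or policed */
    XMIT_ERRNO,      /* negative value: an errno-style error */
    XMIT_UNKNOWN     /* positive value outside the known range */
};

static enum xmit_outcome classify_xmit_rc(int rc)
{
    if (rc < 0)
        return XMIT_ERRNO;

    switch (rc) {
    case NET_XMIT_SUCCESS:
    case NET_XMIT_BYPASS:
        return XMIT_QUEUED;
    case NET_XMIT_CN:
        return XMIT_CONGESTED;
    case NET_XMIT_DROP:
    case NET_XMIT_POLICED:
        return XMIT_DROPPED;
    default:
        return XMIT_UNKNOWN;
    }
}
```

With this helper, `assert(classify_xmit_rc(2) == XMIT_CONGESTED)` holds for the value seen here, while a genuine -2 would land in `XMIT_ERRNO`, which is why the sign matters when reading the trace.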
On Fri, 2007-03-02 at 17:22 -0500, Graham, Simon wrote:
> > I am having some trouble with the send_fake_arp in the netfront
> > driver.
>
> Interesting - I was just composing an almost identical note; we've been
> seeing some horrible network blackouts in migration that are caused by a
> failure to send the gratuitous ARP (blackouts vary from 0-50+ seconds
> when the domain is idle and just being pinged from outside).

When I was last doing this with self-migration, I would resend the ARP reply several times after arrival, as in any case the packet may get lost due to a collision etc. This is pretty trivial with self-migration because it is controlled by a guest userspace program -- I suppose that doing it in netfront would require a short-lived kernel thread. Or perhaps this should just all be handled by dom0, since we are talking hosted migration anyway. Actually, having netfront send out protocol-dependent packets is quite ugly, and makes netfront depend on IP being enabled, which is all wrong (see link error below when disabling IP in xenlinux).

If the ARP is only being used to advertise the move of the MAC to a new port, it would be better to construct some kind of reliable protocol, e.g. pinging a remote host (like the default GW) until an answer comes back. This should be enough to make sure the switch got the message. If the ARP is used for updating peer ARP caches, pinging everyone in the guest's /proc/net/arp table until a majority have replied would be a solution.

Here is the link error from a 3.0.3 Linux guest with IP disabled:

    ....
      CC      init/version.o
      LD      init/built-in.o
      LD      .tmp_vmlinux1
    drivers/built-in.o: In function `send_fake_arp':
    netfront.c:(.text+0x21ae2): undefined reference to `inet_select_addr'
    netfront.c:(.text+0x21b08): undefined reference to `arp_create'
    drivers/built-in.o: In function `netif_init':
    netfront.c:(.init.text+0x19cd): undefined reference to `register_inetaddr_notifier'
    drivers/built-in.o: In function `netif_exit':
    netfront.c:(.exit.text+0xb2): undefined reference to `unregister_inetaddr_notifier'
    make: *** [.tmp_vmlinux1] Error 1

Jacob
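The "ping peers until a majority have replied" suggestion can be sketched as a bounded loop. This is a userspace illustration only: `probe_fn`, `probe_until_majority()` and the test double `probe_ready_after()` are hypothetical names, and a real implementation would send actual probes (for example ICMP echo requests) to the addresses found in the ARP table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true if peer `idx` answered in this round. */
typedef bool (*probe_fn)(size_t idx, int round, void *ctx);

/*
 * Probe every unconfirmed peer once per round until more than half of the
 * peers have answered or max_rounds is reached. Returns the number of
 * rounds used (0 if there are no peers), or -1 if no majority was reached.
 */
static int probe_until_majority(bool *confirmed, size_t n_peers,
                                probe_fn probe, void *ctx, int max_rounds)
{
    assert(confirmed != NULL && probe != NULL && max_rounds > 0);

    if (n_peers == 0)
        return 0;                       /* nothing to confirm */

    size_t n_confirmed = 0;
    for (size_t i = 0; i < n_peers; i++)
        if (confirmed[i])
            n_confirmed++;

    for (int round = 1; round <= max_rounds; round++) {
        for (size_t i = 0; i < n_peers; i++) {
            if (!confirmed[i] && probe(i, round, ctx)) {
                confirmed[i] = true;
                n_confirmed++;
            }
        }
        if (2 * n_confirmed > n_peers)
            return round;               /* strict majority reached */
    }
    return -1;
}

/* Test double: peer i starts answering from round ready_round[i] onwards. */
static bool probe_ready_after(size_t idx, int round, void *ctx)
{
    const int *ready_round = ctx;

    return round >= ready_round[idx];
}
```

The round limit matters: without it, a guest whose peers are mostly unreachable would keep probing indefinitely after migration.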
> When I was last doing this with self-migration, I would resend the ARP
> reply several times after arrival, as in any case the packet may get
> lost due to a collision etc. This is pretty trivial with self-migration
> because it is controlled by a guest userspace program -- I suppose that
> doing it in netfront would require a short-lived kernel thread. Or
> perhaps this should just all be handled by dom0, since we are talking
> hosted migration anyway. Actually, having netfront send out protocol
> dependent packets is quite ugly, and makes netfront depend on IP being
> enabled, which is all wrong (see link error below when disabling IP in
> xenlinux).

I understand that the gratuitous ARP could be lost (and probably should be sent multiple times) _but_ I am currently seeing a 100% loss (the packet simply never makes it to peth0 and out on the wire), so I think there is an actual bug somewhere in the netfront<->netback<->bridge path...

/simgr
> In my case, I NEVER see the gratuitous ARP being sent (confirmed using
> tcpdump on peth0 in Dom0) and the return value from dev_queue_xmit is
> sometimes 0 and sometimes 2 (that's PLUS 2 -- congestion notification
> [NET_XMIT_CN]).

I am seeing the same error; indeed it looks like it is NET_XMIT_CN. I also see 100% loss: the ARP never makes it to the wire in any of my tests.

> If the ARP is only being used to advertise the move of the MAC to a new
> port, it would be better to construct some kind of reliable protocol,
> e.g. pinging a remote host (like the default GW) until an answer comes
> back. This should be enough to make sure the switch got the message. If
> the ARP is used for updating peer ARP caches, pinging everyone in the
> guest's /proc/net/arp table until a majority have replied would be a
> solution.

Given that I am seeing 100% loss, I am considering implementing something like this. However, I assumed that sending the ARP broadcast in netfront works for most people using 3.0.3. Can anyone confirm that they are actually seeing the ARP being sent on the wire with Xen >= 3.0.3?

Cristian
> > In my case, I NEVER see the gratuitous ARP being sent (confirmed
> > using tcpdump on peth0 in Dom0) and the return value from
> > dev_queue_xmit is sometimes 0 and sometimes 2 (that's PLUS 2 --
> > congestion notification [NET_XMIT_CN]).
>
> I am seeing the same error, indeed it looks like it is NET_XMIT_CN. I
> also see 100% loss, the ARP never makes it to the wire in any of
> my tests.

So, I have a little more info now -- it seems that the ARP is being assembled and passed to the backend driver BUT the backend is ignoring it because the VIF link state is down (netif_carrier_ok() is returning FALSE) -- the link goes up shortly after, but the packet has been dropped by this time.

The actual sequence of events is also a little strange (but *very* reproducible):

. In the DomU, I see the following at the end of migration:
  . First, netfront sees the backend state change to InitWait - this causes it to attempt to connect the rings and send the ARP (even though the current state is actually Connected).
  . Next, the resume processing runs in netfront (I think this is expected to run first but it does not).
  . Now it sees the backend state change to InitWait a second time and attempts to send the ARP a second time.
. In Dom0:
  . The first attempt to send the ARP is completely ignored since the backend is not connected yet (specifically, it hasn't set up the softirq handler).
  . The first thing we see is the frontend state changing to Connected -- this causes the backend to initialize the connection and set up the irq handler.
  . Now we see an irq signaled, BUT it is ignored by the backend because netif_carrier_ok() returns FALSE.
  . The very next thing is the link becomes ready and the backend completes its state change to the Connected state.

It seems to me that the problem lies in the fact that the backend sees the ARP packet before it has finished setting up the vif and ignores it.

I don't know if this is relevant, but Dom0 is running with 2 VCPUs in this configuration, so it's possible that the timing window here is not seen when Dom0 runs as a uniprocessor...

/simgr
> > > In my case, I NEVER see the gratuitous ARP being sent (confirmed
> > > using tcpdump on peth0 in Dom0) and the return value from
> > > dev_queue_xmit is sometimes 0 and sometimes 2 (that's PLUS 2 --
> > > congestion notification [NET_XMIT_CN]).
> >
> > I am seeing the same error, indeed it looks like it is NET_XMIT_CN. I
> > also see 100% loss, the ARP never makes it to the wire in any of
> > my tests.

I guess no one else is seeing this problem?

Anyway -- after a fair amount of stumbling around I think I know what the problem is (but I don't have a solution) -- for a while, I thought it was an SMP bug in the netfront/netback interaction but, although there is some dodgy code there, it does seem that it always sends the gratuitous ARP and the backend always picks it up.

The real problem seems to be in the bridge in Dom0; it seems that the VIF port to the bridge is always in the disabled state when the ARP is sent, so it simply gets dropped.

Why is this? Well, the bridge doesn't enable the port until the VIF is both up AND has link (netif_carrier_on() has been called) -- this latter call is not made until netfront connects to netback. What's more, this change is not passed to the bridge code until the next time the netwatch worker runs, which could be up to 1s after the netif_carrier_on() is called... at least, that's how it looks to me...

All of this leads to a ~1s delay setting up the network path, plus the gratuitous ARP is dropped, so there can be a MUCH larger network blackout. If you are trying to get sub-second blackout on migration this is a big problem!

It seems to me that the right thing to do here is to have the link up on the VIF in advance of the domain resuming on the target but I'm guessing that this would cause netback to have conniptions...

All suggestions welcome...
Simon
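The timing described above can be made concrete with a toy model. It is a deliberate simplification under stated assumptions: the struct and function names are hypothetical, and the worker is modelled as running at fixed multiples of its period, which is not necessarily how the kernel schedules it. The model only illustrates why a frame sent right after netif_carrier_on() can still be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a VIF attached to the Dom0 bridge; times in ms. */
struct vif_port_model {
    bool admin_up;           /* interface has been brought up */
    long carrier_on_at_ms;   /* when netif_carrier_on() was called */
    long worker_period_ms;   /* assumed interval between worker runs */
};

/*
 * Time at which the bridge learns about the carrier change: the first
 * worker run at or after carrier_on_at_ms (assumed to fall on multiples
 * of worker_period_ms).
 */
static long port_enabled_at_ms(const struct vif_port_model *m)
{
    assert(m != NULL && m->worker_period_ms > 0 && m->carrier_on_at_ms >= 0);

    long p = m->worker_period_ms;

    /* Round carrier_on_at_ms up to the next multiple of the period. */
    return ((m->carrier_on_at_ms + p - 1) / p) * p;
}

/* A frame is forwarded only once the port is up and known to have link. */
static bool frame_forwarded(const struct vif_port_model *m, long sent_at_ms)
{
    return m->admin_up && sent_at_ms >= port_enabled_at_ms(m);
}
```

With a 1000 ms period and carrier raised at 250 ms, a gratuitous ARP sent at 300 ms is dropped in this model and nothing is forwarded until 1000 ms, which matches the roughly one-second gap described in the message.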
On 6/3/07 22:59, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> It seems to me that the right thing to do here is to have the link up on
> the VIF in advance of the domain resuming on the target but I'm guessing
> that this would cause netback to have conniptions...

We'll have to think about this a bit. The carrier flag is currently being used as a handy software flag inside netback to indicate whether the device channel is fully set up or not. I think what you propose will essentially be a reversion of one of Herbert Xu's patches from some time ago (where he removed an explicit software flag and started using the carrier flag instead).

 -- Keir
On 6/3/07 22:59, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> All of this leads to a ~1s delay setting up the network path plus the
> gratuitous ARP is dropped so there can be a MUCH larger network
> blackout. If you are trying to get sub-second blackout on migration this
> is a big problem!
>
> It seems to me that the right thing to do here is to have the link up on
> the VIF in advance of the domain resuming on the target but I'm guessing
> that this would cause netback to have conniptions...

Changeset 14280 in xen-unstable fixes this by no longer using the netif_carrier flag. The patch will probably backport very straightforwardly onto older (or newer) 2.6 Linux kernels.

 -- Keir
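One way to picture "no longer using the netif_carrier flag" is to keep the backend's own notion of "device channel fully set up" in a separate software flag, so that the carrier state seen by the bridge can be raised early. The sketch below illustrates that idea only; it is not taken from changeset 14280, and `vif_model` and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a backend VIF; not the netback data structure. */
struct vif_model {
    bool carrier;    /* what the bridge would see via netif_carrier_ok() */
    bool connected;  /* private flag: rings and irq fully set up */
};

/* Created on the migration target: link is up before the guest resumes. */
static void vif_create(struct vif_model *v)
{
    assert(v != NULL);
    v->carrier = true;
    v->connected = false;
}

/* Called once the frontend has connected and the channel is ready. */
static void vif_frontend_connected(struct vif_model *v)
{
    assert(v != NULL);
    v->connected = true;
}

/* The backend gates ring processing on its own flag, not on carrier. */
static bool vif_can_process_rings(const struct vif_model *v)
{
    return v->connected;
}

/* The bridge gates forwarding on carrier only. */
static bool bridge_port_can_forward(const struct vif_model *v)
{
    return v->carrier;
}
```

In this model the bridge port can already be enabled by the time the frontend connects, so the first frame the guest sends after resuming does not hit a disabled port.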
> > All of this leads to a ~1s delay setting up the network path plus the
> > gratuitous ARP is dropped so there can be a MUCH larger network
> > blackout. If you are trying to get sub-second blackout on migration
> > this is a big problem!
> >
> > It seems to me that the right thing to do here is to have the link up
> > on the VIF in advance of the domain resuming on the target but I'm
> > guessing that this would cause netback to have conniptions...
>
> Changeset 14280 in xen-unstable fixes this by no longer using the
> netif_carrier flag. The patch will probably backport very
> straightforwardly onto older (or newer) 2.6 Linux kernels.

Just wanted to close the loop here -- We've tested with this fix and it definitely fixes our problem -- network blackout times are way low again and I see the gratuitous ARP being sent -- many thanks to Keir for the swift fix.

Simon
On 7/3/07 22:25, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

>> Changeset 14280 in xen-unstable fixes this by no longer using the
>> netif_carrier flag. The patch will probably backport very
>> straightforwardly onto older (or newer) 2.6 Linux kernels.
>
> Just wanted to close the loop here -- We've tested with this fix and it
> definitely fixes our problem -- network blackout times are way low again
> and I see the gratuitous ARP being sent -- many thanks to Keir for the
> swift fix.

Thanks for narrowing down the problem!

 -- Keir
Graham, Simon wrote:
>> I am having some trouble with the send_fake_arp in the netfront
>> driver.
>
> Interesting - I was just composing an almost identical note; we've been
> seeing some horrible network blackouts in migration that are caused by a
> failure to send the gratuitous ARP (blackouts vary from 0-50+ seconds
> when the domain is idle and just being pinged from outside).

I am having the same trouble with Xen 3.1 from Ubuntu Gutsy. I have xen-hypervisor-3.1.0-0ubuntu18 installed on my system. Maybe this bug has been reintroduced? Can somebody confirm that this is still working?

I am currently doing a semester thesis on measuring downtime while migrating over loaded/lagged links. The goal is to measure/estimate the migration time and downtime.

I see no ping reply packets, even when the domain is already up and running. It takes about 10 seconds until I see any ping reply packets.

Hans-Christian

> In my case, I NEVER see the gratuitous ARP being sent (confirmed using
> tcpdump on peth0 in Dom0) and the return value from dev_queue_xmit is
> sometimes 0 and sometimes 2 (that's PLUS 2 -- congestion notification
> [NET_XMIT_CN]).
>
> My next step was going to be to add instrumentation to netback but I
> thought I would ask if this is a known issue with 3.0.3 first...
>
> Simon