Hi there,

I've read the paper on Xen live migration, and it shows some very impressive figures, like 165ms of downtime for a running web server and 50ms for a Quake 3 server.

I installed CentOS 5 on 2 servers, each with 2x Xeon E5335 (quad-core) and 2x Intel 80003ES2LAN Gb NICs. Then I installed 2 DomUs, also with CentOS 5.

One NIC is connected to the LAN (on the same switch and VLAN); the other interconnects the 2 servers with a crossover cable.

Then I start pinging the DomU that is going to be migrated, at a 100ms interval, from within the Dom0 that is currently hosting it, and migrate the VM. The pinging is done on the LAN interface, while the migration traffic goes over the crossover link.

64 bytes from 10.10.241.44: icmp_seq=97 ttl=64 time=0.044 ms
64 bytes from 10.10.241.44: icmp_seq=98 ttl=64 time=0.039 ms
64 bytes from 10.10.241.44: icmp_seq=99 ttl=64 time=0.039 ms
64 bytes from 10.10.241.44: icmp_seq=125 ttl=64 time=0.195 ms
64 bytes from 10.10.241.44: icmp_seq=126 ttl=64 time=0.263 ms
64 bytes from 10.10.241.44: icmp_seq=127 ttl=64 time=0.210 ms

As you can see, the response time before the migration is around 40us, and afterwards it's around 200us, which is understandable since the VM is now on another physical host.

The problem is the 25 packets lost during the final phase of the migration. Don't get me wrong: 2.5s is a very good time, but 50 times higher than the published figure isn't.

I tried the same test connecting both machines to a hub, and got the same results.

Has anybody tried to measure the downtime during a live migration? What were the results?

Any thoughts and suggestions are very appreciated.

Thanks,
Marconi.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
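The gap in the icmp_seq column is enough to estimate the outage: with one probe every 100ms, the jump from seq 99 to seq 125 means 25 missing probes, roughly 2.5s of downtime. A minimal sketch of that arithmetic (the function name is my own):

```python
def estimate_downtime(last_seq_before, first_seq_after, interval_s):
    """Approximate an outage from an icmp_seq gap in ping output.

    last_seq_before: last sequence number answered before the migration
    first_seq_after: first sequence number answered afterwards
    interval_s:      ping interval in seconds (ping -i 0.1 -> 0.1)
    """
    lost = first_seq_after - last_seq_before - 1
    return lost, lost * interval_s

# Using the figures from the transcript above:
lost, downtime = estimate_downtime(99, 125, 0.1)
print(lost, downtime)  # 25 lost probes, ~2.5 s of downtime
```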
mail4dla@googlemail.com
2007-Aug-10 12:15 UTC
Re: [Xen-users] Live migration: 2500ms downtime
Hi,

From my own experience, I can confirm that the actual downtime is very low and that the limiting factor is the propagation of the new location through the network.

As the Dom0 itself also caches MAC addresses, you should try to do the ping from a third machine, to rule out the possibility that the Dom0 is not sending the packets out to the network at all. If that is not an option, you can use something like 'tcpdump -i eth0 "proto ICMP"' to see what's actually going on on your network, and correlate that with the output of your ping command.

Cheers,
dla

On 8/10/07, Marconi Rivello <marconirivello@gmail.com> wrote:
> [original message quoted in full; snipped]
Thanks for your reply.

I did ping from a third physical machine. The results don't vary much.

I followed your advice on analyzing the traffic, but I don't see why I should look for ICMP, since the DomU does answer the ping; it just has a 2.5s gap between stopping on one machine and starting on the other. BUT I went looking for the unsolicited ARP reply, and I didn't get any, which means a third machine cannot communicate with the DomU after the migration, unless I have an active ssh session RECEIVING data from the DomU. Here are the 2 scenarios:

1. I have an ssh connection to the DomU running top, so the DomU is constantly sending packets out to the network. After I migrate the domU there is the 2.5 second gap, but after that top keeps going and I can use the ssh connection as usual.

2. I have an ssh connection to the DomU but am not running any foreground process (or have no connection at all). After I migrate the domU, the ssh connection doesn't respond, and the domU doesn't respond to ping or anything else.

That happens when the physical machines are connected to the switch. I started tcpdump on both Dom0s to see if the DomU would send the unsolicited ARP reply to update the switch's tables, and there is none. So unless there is already traffic going out from the domU, there isn't anything to tell the switch that the machine moved from one port to another.

Just to emphasize: I'm running CentOS 5 with Xen 3.0.3 (which comes with it), and I applied the Xen-related official CentOS updates (same as Red Hat's).

Again, any thoughts or suggestions would be really appreciated.

Thanks,
Marconi.

On 8/10/07, mail4dla@googlemail.com <mail4dla@googlemail.com> wrote:
> [quoted text snipped]
mail4dla@googlemail.com
2007-Aug-10 15:08 UTC
Re: [Xen-users] Live migration: 2500ms downtime
Hi,

On 8/10/07, Marconi Rivello <marconirivello@gmail.com> wrote:
> I did ping from a third physical machine. The results don't vary much.
>
> I followed your advice on analyzing the traffic, but I don't see why I
> should look for ICMP, since the DomU does answer the ping; it just has a
> 2.5s gap between stopping on one machine and starting on the other.

Well, I asked you to do this (in your original test setup, where the ping was performed from the source Dom0) in order to see whether the packets were actually being sent out of the machine, or whether the Dom0 was trying to send them through the bridge and the no-longer-existing virtual interface.

> That happens when the physical machines are connected to the switch. I
> started tcpdump on both Dom0s to see if the DomU would send the
> unsolicited ARP reply to update the switch's tables, and there is none.
> So unless there is already traffic going out from the domU, there isn't
> anything to tell the switch that the machine moved from one port to
> another.

This is exactly the anticipated behaviour. AFAIR it is a known issue, and more recent builds of Xen do send the unsolicited ARP reply after migration. With a switch, you are quite lucky to have only a 2.5s outage; depending on the switch and its ageing algorithm, it can be significantly higher, e.g. 30s or so.

Do you also see the 2.5s outage when pinging from a third machine with the machines connected by a hub? (Actually, it's sufficient to connect the two Xen machines via a hub to the same switch port.)

> Just to emphasize: I'm running CentOS 5 with Xen 3.0.3 (which comes with
> it), and I applied the Xen-related official CentOS updates (same as Red
> Hat's).

My experience is based on the Xen shipped with Ubuntu 7.04, which is also a 3.0.3, and when the machines are connected through a hub, the unavailability period is of the same order as in the paper you quoted in your first email.

hth,
dla
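The unsolicited (gratuitous) ARP reply being discussed is just a broadcast ARP packet in which the migrated guest announces its own IP-to-MAC binding, prompting switches and neighbours to update their tables. As a sketch of what that 42-byte frame contains (the addresses are made up; this only builds the bytes, it does not send anything):

```python
import struct

def gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP reply.

    mac: 6-byte hardware address of the migrated guest
    ip:  4-byte IPv4 address of the migrated guest
    """
    broadcast = b"\xff" * 6
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)  # EtherType: ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6, 4,     # hardware / protocol address lengths
        2,        # opcode 2: reply (sent unsolicited)
        mac, ip,  # sender is the guest itself...
        broadcast, ip,  # ...and the target IP is also its own: "gratuitous"
    )
    return eth_header + arp

# Xen guests conventionally use MACs in the 00:16:3e prefix.
frame = gratuitous_arp(b"\x00\x16\x3e\x00\x00\x01", bytes([10, 10, 241, 44]))
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP payload
```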
Hi again,

On 8/10/07, mail4dla@googlemail.com <mail4dla@googlemail.com> wrote:
> Well, I asked you to do this (in your original test setup, where the ping
> was performed from the source Dom0) in order to see whether the packets
> were actually being sent out of the machine, or whether the Dom0 was
> trying to send them through the bridge and the no-longer-existing
> virtual interface.

Oh, I get it. Sorry for not making it clear. You know, I have to give out enough info to let people be able to help, but not so much that the email gets too long and drives people away. :)

> Do you also see the 2.5s outage when pinging from a third machine with
> the machines connected by a hub? (Actually, it's sufficient to connect
> the two Xen machines via a hub to the same switch port.)

One test I tried was exactly that: connecting the Xen machines to a hub (a 10Mbps one, though - the only one available) and the hub to the switch. Because of that, I kept the crossover connection on the second port, so the migration itself could still be done quickly. The NICs and the switch are 10/100/1000. The average 2 to 3 seconds of downtime still occurs with the hub.

Another issue that I described in a previous email (which, unfortunately, didn't get any replies) is that this downtime increases to more than 20 seconds if I set the domU's memory to 512MB (with maxmem set to 1024MB). I repeated the test successively, from one side to the other, with mem set to 512 and 1024, and the result was always the same: around 3s with mem = maxmem, and around 24s with mem=512 and maxmem=1024.

Just to make it clear: the domU is running only an Apache server. The CPU, memory, and network loads are really low and shouldn't be interfering.

> My experience is based on the Xen shipped with Ubuntu 7.04, which is also
> a 3.0.3, and when the machines are connected through a hub, the
> unavailability period is of the same order as in the paper you quoted in
> your first email.

I'm starting to think it might be a problem with this distribution specifically, although I really don't see why it should be.

Thanks for the help. Still, any other suggestions or insights are most welcome.

Marconi.
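The mem/maxmem combination above corresponds to two lines in the domU configuration file (xm config files use Python syntax; the file name is hypothetical and the values simply restate the test setup):

```python
# Hypothetical excerpt from the domU's config, e.g. /etc/xen/domu1
# With memory == maxmem the reported downtime was ~3 s;
# with the ballooned-down setting below it grew to ~24 s.
memory = 512    # MB currently allocated to the guest
maxmem = 1024   # MB ceiling the guest may balloon up to
```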
On Fri, Aug 10, 2007 at 12:42:08PM -0300, Marconi Rivello wrote:
> Another issue that I described in a previous email (which, unfortunately,
> didn't get any replies) is that this downtime increases to more than 20
> seconds if I set the domU's memory to 512MB (with maxmem set to 1024MB).
> I repeated the test successively, from one side to the other, with mem
> set to 512 and 1024, and the result was always the same: around 3s with
> mem = maxmem, and around 24s with mem=512 and maxmem=1024.

You are using the --live option to migrate, aren't you?

--
lfr
0/0
>>> "Marconi Rivello" <marconirivello@gmail.com> 08/10/07 9:42 AM >>>

It would be interesting, if you were willing, to try something like SLES 10 SP1. It runs Xen 3.0.4 with lots of backports from 3.1. I realize that this is a lot of work to redo everything (though maybe you can still reuse your CentOS images).

Stephen
Hi, Luciano.

On 8/10/07, Luciano Rocha <strange@nsk.no-ip.org> wrote:
> You are using the --live option to migrate, aren't you?

Yes, I am. :)

Even if I weren't, it would make sense to expect a lower downtime (or the same downtime) by reducing the domU's memory. But it takes longer when I reduce it. Would you happen to have any idea why it behaves like that?

Thanks,
Marconi.
Hi there Stephen,

On 8/10/07, Stephen Shaw <stshaw@novell.com> wrote:
> It would be interesting, if you were willing, to try something like
> SLES 10 SP1. It runs Xen 3.0.4 with lots of backports from 3.1.

I downloaded openSUSE 10.2 for x86_64, but when I mounted the ISO I saw that it comes with Xen 3.0.3. The servers I will be using are isolated from the internet, so I won't be able to use YaST to update it online. Would you happen to know where I can download SUSE's official Xen update packages, so I can fetch them from my desktop and then transfer them to the servers? I looked around for a while and couldn't find them. I hope I'm wrong, but I'm starting to think they might be available only for the commercial distribution.

Thanks.
> I downloaded openSUSE 10.2 for x86_64, but when I mounted the ISO I

Don't use openSUSE 10.2. I would say 10.3, but it's not ready yet. I'd recommend running the test with SLES 10 SP1. Too bad 10.3 isn't finished.

Stephen
On Fri, 10 Aug 2007, Marconi Rivello wrote:
> Even if I weren't, it would make sense to expect a lower downtime (or the
> same downtime) by reducing the domU's memory. But it takes longer when I
> reduce it. Would you happen to have any idea why it behaves like that?

That might indicate a memory crunch, e.g. memory actively being cycled for disk caching; since the memory keeps changing, Xen has to make more passes while trying to sync the memory maps. The Xen logs on the dom0 may give you some info. I'm not sure that logic applies to the actual downtime, though; it certainly applies to the delay between initiating the migrate command and its completion.

If the network is still a possible issue, try running the ping from the domU being moved and watching _that_ packet stream from the outside via tcpdump or something. That might help cut through the bridges/switches... and if it's a bridge/switch issue, then you're wasting time trying to fix it in Xen (and that would be good to know).

-Tom
On Fri, Aug 10, 2007 at 02:18:53PM -0300, Marconi Rivello wrote:
> > You are using the --live option to migrate, aren't you?
>
> Yes, I am. :)

Oh. Well, then, could you try without? :)

Also, try the reverse: ping an outside host from within the domU.

> Even if I weren't, it would make sense to expect a lower downtime (or the
> same downtime) by reducing the domU's memory. But it takes longer when I
> reduce it.

That is odd. Is the Dom0 memory the same (i.e., fixed)?

> Would you happen to have any idea why it behaves like that?

No idea. I might expect a longer migration time for a machine with a very active working set, but not a much longer downtime. That should only be a freeze, a final sync, and a resume on the other side.

--
lfr
0/0
On 8/10/07, Luciano Rocha <strange@nsk.no-ip.org> wrote:
> Oh. Well, then, could you try without? :)

I could, but what I'm whining :) about is having a period of unresponsiveness of a couple of seconds instead of a tenth of a second. If I do a stop-copy-restart migration, it will be even longer.

> Also, try the reverse: ping an outside host from within the domU.

I will. In fact, I will try all the monitoring suggestions (from you and the others): from inside the domU, from outside, from a third machine, ICMP, ARP...

I would like to thank everyone who contributed ideas; it was very helpful. Unfortunately, I will be away for the next week at a training, and will only be able to investigate further when I get back to work. When I do, I will run some more tests and post whatever I find out (or don't).

Thanks again,
Marconi.
I have also noticed the problem that domains take much longer to migrate if they are set to have memory < maxmem. I know this was a problem back in the Xen 3.0.1 days, and I thought I had heard that it was getting fixed in newer versions.

At that time, if you looked at one of the Xen debug logs, there were millions of lines of errors every time you attempted to migrate a domain whose memory image had been shrunk. I would suggest that you check whether your current setup is also producing this kind of output, and search for that error message -- I know there were some messages on this mailing list about it in the past, but I can't find them right now.

Good luck!

On 8/10/07, Marconi Rivello <marconirivello@gmail.com> wrote:
> [quoted text snipped]
Hi there,

Following the suggestions, I installed SLES 10 SP1, with Xen 3.0.4. Although the migration downtime decreased, it is still an order of magnitude higher than it should be: it is now around 1.2s. That would still be pretty impressive, if it weren't for the fact that there is still no ARP after the migration, so the switch doesn't update its tables with the port the VM is now on, isolating it from the outside world.

I measured the downtime by pinging an outside host from within the VM, at 100ms intervals. With the constant pinging, the VM advertises itself to the switch, and communication resumes after 12 lost packets. But if there is no outgoing activity from the VM, I can only bring it back to life by pinging it from the new dom0: that generates an ICMP reply, which goes out through the physical Ethernet interface and advertises the VM's new location.

I was looking at the external-migration-tool option in xend-config.sxp, but I can't figure out how to use it, or whether it would even be useful for automatically pinging the VM after the migration. I can't find any documentation or examples for it.

Any ideas?

Thanks.

On 8/15/07, Tim Wood <twwood@gmail.com> wrote:
> [quoted text snipped]
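Until a Xen build that emits the gratuitous ARP itself is in place, one workaround is to force some traffic toward the guest right after migration, exactly as the manual ping from the new dom0 does above. A sketch of a helper the destination dom0 could run (the function name, interface, and the idea of wiring it into a migration wrapper are my own assumptions, not a documented xend interface):

```python
import subprocess

def refresh_switch_tables(vm_ip, iface="eth0", dry_run=False):
    """After a migration, nudge switches into re-learning the guest's port.

    Pinging the guest from the new dom0 makes it emit ICMP replies out of
    the physical NIC, which updates the switch's MAC table (the effect
    observed in the tests above).
    """
    cmd = ["ping", "-c", "3", "-I", iface, vm_ip]
    if dry_run:
        return cmd                # just report what would be executed
    return subprocess.call(cmd)   # 0 on success, like the shell command

print(refresh_switch_tables("10.10.241.44", dry_run=True))
```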