Hi there,

I've read the paper on Xen live migration, and it shows some very impressive figures, like 165ms of downtime for a running web server and 50ms for a Quake 3 server.

I installed CentOS 5 on 2 servers, each with 2x Xeon E5335 (quad-core) and 2x Intel 80003ES2LAN Gb NICs. Then I installed 2 DomUs, also with CentOS 5.

One NIC is connected to the LAN (on the same switch and VLAN); the other interconnects the 2 servers with a crossover cable.

Then I start pinging the DomU that is going to be migrated, at a 100ms interval, from within the Dom0 that is currently hosting it, and migrate the VM. The pinging is done on the LAN interface, while the migration traffic goes over the crossover link.

64 bytes from 10.10.241.44: icmp_seq=97 ttl=64 time=0.044 ms
64 bytes from 10.10.241.44: icmp_seq=98 ttl=64 time=0.039 ms
64 bytes from 10.10.241.44: icmp_seq=99 ttl=64 time=0.039 ms
64 bytes from 10.10.241.44: icmp_seq=125 ttl=64 time=0.195 ms
64 bytes from 10.10.241.44: icmp_seq=126 ttl=64 time=0.263 ms
64 bytes from 10.10.241.44: icmp_seq=127 ttl=64 time=0.210 ms

As you can see, the response time before the migration is around 40us, and afterwards it's around 200us, which is understandable since the VM is now on another physical host.

The problem is the 25 packets lost during the final phase of the migration. Don't get me wrong: 2.5s is a very good time, but 50 times higher than the published figure isn't.

I tried the same test connecting both machines to a hub, and got the same results.

Has anybody tried to measure the downtime during a live migration? What were the results?

Any thoughts and suggestions are very appreciated.

Thanks,
Marconi.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
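The gap in the icmp_seq column is enough to estimate the outage: with one probe every 100ms, the jump from seq 99 to seq 125 means 25 missing probes, roughly 2.5s of downtime. A minimal sketch of that arithmetic (the function name is my own):

```python
def estimate_downtime(last_seq_before, first_seq_after, interval_s):
    """Approximate an outage from an icmp_seq gap in ping output.

    last_seq_before: last sequence number answered before the migration
    first_seq_after: first sequence number answered afterwards
    interval_s:      ping interval in seconds (ping -i 0.1 -> 0.1)
    """
    lost = first_seq_after - last_seq_before - 1
    return lost, lost * interval_s

# Using the figures from the transcript above:
lost, downtime = estimate_downtime(99, 125, 0.1)
print(lost, downtime)  # 25 lost probes, ~2.5 s of downtime
```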
mail4dla@googlemail.com
2007-Aug-10 12:15 UTC
Re: [Xen-users] Live migration: 2500ms downtime
Hi,

From my own experience, I can confirm that the actual downtime is very low and that the limiting factor is the propagation of the new location through the network.

As the Dom0 itself also caches MAC addresses, you should try to do the ping from a third machine, to rule out the possibility that the Dom0 is not sending the packets out to the network at all. If that is not an option, you can use something like 'tcpdump -i eth0 "proto ICMP"' to see what's actually going on on your network, and correlate that with the output of your ping command.

Cheers,
dla

On 8/10/07, Marconi Rivello <marconirivello@gmail.com> wrote:
> [original message quoted in full; snipped]
Thanks for your reply.

I did ping from a third physical machine. The results don't vary much.

I followed your advice on analyzing the traffic, but I don't see why I should look for ICMP, since the DomU does answer the ping; it just has a 2.5s gap between stopping on one machine and starting on the other. BUT I went looking for the unsolicited ARP reply, and I didn't get any, which means a third machine cannot communicate with the DomU after the migration, unless I have an active ssh session RECEIVING data from the DomU. Here are the 2 scenarios:

1. I have an ssh connection to the DomU running top, so the DomU is constantly sending packets out to the network. After I migrate the domU there is the 2.5 second gap, but after that top keeps going and I can use the ssh connection as usual.

2. I have an ssh connection to the DomU but am not running any foreground process (or have no connection at all). After I migrate the domU, the ssh connection doesn't respond, and the domU doesn't respond to ping or anything else.

That happens when the physical machines are connected to the switch. I started tcpdump on both Dom0s to see if the DomU would send the unsolicited ARP reply to update the switch's tables, and there is none. So unless there is already traffic going out from the domU, there isn't anything to tell the switch that the machine moved from one port to another.

Just to emphasize: I'm running CentOS 5 with Xen 3.0.3 (which comes with it), and I applied the Xen-related official CentOS updates (same as Red Hat's).

Again, any thoughts or suggestions would be really appreciated.

Thanks,
Marconi.

On 8/10/07, mail4dla@googlemail.com <mail4dla@googlemail.com> wrote:
> [quoted text snipped]
mail4dla@googlemail.com
2007-Aug-10 15:08 UTC
Re: [Xen-users] Live migration: 2500ms downtime
Hi,

On 8/10/07, Marconi Rivello <marconirivello@gmail.com> wrote:
> I did ping from a third physical machine. The results don't vary much.
>
> I followed your advice on analyzing the traffic, but I don't see why I
> should look for ICMP, since the DomU does answer the ping; it just has a
> 2.5s gap between stopping on one machine and starting on the other.

Well, I asked you to do this (in your original test setup, where the ping was performed from the source Dom0) in order to see whether the packets were actually being sent out of the machine, or whether the Dom0 was trying to send them through the bridge and the no-longer-existing virtual interface.

> That happens when the physical machines are connected to the switch. I
> started tcpdump on both Dom0s to see if the DomU would send the
> unsolicited ARP reply to update the switch's tables, and there is none.
> So unless there is already traffic going out from the domU, there isn't
> anything to tell the switch that the machine moved from one port to
> another.

This is exactly the anticipated behaviour. AFAIR it is a known issue, and more recent builds of Xen do send the unsolicited ARP reply after migration. With a switch, you are quite lucky to have only a 2.5s outage; depending on the switch and its ageing algorithm, it can be significantly higher, e.g. 30s or so.

Do you also see the 2.5s outage when pinging from a third machine with the machines connected by a hub? (Actually, it's sufficient to connect the two Xen machines via a hub to the same switch port.)

> Just to emphasize: I'm running CentOS 5 with Xen 3.0.3 (which comes with
> it), and I applied the Xen-related official CentOS updates (same as Red
> Hat's).

My experience is based on the Xen shipped with Ubuntu 7.04, which is also a 3.0.3, and when the machines are connected through a hub, the unavailability period is of the same order as in the paper you quoted in your first email.

hth,
dla
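The unsolicited (gratuitous) ARP reply being discussed is just a broadcast ARP packet in which the migrated guest announces its own IP-to-MAC binding, prompting switches and neighbours to update their tables. As a sketch of what that 42-byte frame contains (the addresses are made up; this only builds the bytes, it does not send anything):

```python
import struct

def gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP reply.

    mac: 6-byte hardware address of the migrated guest
    ip:  4-byte IPv4 address of the migrated guest
    """
    broadcast = b"\xff" * 6
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)  # EtherType: ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6, 4,     # hardware / protocol address lengths
        2,        # opcode 2: reply (sent unsolicited)
        mac, ip,  # sender is the guest itself...
        broadcast, ip,  # ...and the target IP is also its own: "gratuitous"
    )
    return eth_header + arp

# Xen guests conventionally use MACs in the 00:16:3e prefix.
frame = gratuitous_arp(b"\x00\x16\x3e\x00\x00\x01", bytes([10, 10, 241, 44]))
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP payload
```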
Hi again,

On 8/10/07, mail4dla@googlemail.com <mail4dla@googlemail.com> wrote:
> Well, I asked you to do this (in your original test setup, where the ping
> was performed from the source Dom0) in order to see whether the packets
> were actually being sent out of the machine, or whether the Dom0 was
> trying to send them through the bridge and the no-longer-existing
> virtual interface.

Oh, I get it. Sorry for not making it clear. You know, I have to give out enough info to let people be able to help, but not so much that the email gets too long and drives people away. :)

> Do you also see the 2.5s outage when pinging from a third machine with
> the machines connected by a hub? (Actually, it's sufficient to connect
> the two Xen machines via a hub to the same switch port.)

One test I tried was exactly that: connecting the Xen machines to a hub (a 10Mbps one, though - the only one available) and the hub to the switch. Because of that, I kept the crossover connection on the second port, so the migration itself could still be done quickly. The NICs and the switch are 10/100/1000. The average 2 to 3 seconds of downtime still occurs with the hub.

Another issue that I described in a previous email (which, unfortunately, didn't get any replies) is that this downtime increases to more than 20 seconds if I set the domU's memory to 512MB (with maxmem set to 1024MB). I repeated the test successively, from one side to the other, with mem set to 512 and 1024, and the result was always the same: around 3s with mem = maxmem, and around 24s with mem=512 and maxmem=1024.

Just to make it clear: the domU is running only an Apache server. The CPU, memory, and network loads are really low and shouldn't be interfering.

> My experience is based on the Xen shipped with Ubuntu 7.04, which is also
> a 3.0.3, and when the machines are connected through a hub, the
> unavailability period is of the same order as in the paper you quoted in
> your first email.

I'm starting to think it might be a problem with this distribution specifically, although I really don't see why it should be.

Thanks for the help. Still, any other suggestions or insights are most welcome.

Marconi.
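The mem/maxmem combination above corresponds to two lines in the domU configuration file (xm config files use Python syntax; the file name is hypothetical and the values simply restate the test setup):

```python
# Hypothetical excerpt from the domU's config, e.g. /etc/xen/domu1
# With memory == maxmem the reported downtime was ~3 s;
# with the ballooned-down setting below it grew to ~24 s.
memory = 512    # MB currently allocated to the guest
maxmem = 1024   # MB ceiling the guest may balloon up to
```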
On Fri, Aug 10, 2007 at 12:42:08PM -0300, Marconi Rivello wrote:
> Another issue that I described in a previous email (which, unfortunately,
> didn't get any replies) is that this downtime increases to more than 20
> seconds if I set the domU's memory to 512MB (with maxmem set to 1024MB).
> I repeated the test successively, from one side to the other, with mem
> set to 512 and 1024, and the result was always the same: around 3s with
> mem = maxmem, and around 24s with mem=512 and maxmem=1024.

You are using the --live option to migrate, aren't you?

--
lfr
0/0
>>> "Marconi Rivello" <marconirivello@gmail.com> 08/10/07 9:42 AM >>>

It would be interesting, if you were willing, to try something like SLES 10 SP1. It runs Xen 3.0.4 with lots of backports from 3.1. I realize that this is a lot of work to redo everything (though maybe you can still reuse your CentOS images).

Stephen
Hi, Luciano.

On 8/10/07, Luciano Rocha <strange@nsk.no-ip.org> wrote:
> You are using the --live option to migrate, aren't you?

Yes, I am. :)

Even if I weren't, it would make sense to expect a lower downtime (or the same downtime) by reducing the domU's memory. But it takes longer when I reduce it. Would you happen to have any idea why it behaves like that?

Thanks,
Marconi.
Hi there Stephen,

On 8/10/07, Stephen Shaw <stshaw@novell.com> wrote:
> It would be interesting, if you were willing, to try something like
> SLES 10 SP1. It runs Xen 3.0.4 with lots of backports from 3.1.

I downloaded openSUSE 10.2 for x86_64, but when I mounted the ISO I saw that it comes with Xen 3.0.3. The servers I will be using are isolated from the internet, so I won't be able to use YaST to update it online. Would you happen to know where I can download SUSE's official Xen update packages, so I can fetch them from my desktop and then transfer them to the servers? I looked around for a while and couldn't find them. I hope I'm wrong, but I'm starting to think they might be available only for the commercial distribution.

Thanks.
> I downloaded openSUSE 10.2 for x86_64, but when I mounted the ISO I

Don't use openSUSE 10.2. I would say 10.3, but it's not ready yet. I'd recommend running the test with SLES 10 SP1. Too bad 10.3 isn't finished.

Stephen
On Fri, 10 Aug 2007, Marconi Rivello wrote:
> Even if I weren't, it would make sense to expect a lower downtime (or the
> same downtime) by reducing the domU's memory. But it takes longer when I
> reduce it. Would you happen to have any idea why it behaves like that?

That might indicate a memory crunch, e.g. memory actively being cycled for disk caching; since the memory keeps changing, Xen has to make more passes while trying to sync the memory maps. The Xen logs on the dom0 may give you some info. I'm not sure that logic applies to the actual downtime, though; it certainly applies to the delay between initiating the migrate command and its completion.

If the network is still a possible issue, try running the ping from the domU being moved and watching _that_ packet stream from the outside via tcpdump or something. That might help cut through the bridges/switches... and if it's a bridge/switch issue, then you're wasting time trying to fix it in Xen (and that would be good to know).

-Tom
On Fri, Aug 10, 2007 at 02:18:53PM -0300, Marconi Rivello wrote:
> > You are using the --live option to migrate, aren't you?
>
> Yes, I am. :)

Oh. Well, then, could you try without? :)

Also, try the reverse: ping an outside host from within the domU.

> Even if I weren't, it would make sense to expect a lower downtime (or the
> same downtime) by reducing the domU's memory. But it takes longer when I
> reduce it.

That is odd. Is the Dom0 memory the same (i.e., fixed)?

> Would you happen to have any idea why it behaves like that?

No idea. I might expect a longer migration time for a machine with a very active working set, but not a much longer downtime. That should only be a freeze, a final sync, and a resume on the other side.

--
lfr
0/0
On 8/10/07, Luciano Rocha <strange@nsk.no-ip.org> wrote:
> Oh. Well, then, could you try without? :)

I could, but what I'm whining :) about is having a period of unresponsiveness of a couple of seconds instead of a tenth of a second. If I do a stop-copy-restart migration, it will be even longer.

> Also, try the reverse: ping an outside host from within the domU.

I will. In fact, I will try all the monitoring suggestions (from you and the others): from inside the domU, from outside, from a third machine, ICMP, ARP...

I would like to thank everyone who contributed ideas; it was very helpful. Unfortunately, I will be away for the next week at a training, and will only be able to investigate further when I get back to work. When I do, I will run some more tests and post whatever I find out (or don't).

Thanks again,
Marconi.
I have also noticed the problem that domains take much longer to migrate if they are set to have memory < maxmem. I know this was a problem back in the Xen 3.0.1 days, and I thought I had heard that it was getting fixed in newer versions.

At that time, if you looked at one of the Xen debug logs, there were millions of lines of errors every time you attempted to migrate a domain whose memory image had been shrunk. I would suggest that you check whether your current setup is also producing this kind of output, and search for that error message -- I know there were some messages on this mailing list about it in the past, but I can't find them right now.

Good luck!

On 8/10/07, Marconi Rivello <marconirivello@gmail.com> wrote:
> [quoted text snipped]
Hi there,

Following the suggestions, I installed SLES 10 SP1, with Xen 3.0.4. Although the migration downtime decreased, it is still an order of magnitude higher than it should be: it is now around 1.2s. That would still be pretty impressive, if it weren't for the fact that there is still no ARP after the migration, so the switch doesn't update its tables with the port the VM is now on, isolating it from the outside world.

I measured the downtime by pinging an outside host from within the VM, at 100ms intervals. With the constant pinging, the VM advertises itself to the switch, and communication resumes after 12 lost packets. But if there is no outgoing activity from the VM, I can only bring it back to life by pinging it from the new dom0: that generates an ICMP reply, which goes out through the physical Ethernet interface and advertises the VM's new location.

I was looking at the external-migration-tool option in xend-config.sxp, but I can't figure out how to use it, or whether it would even be useful for automatically pinging the VM after the migration. I can't find any documentation or examples for it.

Any ideas?

Thanks.

On 8/15/07, Tim Wood <twwood@gmail.com> wrote:
> [quoted text snipped]
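Until a Xen build that emits the gratuitous ARP itself is in place, one workaround is to force some traffic toward the guest right after migration, exactly as the manual ping from the new dom0 does above. A sketch of a helper the destination dom0 could run (the function name, interface, and the idea of wiring it into a migration wrapper are my own assumptions, not a documented xend interface):

```python
import subprocess

def refresh_switch_tables(vm_ip, iface="eth0", dry_run=False):
    """After a migration, nudge switches into re-learning the guest's port.

    Pinging the guest from the new dom0 makes it emit ICMP replies out of
    the physical NIC, which updates the switch's MAC table (the effect
    observed in the tests above).
    """
    cmd = ["ping", "-c", "3", "-I", iface, vm_ip]
    if dry_run:
        return cmd                # just report what would be executed
    return subprocess.call(cmd)   # 0 on success, like the shell command

print(refresh_switch_tables("10.10.241.44", dry_run=True))
```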