Phillip Nkubito
2012-Mar-19 12:21 UTC
Understanding Xen Log Output: Downtime and Total migration time.
Hi, I am looking at the output of my xen logs (using xm log), particularly the outputs pertaining to the migration of VMS. Here is an excerpt from the source host''s logs(dom0); I am showing the first 3 iterations of the VM live migration. ==========================================================1: [2012-03-09 13:37:12 xend 3113] INFO (XendCheckpoint:498) Saving memory pages: iter 1 2: 1: sent 28847, skipped 3921, delta 1702ms, dom0 62%, target 32%, sent 555Mb/s, dirtied 116Mb/s 6036 pages 3: [2012-03-09 13:37:13 xend 3113] INFO (XendCheckpoint:498) Saving memory pages: iter 2 4: 2: sent 3462, skipped 2574, delta 201ms, dom0 60%, target 44%, sent 564Mb/s, dirtied 613Mb/s 3766 pages 5: [2012-03-09 13:37:13 xend 3113] INFO (XendCheckpoint:498) Saving memory pages: iter 3 6: 3: sent 2701, skipped 1065, delta 178ms, dom0 48%, target 46%, sent 497Mb/s, dirtied 634Mb/s 3447 pages ========================================================== It continues iterating as expected from the pre-copy algorithm until eventually the VM is migrated to the destination host. My question is what is "delta" as it appears in "line 2,4 and 6" above supposed to be in terms of the migration time? For my next question I am showing the last iterations and some logs xen spews out: ==========================================================1: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving memory pages: iter 12 2: 12: sent 341, skipped 0, delta 12ms, dom0 100%, target 0%, sent 931Mb/s, dirtied 114Mb/s 42 pages 3: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving memory pages: iter 13 4: 13: sent 42, skipped 0, Start last iteration 5: [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:449) suspend 6: [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:178) In saveInputHandler suspend 7: [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:180) Suspending 4 ... 8: [2012-03-09 13:37:14 xend.XendDomainInfo 3113] DEBUG (XendDomainInfo:1335) XendDomainInfo.handleShutdownWatch 9: [2012-03-09 13:37:14 xend.XendDomainInfo 3113] DEBUG (XendDomainInfo:1335) XendDomainInfo.handleShutdownWatch 10: [2012-03-09 13:37:14 xend.XendDomainInfo 3113] INFO (XendDomainInfo:1292) Domain has shutdown: name=migrating-vm2 id=4 reason=suspend. 11: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:185) Domain 4 suspended. 12: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) SUSPEND shinfo 00000bdc 13: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) delta 11ms, dom0 81%, target 0%, sent 125Mb/s, dirtied 199Mb/s 67 pages 14: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving memory pages: iter 14 15: 14: sent 67, skipped 0, delta 5ms, dom0 100%, target 0%, sent 439Mb/s, dirtied 439Mb/s 67 pages 16: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Total pages sent= 52369 (0.77x) ========================================================== I understand from the pre-copy algorithm that the last stage involves stopping the VM and copying the remaining memory pages across to the destination host. From the above excerpt on "line 15" is delta a reflection of the downtime during this last stop-copy phase? Looking forward to your responses. Kind Regards, Phillip Nkubito
Ian Campbell
2012-Mar-19 16:57 UTC
Re: Understanding Xen Log Output: Downtime and Total migration time.
On Mon, 2012-03-19 at 12:21 +0000, Phillip Nkubito wrote:> Hi, > > I am looking at the output of my xen logs (using xm log), particularly the > outputs pertaining to the migration of VMS. > > Here is an excerpt from the source host''s logs(dom0); I am showing the > first 3 iterations of the VM live migration. > > ==========================================================> 1: [2012-03-09 13:37:12 xend 3113] INFO (XendCheckpoint:498) Saving > memory pages: iter 1 > 2: 1: sent 28847, skipped 3921, delta 1702ms, dom0 62%, target 32%, sent > 555Mb/s, dirtied 116Mb/s 6036 pages > > 3: [2012-03-09 13:37:13 xend 3113] INFO (XendCheckpoint:498) Saving > memory pages: iter 2 > 4: 2: sent 3462, skipped 2574, delta 201ms, dom0 60%, target 44%, sent > 564Mb/s, dirtied 613Mb/s 3766 pages > > 5: [2012-03-09 13:37:13 xend 3113] INFO (XendCheckpoint:498) Saving > memory pages: iter 3 > 6: 3: sent 2701, skipped 1065, delta 178ms, dom0 48%, target 46%, sent > 497Mb/s, dirtied 634Mb/s 3447 pages > ==========================================================> > It continues iterating as expected from the pre-copy algorithm until > eventually the VM is migrated to the destination host. > > My question is what is "delta" as it appears in "line 2,4 and 6" above > supposed to be in terms of the migration time?I''m inferring from the code (tools/libxc/xc_domain_save.c, print_stats in particular) that it is the total time taken for the iteration. This is consistent with the first being longer since it needs to send the whole image. Subsequent rounds only send what has changed.> > > For my next question I am showing the last iterations and some logs xen > spews out: > ==========================================================> 1: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving > memory pages: iter 12 > 2: 12: sent 341, skipped 0, delta 12ms, dom0 100%, target 0%, sent > 931Mb/s, dirtied 114Mb/s 42 pages > 3: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving > memory pages: iter 13 > 4: 13: sent 42, skipped 0, Start last iteration > 5: [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:449) suspend > 6: [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:178) In > saveInputHandler suspend > 7: [2012-03-09 13:37:14 xend 3113] DEBUG (XendCheckpoint:180) Suspending > 4 ... > 8: [2012-03-09 13:37:14 xend.XendDomainInfo 3113] DEBUG > (XendDomainInfo:1335) XendDomainInfo.handleShutdownWatch > 9: [2012-03-09 13:37:14 xend.XendDomainInfo 3113] DEBUG > (XendDomainInfo:1335) XendDomainInfo.handleShutdownWatch > 10: [2012-03-09 13:37:14 xend.XendDomainInfo 3113] INFO > (XendDomainInfo:1292) Domain has shutdown: name=migrating-vm2 id=4 > reason=suspend. > 11: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:185) Domain 4 > suspended. > 12: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) SUSPEND > shinfo 00000bdc > 13: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) delta 11ms, > dom0 81%, target 0%, sent 125Mb/s, dirtied 199Mb/s 67 pages > 14: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Saving > memory pages: iter 14 > 15: 14: sent 67, skipped 0, delta 5ms, dom0 100%, target 0%, sent > 439Mb/s, dirtied 439Mb/s 67 pages > 16: [2012-03-09 13:37:14 xend 3113] INFO (XendCheckpoint:498) Total pages > sent= 52369 (0.77x) > ==========================================================> > I understand from the pre-copy algorithm that the last stage involves > stopping the VM and copying the remaining memory pages across to the > destination host. > > From the above excerpt on "line 15" is delta a reflection of the downtime > during this last stop-copy phase?It (again from code inspection only) counts the final iteration including the transfer of the pages but does not count the "tail" portion of the migrate (e.g. the domain state, which is small compared with the RAM data), nor does it include waiting for the other end to actually start the domain running at the far end. I don''t know of any stat which accurately tracks the precise downtime of the stop and copy. One approach you could use would be to monitor the VM other the network and measure the apparent network outage, since I guess that is likely to be the thing which is most interesting in terms of downtime. Otherwise I think you''d be looking at modifying the code to track this statistic. Ian.