Hi there,

We are using the latest version of XCP on 6 hosts. When we issue a VM.start or VM.start_on XML-RPC call, it says:

{'Status': 'Failure', 'ErrorDescription': ['SESSION_INVALID', 'OpaqueRef:cfb6df14-387d-40a1-cc27-d5962cba7712']}

However, if I put VM.start in a loop, it succeeds after maybe 20-30 tries. But VM.start_on does not succeed even after 70 tries. One more observation: VM.clone succeeds after 7-8 tries, while VM.hard_shutdown works fine.

Can you guide me on this issue?

Akshay
IIRC, there was a fairly nasty session caching bug that could cause issues like this. It's been fixed since, and the upcoming 0.5 release (coming in a week or so) should include the fix.

Cheers,

Jon

On 1 Jun 2010, at 18:49, AkshayKumar Mehta wrote:

> We are using the latest version of XCP on 6 hosts. When we issue a VM.start or VM.start_on XML-RPC call, it says: {'Status': 'Failure', 'ErrorDescription': ['SESSION_INVALID', ...]}
Hi Jonathan Ludlam,

Thanks! Let me know how to update to 0.5 without disturbing the existing configuration.

Again, thanks for your quick reply.

Akshay
Andreas Olsowski
2010-Jun-01 21:17 UTC
[Xen-devel] slow live migration / xc_restore on xen4 pvops
Hi,

In preparation for our soon-to-arrive central storage array, I wanted to test live migration and Remus replication, and stumbled upon a problem. When migrating a test VM (512 MB RAM, idle) between my 3 servers, two of them are extremely slow in "receiving" the VM. There is little to no CPU utilization from xc_restore until shortly before migration is complete. The same goes for xm restore.

The xend.log contains:

[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:286) restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib/xen/bin/xc_restore 48 43 1 2 0 0 0 0
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) xc_domain_restore start: p2m_size = 20000
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) Reloading memory pages: 0%
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing

...when receiving a VM via live migration finally finishes. You can see the large gap in the timestamps. The VM is perfectly fine after that; it just takes way too long.

First, let me explain my server setup; detailed information on trying to narrow down the error follows. I have 3 servers running Xen 4 with 2.6.31.13-pvops as kernel, the current kernel from Jeremy's xen/master git branch. The guests are running vanilla 2.6.32.11 kernels.

The 3 servers differ slightly in hardware: two are Dell PE 2950 and one is a Dell R710. The 2950s have 2 quad-core Xeon CPUs (L5335 and L5410); the R710 has 2 quad-core Xeon E5520. All machines have 24 GB of RAM. They are called "tarballerina" (E5520), "xenturio1" (L5335) and "xenturio2" (L5410).

Currently I use tarballerina for testing purposes, but I don't consider anything in my setup "stable". xenturio1 has 27 guests running, xenturio2 25. No guest does anything that would even put a dent into the system's performance (LDAP servers, RADIUS, department webservers, etc.).

I created a test VM on my current central iSCSI storage, called "hatest", that idles around, has 2 VCPUs and 512 MB of RAM.

First I tested xm save/restore:

tarballerina:~# time xm restore /var/saverestore-t.mem
real 0m13.227s  user 0m0.090s  sys 0m0.023s

xenturio1:~# time xm restore /var/saverestore-x1.mem
real 4m15.173s  user 0m0.138s  sys 0m0.029s

When migrating to xenturio1 or 2, the migration takes 181 to 278 seconds; when migrating to tarballerina, it takes roughly 30 seconds:

tarballerina:~# time xm migrate --live hatest 10.0.1.98
real 3m57.971s  user 0m0.086s  sys 0m0.029s

xenturio1:~# time xm migrate --live hatest 10.0.1.100
real 0m43.588s  user 0m0.123s  sys 0m0.034s

--- attempt at narrowing it down ---

My first guess was that, since tarballerina had almost no guests running that did anything, it could be an issue of memory usage by the tapdisk2 processes (each dom0 has been mem-set to 4096M). I then started almost all VMs that I have on tarballerina:

tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real 0m2.884s

tarballerina:~# time xm restore /var/saverestore-t.mem
real 0m15.594s

I tried this several times; sometimes it took 30+ seconds.

Then I started 2 VMs that run load- and IO-generating processes (stress, dd, openssl encryption, md5sum). But this didn't affect xm restore performance; it still was quite fast:

tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real 0m7.476s  user 0m0.101s  sys 0m0.022s

tarballerina:~# time xm restore /var/saverestore-t.mem
real 0m45.544s  user 0m0.094s  sys 0m0.022s

I tried several times again; restore took 17 to 45 seconds.

Then I tried migrating the test VM to tarballerina again: still fast, in spite of several VMs, including the load- and IO-generating ones. This ate almost all available RAM. CPU times for xc_restore according to the target machine's "top":

tarballerina -> xenturio1: 0:05:xx, CPU 2-4%, near the end 40%.
xenturio1 -> tarballerina: 0:04:xx, CPU 4-8%, near the end 54%.

tarballerina:~# time xm migrate --live hatest 10.0.1.98
real 3m29.779s  user 0m0.102s  sys 0m0.017s

xenturio1:~# time xm migrate --live hatest 10.0.1.100
real 0m28.386s  user 0m0.154s  sys 0m0.032s

So my attempt at narrowing the problem down failed; it's neither the free memory of the dom0 nor the load, IO or memory the other domUs utilize.

--- end attempt ---

More info (xm list, meminfo, table with migration times, etc.) on my setup can be found here:
http://andiolsi.rz.uni-lueneburg.de/node/37

There was another guy who has the same error in his logfile; this might be unrelated or not:
http://lists.xensource.com/archives/html/xen-users/2010-05/msg00318.html

Further information can be provided on demand.

With best regards

---
Andreas Olsowski <andreas.olsowski@uni.leuphana.de>
Leuphana Universität Lüneburg
System- und Netzwerktechnik
Rechenzentrum, Geb 7, Raum 15
Scharnhorststr. 1
21335 Lüneburg

Tel: ++49 4131 / 6771309
Keir Fraser
2010-Jun-02 07:11 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Hi Andreas,

This is an interesting bug, to be sure. I think you need to modify the restore code to get a better idea of what's going on. The file in the Xen tree is tools/libxc/xc_domain_restore.c. You will see it contains many DBGPRINTF and DPRINTF calls, some of which are commented out, and some of which may 'log' at too low a priority level to make it to the log file. For your purposes you might change them to ERROR calls, as they will definitely get properly logged.

One area of possible concern is that our read function (RDEXACT, which is a macro mapping to rdexact) was modified for Remus to have a select() call with a timeout of 1000ms. Do I entirely trust it? Not when we have the inexplicable behaviour that you're seeing. So you might try mapping RDEXACT() to read_exact() instead (which is what we already do when building for __MINIOS__).

This all assumes you know your way around C code at least a little bit.

-- Keir
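For illustration, the experiment Keir suggests is a one-line macro remap in tools/libxc/xc_domain_restore.c. A minimal sketch, assuming the xen-unstable naming he describes (as the thread notes below, the released 4.0.0 tree does not carry RDEXACT at all):

    /* Original (xen-unstable): rdexact() wraps read_exact() in a
     * select() with a 1000ms timeout, added for Remus; MiniOS builds
     * already bypass it. */
    #ifdef __MINIOS__
    #define RDEXACT read_exact
    #else
    #define RDEXACT rdexact
    #endif

    /* Experiment: force the plain blocking read on all builds, to rule
     * the select() wrapper out. */
    #undef RDEXACT
    #define RDEXACT read_exact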
Andreas Olsowski
2010-Jun-02 15:46 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Hi Keir,

I changed all DPRINTF calls to ERROR, and the commented-out // DPRINTF calls to ERROR as well. There are no DBGPRINTF calls in my xc_domain_restore.c, though.

This is the new xend.log output; of course in this case the "ERROR Internal error:" prefix is actually debug output.

xenturio1:~# tail -f /var/log/xen/xend.log
[2010-06-02 15:44:19 5468] DEBUG (XendCheckpoint:286) restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2010-06-02 15:44:19 5468] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib/xen/bin/xc_restore 50 51 1 2 0 0 0 0
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: xc_domain_restore start: p2m_size = 20000
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: Reloading memory pages: 0%
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of -7 pages
[2010-06-02 15:44:19 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
[2010-06-02 15:49:02 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
[2010-06-02 15:49:02 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
[2010-06-02 15:49:03 5468] INFO (XendCheckpoint:423) ERROR Internal error: reading batch of 1024 pages
...
[2010-06-02 15:49:09 5468] INFO (XendCheckpoint:423) ERROR Internal err100%
...

One can see the time gap between the first and the following memory batch reads. After that, restoration works as expected. You might notice that you have "0%" and then "100%" and no steps in between, whereas with xc_save you have them; is that intentional, or maybe another symptom of the same problem?

As for the read_exact stuff:

tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H RDEXACT {} \;
tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H rdexact {} \;

There are no RDEXACT/rdexact matches in my Xen source code.

In a few hours I will shut down all virtual machines on one of the hosts experiencing slow xc_restores, maybe reboot it, and check whether xc_restore is any faster without load or utilization on the machine.

I'll check in with results later.

--
Andreas Olsowski <andreas.olsowski@uni.leuphana.de>
Leuphana Universität Lüneburg
System- und Netzwerktechnik
Rechenzentrum, Geb 7, Raum 15
Scharnhorststr. 1
21335 Lüneburg

Tel: ++49 4131 / 6771309
Keir Fraser
2010-Jun-02 15:55 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On 02/06/2010 16:46, "Andreas Olsowski" <andreas.olsowski@uni.leuphana.de> wrote:

> One can see the time gap between the first and the following memory batch reads. After that, restoration works as expected. You might notice that you have "0%" and then "100%" and no steps in between, whereas with xc_save you have them; is that intentional, or maybe another symptom of the same problem?

Does the log look similar for a restore on a fast system (except the timestamps, of course)?

> As for the read_exact stuff:
> tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H RDEXACT {} \;
> tarballerina:/usr/src/xen-4.0.0# find . -type f -iname \*.c -exec grep -H rdexact {} \;
>
> There are no RDEXACT/rdexact matches in my Xen source code.

Ah, because you're using 4.0. Well, I wouldn't worry about it just now anyway. It may be more fruitful to continue looking for a concrete behavioural difference between a fast and a slow restore, apart from merely timing, by inspecting logs.

-- Keir
Ian Jackson
2010-Jun-02 16:18 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Andreas Olsowski writes ("[Xen-devel] slow live migration / xc_restore on xen4 pvops"):

> [2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
> [2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing

These errors, and the slowness of migrations, are caused by changes made to support Remus. Previously, a migration would be regarded as complete as soon as the final information, including CPU states, was received at the migration target. xc_domain_restore would return immediately at that point.

Since the Remus patches, xc_domain_restore waits until it gets an IO error, and also has a very short timeout which induces IO errors if nothing is received within it. This is correct in the Remus case but wrong in the normal case.

The code should be changed so that xc_domain_restore
 (a) takes an explicit parameter for the IO timeout, which should default to something much longer than the 100ms or so of the Remus case, and
 (b) gets told whether
   (i) it should return immediately after receiving the "tail" which contains the CPU state; or
   (ii) it should attempt to keep reading after receiving the "tail" and only return when the connection fails.

In the case (b)(i), which should be the usual case, the behaviour should be that which we would get if changeset 20406:0f893b8f7c15 was reverted. The offending code is mostly this, from 20406:

+    // DPRINTF("Buffered checkpoint\n");
+
+    if ( pagebuf_get(&pagebuf, io_fd, xc_handle, dom) ) {
+        ERROR("error when buffering batch, finishing\n");
+        goto finish;
+    }
+    memset(&tmptail, 0, sizeof(tmptail));
+    if ( buffer_tail(&tmptail, io_fd, max_vcpu_id, vcpumap,
+                     ext_vcpucontext) < 0 ) {
+        ERROR ("error buffering image tail, finishing");
+        goto finish;
+    }
+    tailbuf_free(&tailbuf);
+    memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
+
+    goto loadpages;
+
+  finish:

Ian.
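For illustration, Ian's proposal amounts to two extra knobs on the restore entry point. A sketch of such an interface (the two new parameters are hypothetical names, not an actual libxc prototype):

    /* (a) an explicit IO timeout, defaulting well above the ~100ms
     *     Remus heartbeat, and
     * (b) a flag selecting between returning right after the tail
     *     (normal migration) and reading until the connection fails
     *     (Remus checkpointing). */
    int xc_domain_restore(int xc_handle, int io_fd, uint32_t dom,
                          /* ... existing arguments unchanged ... */
                          unsigned long io_timeout_ms,    /* (a) */
                          int checkpointed_stream);       /* (b) */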
Ian Jackson
2010-Jun-02 16:20 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
I wrote:

> These errors, and the slowness of migrations, [...]

Actually, looking at your log, it has a 4min30s delay in it, which is quite striking and well beyond the kind of delay which ought to occur due to the problem I just wrote about. Does the same problem happen with xl?

Ian.
Keir Fraser
2010-Jun-02 16:24 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On 02/06/2010 17:18, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:

> These errors, and the slowness of migrations, are caused by changes made to support Remus. Previously, a migration would be regarded as complete as soon as the final information, including CPU states, was received at the migration target. xc_domain_restore would return immediately at that point.

This probably needs someone with Remus knowledge to take a look, to keep all cases working correctly. I'll Cc Brendan. It'd be good to get this fixed for a 4.0.1 in a few weeks.

-- Keir
Brendan Cully
2010-Jun-02 16:27 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On Wednesday, 02 June 2010 at 17:18, Ian Jackson wrote:

> Since the Remus patches, xc_domain_restore waits until it gets an IO error, and also has a very short timeout which induces IO errors if nothing is received within it. This is correct in the Remus case but wrong in the normal case.
>
> The code should be changed so that xc_domain_restore
>  (a) takes an explicit parameter for the IO timeout, which should default to something much longer than the 100ms or so of the Remus case, and
>  (b) gets told whether
>    (i) it should return immediately after receiving the "tail" which contains the CPU state; or
>    (ii) it should attempt to keep reading after receiving the "tail" and only return when the connection fails.

I'm going to have a look at this today, but the way the code was originally written, I don't believe this should have been a problem:

1. Reads are only supposed to be able to time out after the entire first checkpoint has been received (IOW, this wouldn't kick in until normal migration had already completed).

2. In normal migration, the sender should close the fd after sending all data, immediately triggering an IO error on the receiver and completing the restore.

I did try to avoid disturbing regular live migration as much as possible when I wrote the code. I suspect some other regression has crept in, and I'll investigate.
Andreas Olsowski
2010-Jun-02 22:59 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
I did some further research now and shut down all virtual machines on xenturio1. After that I got (3 runs; xm save takes ~5 seconds, and user and sys times are always negligible, so I removed those to reduce text):

xenturio1:~# time xm restore /var/saverestore-x1.mem
real 0m25.349s / 0m27.456s / 0m27.208s

So the fact that there were running machines did impact the performance of xc_restore.

I proceeded to create 20 "dummy" VMs with 1 GB RAM and 4 VCPUs each (dom0 has 4096M fixed, 24 GB total available):

xenturio1:~# for i in {1..20} ; do echo creating dummy$i ; xt vm create dummy$i -vlan 27 -mem 1024 -cpus 4 ; done
creating dummy1
vm/create> successfully created vm 'dummy1'
...
creating dummy20
vm/create> successfully created vm 'dummy20'

and started them:

for i in {1..20} ; do echo starting dummy$i ; xm start dummy$i ; done

So my memory allocation should now be 100% (4 GB dom0 plus 20 GB domUs), but then why did I have 512 MB to spare for "saverestore-x1"? Oh well, onwards. Once again I ran a save/restore, 3 times to be sure (the additional results are edited into the output).

With 20 running VMs:
xenturio1:~# time xm restore /var/saverestore-x1.mem
real 1m16.375s / 0m31.306s / 1m10.214s

With 16 running VMs:
real 1m49.741s / 1m38.696s / 0m55.615s

With 12 running VMs:
real 1m3.101s / 2m4.254s / 1m27.193s

With 8 running VMs:
real 0m36.867s / 0m43.513s / 0m33.199s

With 4 running VMs:
real 0m40.454s / 0m44.929s / 1m7.215s

Keep in mind, those domUs don't do anything at all; they just idle. What is going on there? The results seem completely random; running more domUs can be faster than running fewer. How is that even possible?

So I deleted the dummy VMs and started the productive domUs again, in 3 steps, to take further measurements.

After the first batch:
real 0m23.968s / 1m22.133s / 1m24.420s

After the second batch:
real 1m54.310s / 1m11.340s / 1m47.643s

After the third batch:
real 1m52.065s / 1m34.517s / 2m8.644s / 1m25.473s / 1m35.943s / 1m45.074s / 1m48.407s / 1m18.277s / 1m18.931s / 1m27.458s

So my current guess is that xc_restore speed depends on the amount of used memory, or rather how much is being grabbed by running processes. Does that make any sense? But if that is so, explain this: I started 3 VMs running "stress" that produce:

load average: 30.94, 30.04, 21.00
Mem: 5909844k total, 4020480k used, 1889364k free, 288k buffers

But still:

tarballerina:~# time xm restore /var/saverestore-t.mem
real 0m38.654s

Why doesn't xc_restore slow down on tarballerina, no matter what I do? Again: all 3 machines have 24 GB RAM and 2x quad-core Xeons, and dom0 is fixed to 4096M RAM. All use the same Xen 4 sources and the same kernels with the same configs. Is the Xeon E5520 with DDR3 really this much faster than the L5335 and L5410 with DDR2?

If someone were to tell me that this is expected behaviour, I wouldn't like it, but at least I could accept it. Are machines doing plenty of CPU and memory utilization not a good measurement in this or any case?

I think tomorrow night I will migrate all machines from xenturio1 to tarballerina, but first I have to verify that all VLANs are available; that I cannot do right now.

---
Andreas
Brendan Cully
2010-Jun-03 01:04 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On Wednesday, 02 June 2010 at 17:24, Keir Fraser wrote:

> This probably needs someone with Remus knowledge to take a look, to keep all cases working correctly. I'll Cc Brendan. It'd be good to get this fixed for a 4.0.1 in a few weeks.

I've done a bit of profiling of the restore code and observed the slowness here too. It looks to me like it's probably related to the superpage changes. The big hit appears to be at the front of the restore process, during calls to allocate_mfn_list under the normal_page case. It looks like we're calling xc_domain_memory_populate_physmap once per page here, instead of batching the allocation? I haven't had time to investigate further today, but I think this is the culprit.
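For illustration, the pattern Brendan suspects, sketched against the libxc call of the time (the loop is a simplification, not the actual restore code):

    /* Suspected: one hypercall, with its own lock_pages()/unlock round
     * trip, per 4k page... */
    for ( i = 0; i < nr_pages; i++ )
        rc = xc_domain_memory_populate_physmap(xc_handle, dom,
                                               1 /* nr_extents */,
                                               0 /* extent_order */,
                                               0 /* mem_flags */,
                                               &pfns[i]);

    /* ...versus one batched hypercall for the whole set: */
    rc = xc_domain_memory_populate_physmap(xc_handle, dom, nr_pages,
                                           0, 0, pfns);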
Hi Jonathan Ludlam,

Is it possible for me to download from the repository? An unstable version is fine until the release comes out. Let me know the steps/link to the repository.

Akshay
Brendan Cully
2010-Jun-03 04:31 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On Wednesday, 02 June 2010 at 18:04, Brendan Cully wrote:

> I've done a bit of profiling of the restore code and observed the slowness here too. It looks to me like it's probably related to the superpage changes. The big hit appears to be at the front of the restore process, during calls to allocate_mfn_list under the normal_page case. It looks like we're calling xc_domain_memory_populate_physmap once per page here, instead of batching the allocation? I haven't had time to investigate further today, but I think this is the culprit.

By the way, this only seems to matter on pvops -- restore is still pretty quick on 2.6.18. I'm somewhat surprised that there'd be any significant difference in allocating guest memory between the two kernels (isn't this almost entirely Xen's responsibility?), but it does explain why this wasn't noticed until recently.
Keir Fraser
2010-Jun-03 05:47 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On 03/06/2010 02:04, "Brendan Cully" <Brendan@cs.ubc.ca> wrote:

> I've done a bit of profiling of the restore code and observed the slowness here too. It looks to me like it's probably related to the superpage changes. The big hit appears to be at the front of the restore process, during calls to allocate_mfn_list under the normal_page case. It looks like we're calling xc_domain_memory_populate_physmap once per page here, instead of batching the allocation? I haven't had time to investigate further today, but I think this is the culprit.

Ccing Edwin Zhai. He wrote the superpage logic for domain restore.

-- Keir
Brendan Cully
2010-Jun-03 06:45 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On Thursday, 03 June 2010 at 06:47, Keir Fraser wrote:

> Ccing Edwin Zhai. He wrote the superpage logic for domain restore.

Here's some data on the slowdown going from 2.6.18 to pvops dom0. I wrapped the call to allocate_mfn_list in uncanonicalize_pagetable to measure the time to do the allocation:

kernel:  min call time, max call time
2.6.18:  4 us,          72 us
pvops:   202 us,        10696 us (!)

It looks like pvops is dramatically slower to perform the xc_domain_memory_populate_physmap call! I'll attach the patch and raw data below.
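The patch and raw data were attachments to the original mail and are not reproduced in the archive. A hedged reconstruction of the kind of timing wrapper Brendan describes (helper name hypothetical):

    #include <stdio.h>
    #include <sys/time.h>

    /* microseconds elapsed between two gettimeofday() samples */
    static unsigned long elapsed_us(const struct timeval *a,
                                    const struct timeval *b)
    {
        return (b->tv_sec - a->tv_sec) * 1000000UL
             + (b->tv_usec - a->tv_usec);
    }

    /* around the allocation in uncanonicalize_pagetable(): */
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    /* ... allocate_mfn_list(...) exactly as in the original code ... */
    gettimeofday(&t1, NULL);
    fprintf(stderr, "allocate_mfn_list: %lu us\n", elapsed_us(&t0, &t1));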
Jeremy Fitzhardinge
2010-Jun-03 06:53 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On 06/02/2010 11:45 PM, Brendan Cully wrote:

> I wrapped the call to allocate_mfn_list in uncanonicalize_pagetable to measure the time to do the allocation.
>
> kernel:  min call time, max call time
> 2.6.18:  4 us,          72 us
> pvops:   202 us,        10696 us (!)
>
> It looks like pvops is dramatically slower to perform the xc_domain_memory_populate_physmap call!

That appears to be implemented as a raw hypercall, so the kernel has very little to do with it. The only thing I can see there that might be relevant is that the mlock calls could be slow for some reason?

J
Brendan Cully
2010-Jun-03 06:55 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On Wednesday, 02 June 2010 at 23:45, Brendan Cully wrote:

> kernel:  min call time, max call time
> 2.6.18:  4 us,          72 us
> pvops:   202 us,        10696 us (!)
>
> It looks like pvops is dramatically slower to perform the xc_domain_memory_populate_physmap call!

Looking at changeset 20841:

  Allow certain performance-critical hypercall wrappers to register data
  buffers via a new interface which allows them to be 'bounced' into a
  pre-mlock'ed page-sized per-thread data area. This saves the cost of
  mlock/munlock on every such hypercall, which can be very expensive on
  modern kernels.

...maybe the lock_pages call in xc_memory_op (called from xc_domain_memory_populate_physmap) has gotten very expensive? Especially considering this hypercall is now issued once per page.
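The mechanism changeset 20841 describes can be pictured with a small, self-contained sketch (names hypothetical, not the actual libxc code): lock a per-thread page once, then copy hypercall argument buffers through it, rather than paying mlock()/munlock() on every call.

    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define BOUNCE_SIZE 4096

    static __thread void *bounce;  /* pre-locked, one page per thread */

    /* Copy a hypercall argument buffer into pre-locked memory; the
     * mlock() cost is paid once per thread, not once per hypercall. */
    static void *hcall_bounce(const void *src, size_t len)
    {
        if ( len > BOUNCE_SIZE )
            return NULL;
        if ( !bounce )
        {
            bounce = malloc(BOUNCE_SIZE);
            if ( !bounce || mlock(bounce, BOUNCE_SIZE) )
                return NULL;
        }
        memcpy(bounce, src, len);
        return bounce;
    }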
Keir Fraser
2010-Jun-03 07:12 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On 03/06/2010 07:55, "Brendan Cully" <Brendan@cs.ubc.ca> wrote:

> ...maybe the lock_pages call in xc_memory_op (called from xc_domain_memory_populate_physmap) has gotten very expensive? Especially considering this hypercall is now issued once per page.

Maybe there are two issues here, then. I mean, there's slow, and there's 10ms for a presumably in-core kernel operation, which is rather mad. Getting our batching back for 4k allocations is the most critical thing though, of course.

-- Keir
Zhai, Edwin
2010-Jun-03 08:58 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
I assume this is a PV domU rather than HVM, right?

1. We need to check whether superpages are the culprit, using SP_check1.patch.

2. If that fixes the problem, we need to check further where the extra cost comes from: the speculative algorithm, or the superpage population hypercall, using SP_check2.patch.

If SP_check2.patch works, the culprit is the new allocation hypercall (so guest creation also suffers); else, it is the speculative algorithm.

Does that make sense?

Thanks,
edwin

Brendan Cully wrote:

> I've done a bit of profiling of the restore code and observed the slowness here too. It looks to me like it's probably related to the superpage changes. The big hit appears to be at the front of the restore process, during calls to allocate_mfn_list under the normal_page case. It looks like we're calling xc_domain_memory_populate_physmap once per page here, instead of batching the allocation? I haven't had time to investigate further today, but I think this is the culprit.

--
best rgds,
edwin
Ian Jackson
2010-Jun-03 10:01 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Brendan Cully writes ("Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops"):

> 2. in normal migration, the sender should close the fd after sending all data, immediately triggering an IO error on the receiver and completing the restore.

This is not true. In normal migration, the fd is used by the machinery which surrounds xc_domain_restore (in xc_save and also in xl or xend). In any case, it would be quite wrong for a library function like xc_domain_restore to eat the fd.

It's not necessary for xc_domain_restore to behave this way in all cases; all that's needed is parameters to tell it how to behave.

> I did try to avoid disturbing regular live migration as much as possible when I wrote the code. I suspect some other regression has crept in, and I'll investigate.

The short timeout is another regression. A normal live migration or restore should not fall over just because no data is available for 100ms.

Ian.
There aren't any binaries at the moment, but you could build your own from the public repositories - I sent a script out on the 25th of March that will compile everything up. You'll need to replace the xapi and fe executables (xapi is output in xen-api.hg/ocaml/xapi/ and fe is found in xen-api-libs.hg/forking_executioner/). I'll reattach the script to this mail.

Cheers,

Jon
Brendan Cully
2010-Jun-03 15:03 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On Thursday, 03 June 2010 at 11:01, Ian Jackson wrote:

> > 2. in normal migration, the sender should close the fd after sending all data, immediately triggering an IO error on the receiver and completing the restore.
>
> This is not true. In normal migration, the fd is used by the machinery which surrounds xc_domain_restore (in xc_save and also in xl or xend). In any case, it would be quite wrong for a library function like xc_domain_restore to eat the fd.

The sender closes the fd, as it always has. xc_domain_restore has always consumed the entire contents of the fd, because the qemu tail has no length header under normal migration. There's no behavioural difference here that I can see.

> It's not necessary for xc_domain_restore to behave this way in all cases; all that's needed is parameters to tell it how to behave.

I have no objection to a more explicit interface. The current form is simply Remus trying to be as invisible as possible to the rest of the tool stack.

> > I did try to avoid disturbing regular live migration as much as possible when I wrote the code. I suspect some other regression has crept in, and I'll investigate.
>
> The short timeout is another regression. A normal live migration or restore should not fall over just because no data is available for 100ms.

(The timeout is 1s, by the way.) For some reason you clipped the bit of my previous message where I say this doesn't happen:

1. Reads are only supposed to be able to time out after the entire first checkpoint has been received (IOW, this wouldn't kick in until normal migration had already completed).

Let's take a look at read_exact_timed in xc_domain_restore:

    if ( completed ) {
        /* expect a heartbeat every HEARBEAT_MS ms maximum */
        tv.tv_sec = HEARTBEAT_MS / 1000;
        tv.tv_usec = (HEARTBEAT_MS % 1000) * 1000;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        len = select(fd + 1, &rfds, NULL, NULL, &tv);
        if ( !FD_ISSET(fd, &rfds) ) {
            fprintf(stderr, "read_exact_timed failed (select returned %zd)\n", len);
            return -1;
        }
    }

'completed' is not set until the first entire checkpoint (i.e., the entirety of a non-Remus migration) has completed. So, no issue.

I see no evidence that Remus has anything to do with the live migration performance regression discussed in this thread, and I haven't seen any other reported issues either. I think the mlock issue is a much more likely candidate.
Keir Fraser
2010-Jun-03 15:18 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On 03/06/2010 16:03, "Brendan Cully" <brendan@cs.ubc.ca> wrote:

> I see no evidence that Remus has anything to do with the live migration performance regression discussed in this thread, and I haven't seen any other reported issues either. I think the mlock issue is a much more likely candidate.

I agree, it's probably lack of batching plus expensive mlocks. The performance difference between the machines under test is either because one runs out of 2MB superpage extents before the other (for some reason), or because mlock operations are much more likely to take a slow path in the kernel (possibly including disk I/O) on one of them. We need to get batching back, and Edwin is on the case for that: I hope Andreas will try out Edwin's patch to work towards that. We can also reduce mlock cost by mlocking some domain_restore arrays across the entire restore operation, I should imagine.

-- Keir
Ian Jackson
2010-Jun-03 17:15 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Brendan Cully writes ("Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops"):

> The sender closes the fd, as it always has. xc_domain_restore has always consumed the entire contents of the fd, because the qemu tail has no length header under normal migration. There's no behavioural difference here that I can see.

No, that is not the case. Look for example at "save" in XendCheckpoint.py in xend, where the save code:

 1. Converts the domain config to sxp and writes it to the fd
 2. Calls xc_save (which calls xc_domain_save)
 3. Writes the qemu save file to the fd

> I have no objection to a more explicit interface. The current form is simply Remus trying to be as invisible as possible to the rest of the tool stack.

My complaint is that that is not currently the case.

> 1. reads are only supposed to be able to time out after the entire first checkpoint has been received (IOW, this wouldn't kick in until normal migration had already completed)

OMG, I hadn't noticed that you had introduced a static variable for that; I had assumed that "read_exact_timed" was roughly what it said on the tin.

I think I shall stop now before I become more rude.

Ian.
Thanks! I have already downloaded them. I will build and deploy them.

Akshay

________________________________
From: Jonathan Ludlam [mailto:Jonathan.Ludlam@eu.citrix.com]
Sent: Thursday, June 03, 2010 3:25 AM
To: AkshayKumar Mehta
Cc: xen-devel@lists.xensource.com; Pradeep Padala
Subject: Re: [Xen-devel] XCP

There aren't any binaries at the moment, but you could build your own
from the public repositories - I sent a script out on the 25th of
March that will compile everything up. You'll need to replace the xapi
and fe executables (xapi is output in xen-api.hg/ocaml/xapi/ and fe is
found in xen-api-libs.hg/forking_executioner/). I'll reattach the
script to this mail.

Cheers,
Jon
Brendan Cully
2010-Jun-03 17:29 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
On Thursday, 03 June 2010 at 18:15, Ian Jackson wrote:
> Brendan Cully writes ("Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops"):
> > The sender closes the fd, as it always has. xc_domain_restore has
> > always consumed the entire contents of the fd, because the qemu tail
> > has no length header under normal migration. There is no behavioural
> > difference here that I can see.
>
> No, that is not the case. Look for example at "save" in
> XendCheckpoint.py in xend, where the save code:
> 1. converts the domain config to sxp and writes it to the fd;
> 2. calls xc_save (which calls xc_domain_save);
> 3. writes the qemu save file to the fd.

4. (in XendDomain) closes the fd. Again, this is the _sender_. I fail
to see your point.

> > I have no objection to a more explicit interface. The current form is
> > simply Remus trying to be as invisible as possible to the rest of the
> > tool stack.
>
> My complaint is that that is not currently the case.
>
> > 1. reads are only supposed to be able to time out after the entire
> > first checkpoint has been received (IOW this wouldn't kick in until
> > normal migration had already completed)
>
> OMG, I hadn't noticed that you had introduced a static variable for
> that; I had assumed that "read_exact_timed" was roughly what it said
> on the tin.
>
> I think I shall stop now before I become more rude.

Feel free to reply if you have an actual Remus-caused regression
instead of FUD based on misreading the code. I'd certainly be
interested in fixing something real.
Ian Jackson
2010-Jun-03 18:02 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Brendan Cully writes ("Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops"):
> On Thursday, 03 June 2010 at 18:15, Ian Jackson wrote:
> > No, that is not the case. Look for example at "save" in
> > XendCheckpoint.py in xend, where the save code:
> > 1. converts the domain config to sxp and writes it to the fd;
> > 2. calls xc_save (which calls xc_domain_save);
> > 3. writes the qemu save file to the fd.
>
> 4. (in XendDomain) closes the fd. Again, this is the _sender_. I fail
> to see your point.

In the receiver this corresponds to the qemu savefile being read from
the fd after xc_domain_restore has returned. So the fd remains
readable after xc_domain_restore, and the save image data sent by
xc_domain_save and received by xc_domain_restore is self-delimiting.

In Xen 3.4 this is easily seen in XendCheckpoint.py, where the
corresponding receive logic is clearly visible. In Xen 4.x this is
different because of the Remus patches.

Ian.
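The matching receive side, again as an illustrative sketch rather than
the real XendCheckpoint.py logic: under the pre-Remus (Xen 3.4)
behaviour, xc_domain_restore stops at the end of the self-delimiting
libxc image and leaves the qemu tail on the still-open fd for its
caller (helper names are hypothetical):

    /* Hypothetical helpers standing in for the real (Python) xend code. */
    extern int read_config_sxp(int fd);
    extern int do_xc_domain_restore(int fd);
    extern int read_qemu_tail(int fd);

    /* Illustrative sketch of the 3.4-style receive path, not real code. */
    static int receive_domain(int fd)
    {
        if ( read_config_sxp(fd) < 0 )         /* 1. domain config (sxp) */
            return -1;
        if ( do_xc_domain_restore(fd) < 0 )    /* 2. consumes only the
                                                *    self-delimiting image */
            return -1;
        return read_qemu_tail(fd);             /* 3. fd is still readable:
                                                *    the caller, not libxc,
                                                *    reads the qemu tail */
    }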
Keir Fraser
2010-Jun-09 13:32 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Edwin, Dave,

The issue is clearly that xc_domain_restore now only ever issues
populate_physmap requests for a single extent at a time. This might be
okay when allocating superpages, but that is rarely the case for PV
guests (it depends on a rare domain config parameter) and is not
guaranteed for HVM guests either. The resulting performance is
unacceptable, especially when the kernel's underlying mlock() is slow.

It looks to me like the root cause is Dave McCracken's patch
xen-unstable:19639, which Edwin Zhai's patch xen-unstable:20126 merely
builds upon.

Ultimately I don't care who fixes it, but I would like a fix for
4.0.1, which releases in the next few weeks, and if I have to do it
myself I will simply hack out the above two changesets. I'd rather
have domain restore working in reasonable time than the relatively
small performance boost of guest superpage mappings.

Thanks,
Keir

On 03/06/2010 09:58, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:

> I assume this is a PV domU rather than HVM, right?
>
> 1. We need to check whether superpages are the culprit, using
> SP_check1.patch.
>
> 2. If that fixes the problem, we need to check further where the
> extra cost comes from: the speculative algorithm, or the superpage
> population hypercall, using SP_check2.patch.
>
> If SP_check2.patch works, the culprit is the new allocation hypercall
> (so guest creation also suffers); otherwise, it is the speculative
> algorithm.
>
> Does that make sense?
>
> Thanks,
> edwin
>
> Brendan Cully wrote:
> > On Thursday, 03 June 2010 at 06:47, Keir Fraser wrote:
> >> On 03/06/2010 02:04, "Brendan Cully" <Brendan@cs.ubc.ca> wrote:
> >>> I've done a bit of profiling of the restore code and observed the
> >>> slowness here too. It looks to me like it's probably related to
> >>> the superpage changes. The big hit appears to be at the front of
> >>> the restore process during calls to allocate_mfn_list, under the
> >>> normal_page case. It looks like we're calling
> >>> xc_domain_memory_populate_physmap once per page here, instead of
> >>> batching the allocation? I haven't had time to investigate further
> >>> today, but I think this is the culprit.
> >>
> >> Cc'ing Edwin Zhai. He wrote the superpage logic for domain restore.
> >
> > Here's some data on the slowdown going from 2.6.18 to pvops dom0:
> >
> > I wrapped the call to allocate_mfn_list in uncanonicalize_pagetable
> > to measure the time taken to do the allocation.
> >
> >   kernel    min call time    max call time
> >   2.6.18           4 us            72 us
> >   pvops          202 us         10696 us (!)
> >
> > It looks like pvops is dramatically slower to perform the
> > xc_domain_memory_populate_physmap call!
> >
> > I'll attach the patch and raw data below.
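A hedged sketch of the batching being asked for, assuming the Xen
4.0-era libxc call (the exact signature should be checked against
tools/libxc/xenctrl.h; BATCH and the surrounding code are
illustrative): one populate_physmap hypercall per batch of pfns rather
than one per page.

    #include <xenctrl.h>   /* assumed Xen 4.0-era header */

    #define BATCH 1024

    /* Sketch: one hypercall (and one pinned hypercall buffer) per
     * BATCH 4k pages (extent_order 0) instead of one per page. */
    static int populate_batched(int xc_handle, uint32_t domid,
                                xen_pfn_t *pfns, unsigned long count)
    {
        unsigned long done = 0;

        while ( done < count )
        {
            unsigned long n = (count - done > BATCH) ? BATCH : (count - done);
            if ( xc_domain_memory_populate_physmap(xc_handle, domid, n,
                                                   0 /* order */, 0 /* flags */,
                                                   pfns + done) != 0 )
                return -1;
            done += n;
        }
        return 0;
    }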
Keir Fraser
2010-Jun-10 09:27 UTC
Re: [Xen-devel] slow live migration / xc_restore on xen4 pvops
Andreas,

You can check whether this is fixed by the latest changes in
http://xenbits.xensource.com/xen-4.0-testing.hg. You should only need
to rebuild and reinstall tools/libxc.

Thanks,
Keir

On 02/06/2010 23:59, "Andreas Olsowski" <andreas.olsowski@uni.leuphana.de> wrote:

> I did some further research and shut down all virtual machines on
> xenturio1. After that I got (3 runs; xm save takes ~5 seconds; user
> and sys times are always negligible, so I removed them to reduce
> text):
>
> xenturio1:~# time xm restore /var/saverestore-x1.mem
> real 0m25.349s 0m27.456s 0m27.208s
>
> So the fact that there were running machines did impact the
> performance of xc_restore.
>
> I proceeded to create 20 "dummy" VMs with 1 GiB of RAM and 4 VCPUs
> each (dom0 has 4096M fixed, 24 GiB total available):
>
> xenturio1:~# for i in {1..20} ; do echo creating dummy$i ; xt vm create
> dummy$i -vlan 27 -mem 1024 -cpus 4 ; done
> creating dummy1
> vm/create> successfully created vm 'dummy1'
> ...
> creating dummy20
> vm/create> successfully created vm 'dummy20'
>
> and started them:
>
> for i in {1..20} ; do echo starting dummy$i ; xm start dummy$i ; done
>
> So my memory allocation should now be 100% (4 GiB dom0, 20 GiB
> domUs) - but then why did I have 512 MiB to spare for
> "saverestore-x1"? Oh well, onwards.
>
> Once again I ran a save/restore, 3 times to be sure (the additional
> results are edited into the output).
>
> With 20 running VMs:
> xenturio1:~# time xm restore /var/saverestore-x1.mem
> real 1m16.375s 0m31.306s 1m10.214s
>
> With 16 running VMs:
> real 1m49.741s 1m38.696s 0m55.615s
>
> With 12 running VMs:
> real 1m3.101s 2m4.254s 1m27.193s
>
> With 8 running VMs:
> real 0m36.867s 0m43.513s 0m33.199s
>
> With 4 running VMs:
> real 0m40.454s 0m44.929s 1m7.215s
>
> Keep in mind, those domUs don't do anything at all; they just idle.
> What is going on there? The results seem completely random - running
> more domUs can be faster than running fewer. How is that even
> possible?
>
> So I deleted the dummy VMs and started the productive domUs again,
> in 3 batches, to take further measurements.
>
> After the first batch:
> real 0m23.968s 1m22.133s 1m24.420s
>
> After the second batch:
> real 1m54.310s 1m11.340s 1m47.643s
>
> After the third batch:
> real 1m52.065s 1m34.517s 2m8.644s 1m25.473s 1m35.943s 1m45.074s
> 1m48.407s 1m18.277s 1m18.931s 1m27.458s
>
> So my current guess is that xc_restore speed depends on the amount
> of used memory, or rather on how much is being grabbed by running
> processes. Does that make any sense?
>
> But if that is so, explain this: I started 3 VMs running "stress"
> that produce:
>
> load average: 30.94, 30.04, 21.00
> Mem: 5909844k total, 4020480k used, 1889364k free, 288k buffers
>
> But still:
> tarballerina:~# time xm restore /var/saverestore-t.mem
> real 0m38.654s
>
> Why doesn't xc_restore slow down on tarballerina, no matter what I
> do? Again: all 3 machines have 24 GiB of RAM and 2x quad-core Xeons,
> and dom0 is fixed to 4096M of RAM. All use the same Xen 4 sources
> and the same kernels with the same configs.
>
> Is the Xeon E5520 with DDR3 really this much faster than the L5335
> and L5410 with DDR2?
>
> If someone were to tell me that this is expected behaviour I
> wouldn't like it, but at least I could accept it. Are machines doing
> plenty of CPU and memory utilization not a good measurement in this
> or any case?
>
> I think tomorrow night I will migrate all machines from xenturio1 to
> tarballerina, but first I have to verify that all vlans are
> available, and that I cannot do right now.
>
> ---
>
> Andreas
Hi Jon,

I am facing some issues with XCP 0.5:

1. The slave machine frequently hangs, and on reboot loses its NIC
information, although running ifconfig shows the interfaces and the
network can still be pinged.

2. It fails to migrate VMs (though I get a Success in the status of
the returned object).

Question: if I point a fresh installation of XCP master and slaves (a
new pool) at a pre-existing storage repository, how can I get the old
VMs back from the VHD files in the new setup? (This is the only pool
utilizing the storage repository.)

regards
Akshay