Hello,
I would be grateful for comments on possible methods to improve domain
restore performance. Focusing on the PV case, if it matters.

1) xen-4.0.0
I see a problem similar to the one reported in the thread at
http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00677.html

Dom0 is 2.6.32.9-7.pvops0 x86_64, xen-4.0.0 x86_64.

[user@qubes ~]$ xm create /dev/null kernel=/boot/vmlinuz-2.6.32.9-7.pvops0.qubes.x86_64 root=/dev/mapper/dmroot extra="rootdelay=1000" memory=400
...wait a second...
[user@qubes ~]$ xm save null nullsave
[user@qubes ~]$ time cat nullsave >/dev/null
...
[user@qubes ~]$ time cat nullsave >/dev/null
...
[user@qubes ~]$ time cat nullsave >/dev/null
real    0m0.173s
user    0m0.010s
sys     0m0.164s
/* sits nicely in the cache, let's restore... */
[user@qubes ~]$ time xm restore nullsave
real    0m9.189s
user    0m0.151s
sys     0m0.039s

According to systemtap, xc_restore uses 3.812s of CPU time; besides that
being a lot, what uses the remaining ~6s? Just as reported previously,
there are some errors in xend.log:

[2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:286) restore:shadow=0x0, _static_max=0x19000000, _static_min=0x0,
[2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib64/xen/bin/xc_restore 39 3 1 2 0 0 0 0
[2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) xc_domain_restore start: p2m_size = 19000
[2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) Reloading memory pages: 0%
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423)
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:4100%
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Memory reloaded (0 pages)
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) read VCPU 0
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Completed checkpoint load
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Domain ready to be built.
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Restore exit with rc=0

Note, xc_restore on xen-3.4.3 works much faster (and with no warnings in
the log), with the same dom0 pvops kernel.

Ok, so there is some issue here. Some more generic thoughts below.

2) xen-3.4.3
Firstly, /etc/xen/scripts/block in xen-3.4.3 tries to do something like
    for i in /dev/loop* ; do
        losetup $i
so it spawns one losetup process for each existing /dev/loopX; this hogs
the CPU, especially if your system comes with maxloops=255 :). So, let's
replace it with the xen-4.0.0 version, where this problem is fixed (it
uses losetup -a, hurray).
Then, restore time for a 400MB domain, with the restore file in the cache,
with 4 vbds backed by /dev/loopX, and with one vif, is ca. 2.7s real time.
According to systemtap, the CPU time requirements are:
    xend threads                              - 0.363s
    udevd (in dom0)                           - 0.007s
    /etc/xen/scripts/block and its children   - 1.075s
    xc_restore                                - 1.368s
    /etc/xen/scripts/vif-bridge (in netvm)    - 0.130s

The obvious idea to improve the /etc/xen/scripts/block shell script
execution time is to recode it in some other language that will not spawn
hundreds of processes to do its job.

Now, xc_restore.
a) Is it correct that when xc_restore runs, the target domain memory is
already zeroed (because the hypervisor scrubs free memory before it is
assigned to a new domain)? If so, xc_save could check whether a given page
contains only zeroes and, if so, omit it from the savefile.
This could result in quite significant savings when:
- we save a freshly booted domain, or we can zero out free memory in the
  domain before saving
- we plan to restore multiple times from the same savefile (yes, the vbds
  must be restored in this case too).

b) xen-3.4.3/xc_restore reads data from the savefile in 4k portions - so,
one read syscall per page. Make it read in larger chunks. It looks like
this is fixed in xen-4.0.0, is this correct?

Also, it looks really excessive that basically copying 400MB of memory
takes over 1.3s of CPU time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
dom0 kernel code? Xen mm code? hypercall overhead?), or anything else?
I am aware that in the usual cases xc_restore is not the bottleneck
(reading the savefile from the disk or the network is), but in case we can
fetch the savefile quickly, it matters.

Is the 3.4.3 branch still being developed, or is it in pure maintenance
mode only, so new code should be prepared for 4.0.0?

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project
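As an illustration of the zero-page filter idea above, a minimal sketch of
the per-page check (PAGE_SIZE and the surrounding save loop are assumptions
for the example, not the actual xc_save code):

    #include <string.h>

    #define PAGE_SIZE 4096

    /* Return non-zero if the page contains only zero bytes and could be
     * omitted from the savefile (assuming the restore side can treat
     * omitted pages as zero-filled). */
    static int page_is_zero(const void *page)
    {
        static const char zeroes[PAGE_SIZE];   /* all-zero reference page */
        return memcmp(page, zeroes, PAGE_SIZE) == 0;
    }

In the save loop, pages for which page_is_zero() returns true would be
recorded only as "pfn present, content omitted", so the restore side knows
not to expect 4k of data for them.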
A bit of background to Rafal's post -- we plan to implement a feature that
we call "Disposable VMs" in Qubes, which would essentially allow for
super-fast creation of small, one-purpose VMs (DomUs), e.g. just for
opening a PDF or a Word document, etc.

The point is: the creation & resume of such a VM must be really fast, i.e.
much below 1s. And this seems possible, especially if we use sparse files
for storing the VM's save-image used by the restore operation (the VMs
we're talking about here would have around 100-150MB of actual data
recorded in a sparse savefile).

But, as Rafal pointed out, some operations that Xen does seem to be
implemented inefficiently, and we wanted to get your opinion before we
start optimizing them (i.e. the xc_restore and /etc/xen/scripts/block
optimizations that Rafal mentioned).

Thanks,
j.

On 05/25/2010 12:35 PM, Rafal Wojtczuk wrote:
> [...]
On 25/05/2010 11:35, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:

> a) Is it correct that when xc_restore runs, the target domain memory is
> already zeroed (because hypervisor scrubs free memory, before it is
> assigned to a new domain)

There is no guarantee that the memory will be zeroed.

> b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
> read syscall per page. Make it read in larger chunks. It looks it is fixed
> in xen-4.0.0, is this correct ?

It got changed a lot for Remus. I expect performance was on their mind.
Normally the kernel's file readahead heuristic would get back most of the
performance lost by not reading in larger chunks.

> Also, it looks really excessive that basically copying 400MB of memory takes
> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> dom0 kernel code ? Xen mm code ? hypercall overhead ? ), anything
> else ?

I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
that loop.

 -- Keir

> I am aware that in the usual cases, xc_restore is not the bottleneck
> (savefile reads from the disk or the network is), but in case we can fetch
> savefile quickly, it matters.
>
> Is 3.4.3 branch still being developed, or pure maintenance mode only, so new
> code should be prepared for 4.0.0 ?
On Tue, May 25, 2010 at 12:50:40PM +0100, Keir Fraser wrote:
> On 25/05/2010 11:35, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:
>
> > a) Is it correct that when xc_restore runs, the target domain memory is
> > already zeroed (because hypervisor scrubs free memory, before it is
> > assigned to a new domain)
>
> There is no guarantee that the memory will be zeroed.

Interesting.
For my education, could you explain who is responsible for clearing the
memory of a newborn domain? Xend? Could you point me to the relevant code
fragments?
It looks sensible to clear free memory in hypervisor context in its idle
cycles; if non-temporal instructions (movnti) were used for this, it would
not pollute the caches, and it must be done anyway?

> > b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
> > read syscall per page. Make it read in larger chunks. It looks it is fixed in
> > xen-4.0.0, is this correct ?
>
> It got changed a lot for Remus. I expect performance was on their mind.
> Normally kernel's file readahead heuristic would get back most of the
> performance of not reading in larger chunks.

Yes, readahead would keep the disk request queue full, but I was just
thinking of lowering the syscall overhead. 1e5 syscalls is a lot :)

[user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4k count=102400
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 0.307211 s, 1.4 GB/s
[user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 0.25347 s, 1.7 GB/s

RW
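To make the syscall-overhead point concrete, a rough sketch of reading the
savefile in multi-megabyte chunks and handing out 4k pages from a userland
buffer; CHUNK_SIZE, struct savefile_reader and next_page() are illustrative
names, not the actual xc_restore buffering code:

    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE  4096
    #define CHUNK_SIZE (4 * 1024 * 1024)   /* ~1000x fewer read() calls */

    /* 4 MiB buffer: allocate this statically or on the heap, not on the stack. */
    struct savefile_reader {
        int    fd;
        char   buf[CHUNK_SIZE];
        size_t len;     /* valid bytes in buf */
        size_t off;     /* consumption offset */
    };

    /* Return a pointer to the next 4k page, refilling the buffer with one
     * large read() when it runs dry; NULL on EOF or error. */
    static const char *next_page(struct savefile_reader *r)
    {
        if (r->off + PAGE_SIZE > r->len) {
            size_t tail = r->len - r->off;             /* partial page left over */
            memmove(r->buf, r->buf + r->off, tail);
            ssize_t n = read(r->fd, r->buf + tail, CHUNK_SIZE - tail);
            if (n < 0)
                return NULL;
            r->len = tail + (size_t)n;
            r->off = 0;
            if (r->len < PAGE_SIZE)
                return NULL;                           /* EOF or truncated file */
        }
        r->off += PAGE_SIZE;
        return r->buf + r->off - PAGE_SIZE;
    }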
On 25/05/2010 13:50, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:

>> There is no guarantee that the memory will be zeroed.
> Interesting.
> For my education, could you explain who is responsible for clearing memory
> of a newborn domain ? Xend ? Could you point me to the relevant code
> fragments ?

New domains are not guaranteed to receive zeroed memory. The only guarantee
Xen provides is that when it frees memory for a *dead* domain, it will scrub
the contents before reallocation (it may not write zeroes however, in a
debug build of Xen for example!). For other memory pages, the domain freeing
the pages must scrub them itself before freeing them back to Xen.

> It looks sensible to clear free memory in hypervisor context in its idle
> cycles; if non-temporal instructions (movnti) were used for this, it would
> not pollute caches, and it must be done anyway ?

Only for that one case (freeing pages of a dead domain). In that one case we
currently do it synchronously. But that is because it was better than my
previous crappy asynchronous scrubbing code. :-)

>>> b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
>>> read syscall per page. Make it read in larger chunks. It looks it is fixed
>>> in xen-4.0.0, is this correct ?
>>
>> It got changed a lot for Remus. I expect performance was on their mind.
>> Normally kernel's file readahead heuristic would get back most of the
>> performance of not reading in larger chunks.
> Yes, readahead would keep the disk request queue full, but I was just
> thinking of lowering the syscall overhead. 1e5 syscalls is a lot :)

Well the code looks like it batches now anyway. If it isn't, it would be
interesting to see if making batches would measurably improve performance.

 -- Keir

> [user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4k count=102400
> 102400+0 records in
> 102400+0 records out
> 419430400 bytes (419 MB) copied, 0.307211 s, 1.4 GB/s
> [user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4M count=100
> 100+0 records in
> 100+0 records out
> 419430400 bytes (419 MB) copied, 0.25347 s, 1.7 GB/s
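For reference, the cache-friendly scrubbing Rafal alludes to (non-temporal
stores, so background scrubbing would not evict useful cache lines) could
look roughly like the sketch below. It uses SSE2 compiler intrinsics for
readability; inside Xen itself this would be inline assembly, and, as Keir
notes, the real scrubber may write a debug pattern rather than zeroes:

    #include <emmintrin.h>   /* SSE2: _mm_stream_si128, _mm_setzero_si128 */
    #include <stddef.h>

    #define PAGE_SIZE 4096

    /* Zero one 4k page with non-temporal stores, bypassing the cache.
     * Assumes the page is 16-byte aligned, which page-aligned memory is. */
    static void scrub_page_nontemporal(void *page)
    {
        __m128i zero = _mm_setzero_si128();
        __m128i *p = page;
        size_t i;

        for (i = 0; i < PAGE_SIZE / sizeof(__m128i); i++)
            _mm_stream_si128(&p[i], zero);

        _mm_sfence();   /* order the non-temporal stores before the page is reused */
    }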
> Other memory pages the domain freeing the
> pages must scrub them itself before freeing them back to Xen.

Is that true for a HVM domain making a decrease_reservation hypercall?
If so I should modify my code accordingly... it also means I need to
know if the page I'm decreasing is an unpopulated PoD page or not too.

James
On 25/05/2010 14:33, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Other memory pages the domain freeing the
>> pages must scrub them itself before freeing them back to Xen.
>
> Is that true for a HVM domain making a decrease_reservation hypercall?
> If so I should modify my code accordingly...

Yes you should.

> it also means I need to
> know if the page I'm decreasing is an unpopulated PoD page or not too.

Certainly you could avoid it in that case. Actually I think the PoD code can
detect and reclaim allocated-but-zeroed pages however. But not sure if you
really have to rely on that or not.

 -- Keir
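The ordering being discussed, as a hedged sketch of the give-back path in a
balloon driver; balloon_release_page() and decrease_reservation() are
hypothetical stand-ins, not the real GPLPV or pvops balloon code:

    #include <string.h>

    #define PAGE_SIZE 4096

    /* Hypothetical wrapper around
     * HYPERVISOR_memory_op(XENMEM_decrease_reservation, ...). */
    int decrease_reservation(unsigned long pfn);

    /* Give one page back to Xen.  The scrub must happen *before* the
     * hypercall: once the page is in Xen's free pool it may be handed to
     * another domain without further zeroing, since Xen only scrubs on
     * whole-domain destruction. */
    static int balloon_release_page(void *vaddr, unsigned long pfn)
    {
        memset(vaddr, 0, PAGE_SIZE);
        return decrease_reservation(pfn);
    }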
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: 25 May 2010 14:40
> To: James Harper; Rafal Wojtczuk
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] scrubbing free'd pages
>
> On 25/05/2010 14:33, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> >> Other memory pages the domain freeing the
> >> pages must scrub them itself before freeing them back to Xen.
> >
> > Is that true for a HVM domain making a decrease_reservation
> hypercall?
> > If so I should modify my code accordingly...
>
> Yes you should.
>
> > it also means I need to
> > know if the page I'm decreasing is an unpopulated PoD page or not
> too.
>
> Certainly you could avoid it in that case. Actually I think the PoD
> code can detect and reclaim allocated-but-zeroed pages however. But not
> sure if you really have to rely on that or not.
>

Yes, that's true, but it would be better if we didn't have to scrub pages
and cause a populate immediately before an invalidate.

  Paul
On 05/25/2010 02:59 PM, Keir Fraser wrote:
> On 25/05/2010 13:50, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:
>
>>> There is no guarantee that the memory will be zeroed.
>> Interesting.
>> For my education, could you explain who is responsible for clearing memory
>> of a newborn domain ? Xend ? Could you point me to the relevant code
>> fragments ?
>
> New domains are not guaranteed to receive zeroed memory. The only guarantee
> Xen provides is that when it frees memory for a *dead* domain, it will scrub
> the contents before reallocation (it may not write zeroes however, in a
> debug build of Xen for example!). Other memory pages the domain freeing the
> pages must scrub them itself before freeing them back to Xen.
>

And what happens when we pause and save a domain? Are the pages zeroed out
by Xen in that case?

joanna.
On 25/05/2010 15:12, "Joanna Rutkowska" <joanna@invisiblethingslab.com> wrote:

>> New domains are not guaranteed to receive zeroed memory. The only guarantee
>> Xen provides is that when it frees memory for a *dead* domain, it will scrub
>> the contents before reallocation (it may not write zeroes however, in a
>> debug build of Xen for example!). Other memory pages the domain freeing the
>> pages must scrub them itself before freeing them back to Xen.
>
> And what happens when we pause and save a domain? Are the pages zero-out
> by xen in that case?

If the original domain is subsequently destroyed then yes, Xen zeroes the
pages.

 -- Keir
On 05/25/2010 04:13 PM, Keir Fraser wrote:
> On 25/05/2010 15:12, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
> wrote:
>
>> And what happens when we pause and save a domain? Are the pages zero-out
>> by xen in that case?
>
> If the original domain is subsequently destroyed then yes, Xen zeroes the
> pages.
>

Let's consider this scenario:

xm save domain1
xm create domain2

Can domain2 get *unscrubbed* pages that were previously used by domain1,
but were not scrubbed properly by domain1?

j.
On 25/05/2010 15:19, "Joanna Rutkowska" <joanna@invisiblethingslab.com> wrote:

> Let's consider this scenario:
>
> xm save domain1
>
> xm create domain2
>
> Can the domain2 get *unscrubbed* pages that were previously used by
> domain1, but were not scrubbed properly by domain1?

Generally speaking a domain loses pages to the free pool in only two ways:
via a decrease_reservation hypercall, and via domain destruction. In the
former case the domain itself is responsible for first scrubbing the page.
In the latter case Xen is responsible. With both avenues covered, domain2
cannot get unscrubbed pages from domain1.

 -- Keir
On 05/25/2010 04:19 PM, Keir Fraser wrote:
> On 25/05/2010 15:19, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
> wrote:
>
>> Can the domain2 get *unscrubbed* pages that were previously used by
>> domain1, but were not scrubbed properly by domain1?
>
> Generally speaking a domain loses pages to the free pool in only two ways:
> via a decrease_reservation hypercall, and via domain destruction. In the
> former case the domain itself is responsible for first scrubbing the page.
> In the latter case Xen is responsible. With both avenues covered, domain2
> cannot get unscrubbed pages from domain1.
>

Makes sense.

Thanks,
j.
Hello,

> I would be grateful for the comments on possible methods to improve domain
> restore performance. Focusing on the PV case, if it matters.

Continuing the topic; thank you to everyone who responded so far.

Focusing on the xen-3.4.3 case for now, dom0/domU still 2.6.32.x pvops
x86_64. Let me just reiterate that for our purposes, the domain save time
(and any related post-processing) is not critical; it is only the restore
time that matters. I did some experiments; they involve:
1) before saving a domain, have domU allocate all free memory in a userland
process, then fill it with some MAGIC_PATTERN. Save domU, then post-process
the savefile, removing all pfns (and their page contents) that refer to a
page containing MAGIC_PATTERN. This reduces the savefile size.
2) instead of executing "xm restore savefile", just poke the xmlrpc request
to the Xend unix socket via socat
3) change /etc/xen/scripts/block so that in the "add file:" case it calls
only 3 processes (xenstore-read, losetup, xenstore-write); assuming the
sharing check can be done elsewhere, this should provide a realistic lower
bound for the execution time

For a domain with 400MB RAM and 4 vbds, with the savefile in the fs cache,
this cuts down the restore real time from 2700 ms to 1153 ms. Some questions:

a) Is method 1) safe? Normally, xc_domain_restore() allocates mfns via
xc_domain_memory_populate_physmap() and then calls
xc_add_mmu_update(MMU_MACHPHYS_UPDATE) on the pfn/mfn pairs. If we remove
some pfns from the savefile, this will not happen. Instead, the mfn for the
removed pfn (referring to memory whose content we don't care about) will be
allocated in uncanonicalize_pagetable(), because there will be a pte entry
for this page. But uncanonicalize_pagetable() does not call
xc_add_mmu_update(). Still, the domain seems to be restored properly
(naturally the buffer previously filled with MAGIC_PATTERN now contains
junk, but that is the whole purpose of it).
Again, is xc_add_mmu_update(MMU_MACHPHYS_UPDATE) really needed in the above
scenario? It basically does
    set_gpfn_from_mfn(mfn, gpfn)
but this should already be taken care of by
xc_domain_memory_populate_physmap()?

b) There still seems to be some discrepancy between the real time (1153ms)
and the CPU time (970ms); considering this is a machine with 2 cores (and at
least the hotplug scripts execute in parallel), it is notable. What can
cause the involved processes to sleep (we read the savefile from the fs
cache, so there should be no disk reads at all)? Is the single-threaded
nature of xenstored a possible cause of the delays?

Generally xenstored seems to be quite busy during the restore. Do you think
some of the queries (from Xend?) are redundant? Is there anything else that
can be removed from the relevant Xend code with no harm? This question may
sound too blunt; but given the fact that "xm restore savefile" wastes 220 ms
of CPU time doing apparently nothing useful, I would assume there is some
overhead in Xend too.
The systemtap trace is in the attachment; it does not contain a line about
the xenstored CPU ticks (259ms, really a lot?), as xenstored does not
terminate any thread.

c)
>> Also, it looks really excessive that basically copying 400MB of memory takes
>> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
> that loop.

Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
mfn_count, mfn* mfn_array, char* pages_content).
Would it make xc_restore faster if, instead of using the
xc_map_foreign_batch() interface, it called the above hypercall? On x86_64
all the physical memory is already mapped in the hypervisor (is this
correct?), so this could be quicker, as no page table setup would be
necessary?

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project
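A minimal sketch of the memory-filling helper from step 1) above.
MAGIC_PATTERN, the chunk size and the stop condition are illustrative
choices, not the actual Qubes tool; with default Linux overcommit the
allocation loop may need vm.overcommit_memory=2, or an explicit check
against MemFree, to stop before the OOM killer does:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK         (16u * 1024 * 1024)        /* allocate in 16 MiB steps */
    #define MAGIC_PATTERN 0xDEADBEEFCAFEBABEULL

    int main(void)
    {
        size_t total = 0;

        for (;;) {
            uint64_t *p = malloc(CHUNK);
            if (p == NULL)
                break;                               /* free memory exhausted */
            /* Touch every word so the pages are really allocated and carry
             * the recognisable pattern into the savefile. */
            for (size_t i = 0; i < CHUNK / sizeof(*p); i++)
                p[i] = MAGIC_PATTERN;
            total += CHUNK;
        }

        fprintf(stderr, "filled ~%zu MiB with the pattern\n", total >> 20);
        pause();        /* hold the memory while the domain is being saved */
        return 0;
    }

The post-processing step would then scan the savefile with the same kind of
per-page check as in the zero-page sketch earlier, matching MAGIC_PATTERN
instead of zeroes.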
On 05/31/2010 02:42 AM, Rafal Wojtczuk wrote:
> Focusing on xen-3.4.3 case for now, dom0/domU still 2.6.32.x pvops x86_64.
> Let me just reiterate that for our purposes, the domain save time (and
> possible related post-processing) is not critical, it
> is only the restore time that matters. I did some experiments; they involve:
> 1) before saving a domain, have domU allocate all free memory in an userland
> process, then fill it with some MAGIC_PATTERN. Save domU, then process the
> savefile, removing all pfns (and their page content) that refer to a page
> containing MAGIC_PATTERN.
> This reduces the savefile size.

Why not just balloon the domain down?

> 2) instead of executing "xm restore savefile", just poke the xmlrpc request
> to Xend unix socket via socat

I would seek alternatives to the xend/xm toolset. I've been doing my bit to
make libxenlight/xl useful, though it still needs a lot of work to get it
to anything remotely production-ready...

> [...]
>
> b) There still seems to be some discrepancy between the real time (1153ms) and
> the CPU time (970ms); considering this is a machine with 2 cores (and at
> least the hotplug scripts execute in parallel), it is notable. What can cause
> the involved processes to sleep (we read the savefile from fs cache, so there
> should be no disk reads at all). Is the single threaded nature of xenstored
> the possible cause for the delays ?

Have you tried oxenstored? It works well for me, and seems to be a lot
faster.

> Generally xenstored seems to be quite busy during the restore. Do you think
> some of the queries (from Xend?) are redundant ? Is there anything else
> that can be removed from the relevant Xend code with no harm ?
>
> c)
>>> Also, it looks really excessive that basically copying 400MB of memory takes
>>> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
>> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
>> that loop.
> Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
> mfn_count, mfn* mfn_array, char * pages_content).
> Would it make xc_restore faster if instead of using the xc_map_foreign_batch()
> interface, it would call the above hypercall ? On x86_64 all the physical
> memory is already mapped in the hypervisor (is this correct?), so this could
> be quicker, as no page table setup would be necessary ?

The main cost of pagetable manipulations is the tlb flush; if you can batch
all your setups together to amortize the cost of the tlb flush, it should
be pretty quick. But if batching is not being used properly, then it could
get very expensive. My own observation of "strace xl restore" is that it
seems to do a *lot* of ioctls on privcmd, but I haven't looked more closely
to see what those calls are, and whether they're being done in an optimal
way.

    J
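To make the batching point concrete, a rough sketch of how a restore loop
amortizes the privcmd/TLB cost: one xc_map_foreign_batch() call (one
IOCTL_PRIVCMD_MMAPBATCH, one set of page-table updates) maps a whole batch
of frames, and the page contents are then copied with plain memcpy(). The
function is illustrative, not the actual xc_domain_restore() code, and the
xc_map_foreign_batch() signature is as I recall it from the 3.4/4.0-era
xenctrl.h:

    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <xenctrl.h>

    #define BATCH_PAGE_SIZE 4096   /* x86 page size; libxc has XC_PAGE_SIZE for this */

    /* Map and fill one batch of n guest frames (n up to ~1024) with a
     * single privcmd ioctl, instead of one mapping operation per page. */
    static int restore_one_batch(int xc_handle, uint32_t dom,
                                 xen_pfn_t *mfns, const char *page_data, int n)
    {
        char *region = xc_map_foreign_batch(xc_handle, dom,
                                            PROT_READ | PROT_WRITE, mfns, n);
        if (region == NULL)
            return -1;

        /* The copies themselves need no further hypercalls or ioctls. */
        for (int i = 0; i < n; i++)
            memcpy(region + (size_t)i * BATCH_PAGE_SIZE,
                   page_data + (size_t)i * BATCH_PAGE_SIZE,
                   BATCH_PAGE_SIZE);

        munmap(region, (size_t)n * BATCH_PAGE_SIZE);
        return 0;
    }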
On Tue, Jun 01, 2010 at 10:00:09AM -0700, Jeremy Fitzhardinge wrote:
> On 05/31/2010 02:42 AM, Rafal Wojtczuk wrote:
> > [...]
> > 1) before saving a domain, have domU allocate all free memory in an userland
> > process, then fill it with some MAGIC_PATTERN. Save domU, then process the
> > savefile, removing all pfns (and their page content) that refer to a page
> > containing MAGIC_PATTERN.
> > This reduces the savefile size.
> Why not just balloon the domain down?

I thought it (well, rather the matching balloon-up after restore) would
cost quite some CPU time; it used to, AFAIR. But nowadays it looks
sensible, in the 90ms range. Yes, that is much cleaner, thank you for the
hint.

> > should be no disk reads at all). Is the single threaded nature of xenstored
> > the possible cause for the delays ?
> Have you tried oxenstored? It works well for me, and seems to be a lot
> faster.

Do you mean
http://xenbits.xensource.com/ext/xen-ocaml-tools.hg
? After some tweaks to the Makefiles (-fPIC is required on x86_64 for the
libs sources) it compiles, but then it bails during startup with
fatal error: exception Failure("ioctl bind_interdomain failed")
This happens under xen-3.4.3; does it require 4.0.0?

> >> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
> >> that loop.
> > Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
> > mfn_count, mfn* mfn_array, char * pages_content).
> The main cost of pagetable manipulations is the tlb flush; if you can
> batch all your setups together to amortize the cost of the tlb flush, it
> should be pretty quick. But if batching is not being used properly,
> then it could get very expensive. My own observation of "strace xl
> restore" is that it seems to do a *lot* of ioctls on privcmd, but I
> haven't looked more closely to see what those calls are, and whether
> they're being done in an optimal way.

Well, it looks like xc_restore should _usually_ call xc_map_foreign_batch
once per pages batch (once per 1024 read pages), which looks sensible.
xc_add_mmu_update also tries to batch requests. There are 432 occurrences
of the ioctl syscall in the xc_restore strace output; I am not sure whether
that is damagingly numerous.

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project
On 06/02/2010 09:24 AM, Rafal Wojtczuk wrote:
>> Why not just balloon the domain down?
>
> I thought it (well, rather the matching balloon up after restore) would cost
> quite some CPU time; it used to AFAIR. But nowadays it looks sensible, in 90ms
> range. Yes, that is much cleaner, thank you for the hint.

Aside from the cost of the hypercalls to actually give up the pages,
ballooning is just the same as memory allocation from the system's
perspective.

>>> should be no disk reads at all). Is the single threaded nature of xenstored
>>> the possible cause for the delays ?
>>
>> Have you tried oxenstored? It works well for me, and seems to be a lot
>> faster.
>
> Do you mean
> http://xenbits.xensource.com/ext/xen-ocaml-tools.hg
> ?
> After some tweaks to Makefiles (-fPIC is required on x86_64 for libs sources)
> it compiles,

It builds out of the box for me on my x86-64 machine.

> but then it bails during startup with
> fatal error: exception Failure("ioctl bind_interdomain failed")
> This happens under xen-3.4.3; does it require 4.0.0 ?

No, I don't think so, but it does have to be the first xenstore you run
after boot. Ah, but Xen 4 probably has oxenstored build and other fixes
which aren't in 3.4.3. In particular, I think it has been brought into the
main xen-unstable repo, rather than living off to the side. But it is much
quicker than the C one, I think primarily because it is entirely memory
resident.

> Well, it looks like xc_restore should _usually_ call
> xc_map_foreign_batch once per pages batch (once per 1024 read pages), which
> looks sensible. xc_add_mmu_update also tries to batch requests. There are
> 432 occurences of ioctl syscall in the xc_restore strace output; I am not
> sure if it is damagingly numerous.

Time for some profiling to see where the time is going then.

    J