Hello,
I would be grateful for comments on possible methods to improve domain
restore performance. Focusing on the PV case, if it matters.

1) xen-4.0.0
I see a problem similar to the one reported in the thread at
http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00677.html

Dom0 is 2.6.32.9-7.pvops0 x86_64, xen-4.0.0 x86_64.

[user@qubes ~]$ xm create /dev/null kernel=/boot/vmlinuz-2.6.32.9-7.pvops0.qubes.x86_64 root=/dev/mapper/dmroot extra="rootdelay=1000" memory=400
...wait a second...
[user@qubes ~]$ xm save null nullsave
[user@qubes ~]$ time cat nullsave >/dev/null
...
[user@qubes ~]$ time cat nullsave >/dev/null
...
[user@qubes ~]$ time cat nullsave >/dev/null
real    0m0.173s
user    0m0.010s
sys     0m0.164s
/* sits nicely in the cache, let's restore... */
[user@qubes ~]$ time xm restore nullsave
real    0m9.189s
user    0m0.151s
sys     0m0.039s

According to systemtap, xc_restore uses 3.812s of CPU time; besides that
being a lot, what uses the remaining ~6s? Just as reported previously,
there are some errors in xend.log:

[2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:286) restore:shadow=0x0, _static_max=0x19000000, _static_min=0x0,
[2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib64/xen/bin/xc_restore 39 3 1 2 0 0 0 0
[2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) xc_domain_restore start: p2m_size = 19000
[2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) Reloading memory pages: 0%
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423)
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:4100%
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Memory reloaded (0 pages)
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) read VCPU 0
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Completed checkpoint load
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Domain ready to be built.
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Restore exit with rc=0

Note, xc_restore on xen-3.4.3 works much faster (and with no warnings in
the log), with the same dom0 pvops kernel.

Ok, so there is some issue here. Some more generic thoughts below.

2) xen-3.4.3
Firstly, /etc/xen/scripts/block in xen-3.4.3 tries to do something like
    for i in /dev/loop* ; do
        losetup $i
so it spawns one losetup process for each existing /dev/loopX; this hogs
the CPU, especially if your system comes with maxloops=255 :). So, let's
replace it with the xen-4.0.0 version, where this problem is fixed (it
uses losetup -a, hurray).
Then, restore time for a 400MB domain, with the restore file in the cache,
with 4 vbds backed by /dev/loopX, and with one vif, is ca. 2.7s real time.
According to systemtap, the CPU time requirements are:
    xend threads                              - 0.363s
    udevd (in dom0)                           - 0.007s
    /etc/xen/scripts/block and its children   - 1.075s
    xc_restore                                - 1.368s
    /etc/xen/scripts/vif-bridge (in netvm)    - 0.130s

The obvious idea to improve the /etc/xen/scripts/block shell script
execution time is to recode it in some other language that will not spawn
hundreds of processes to do its job.

Now, xc_restore.
a) Is it correct that when xc_restore runs, the target domain memory is
already zeroed (because the hypervisor scrubs free memory before it is
assigned to a new domain)? If so, xc_save could check whether a given page
contains only zeroes and, if so, omit it from the savefile.
This could result in quite significant savings when:
- we save a freshly booted domain, or we can zero out free memory in the
  domain before saving
- we plan to restore multiple times from the same savefile (yes, the vbds
  must be restored in this case too).

b) xen-3.4.3/xc_restore reads data from the savefile in 4k portions - so,
one read syscall per page. Make it read in larger chunks. It looks like
this is fixed in xen-4.0.0, is this correct?

Also, it looks really excessive that basically copying 400MB of memory
takes over 1.3s of CPU time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
dom0 kernel code? Xen mm code? hypercall overhead?), or anything else?
I am aware that in the usual cases xc_restore is not the bottleneck
(reading the savefile from the disk or the network is), but in case we can
fetch the savefile quickly, it matters.

Is the 3.4.3 branch still being developed, or is it in pure maintenance
mode only, so new code should be prepared for 4.0.0?

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project
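As an illustration of the zero-page filter idea above, a minimal sketch of
the per-page check (PAGE_SIZE and the surrounding save loop are assumptions
for the example, not the actual xc_save code):

    #include <string.h>

    #define PAGE_SIZE 4096

    /* Return non-zero if the page contains only zero bytes and could be
     * omitted from the savefile (assuming the restore side can treat
     * omitted pages as zero-filled). */
    static int page_is_zero(const void *page)
    {
        static const char zeroes[PAGE_SIZE];   /* all-zero reference page */
        return memcmp(page, zeroes, PAGE_SIZE) == 0;
    }

In the save loop, pages for which page_is_zero() returns true would be
recorded only as "pfn present, content omitted", so the restore side knows
not to expect 4k of data for them.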
A bit of background to Rafal's post -- we plan to implement a feature that
we call "Disposable VMs" in Qubes, which would essentially allow for
super-fast creation of small, one-purpose VMs (DomUs), e.g. just for
opening a PDF or a Word document, etc.

The point is: the creation & resume of such a VM must be really fast, i.e.
much below 1s. And this seems possible, especially if we use sparse files
for storing the VM's save-image used by the restore operation (the VMs
we're talking about here would have around 100-150MB of actual data
recorded in a sparse savefile).

But, as Rafal pointed out, some operations that Xen does seem to be
implemented inefficiently, and we wanted to get your opinion before we
start optimizing them (i.e. the xc_restore and /etc/xen/scripts/block
optimizations that Rafal mentioned).

Thanks,
j.

On 05/25/2010 12:35 PM, Rafal Wojtczuk wrote:
> [...]
On 25/05/2010 11:35, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:

> a) Is it correct that when xc_restore runs, the target domain memory is
> already zeroed (because hypervisor scrubs free memory, before it is
> assigned to a new domain)

There is no guarantee that the memory will be zeroed.

> b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
> read syscall per page. Make it read in larger chunks. It looks it is fixed
> in xen-4.0.0, is this correct ?

It got changed a lot for Remus. I expect performance was on their mind.
Normally the kernel's file readahead heuristic would get back most of the
performance lost by not reading in larger chunks.

> Also, it looks really excessive that basically copying 400MB of memory takes
> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> dom0 kernel code ? Xen mm code ? hypercall overhead ? ), anything
> else ?

I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
that loop.

 -- Keir

> I am aware that in the usual cases, xc_restore is not the bottleneck
> (savefile reads from the disk or the network is), but in case we can fetch
> savefile quickly, it matters.
>
> Is 3.4.3 branch still being developed, or pure maintenance mode only, so new
> code should be prepared for 4.0.0 ?
On Tue, May 25, 2010 at 12:50:40PM +0100, Keir Fraser wrote:
> On 25/05/2010 11:35, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:
>
> > a) Is it correct that when xc_restore runs, the target domain memory is
> > already zeroed (because hypervisor scrubs free memory, before it is
> > assigned to a new domain)
>
> There is no guarantee that the memory will be zeroed.

Interesting.
For my education, could you explain who is responsible for clearing the
memory of a newborn domain? Xend? Could you point me to the relevant code
fragments?
It looks sensible to clear free memory in hypervisor context in its idle
cycles; if non-temporal instructions (movnti) were used for this, it would
not pollute the caches, and it must be done anyway?

> > b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
> > read syscall per page. Make it read in larger chunks. It looks it is fixed in
> > xen-4.0.0, is this correct ?
>
> It got changed a lot for Remus. I expect performance was on their mind.
> Normally kernel's file readahead heuristic would get back most of the
> performance of not reading in larger chunks.

Yes, readahead would keep the disk request queue full, but I was just
thinking of lowering the syscall overhead. 1e5 syscalls is a lot :)

[user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4k count=102400
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 0.307211 s, 1.4 GB/s
[user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 0.25347 s, 1.7 GB/s

RW
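To make the syscall-overhead point concrete, a rough sketch of reading the
savefile in multi-megabyte chunks and handing out 4k pages from a userland
buffer; CHUNK_SIZE, struct savefile_reader and next_page() are illustrative
names, not the actual xc_restore buffering code:

    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE  4096
    #define CHUNK_SIZE (4 * 1024 * 1024)   /* ~1000x fewer read() calls */

    /* 4 MiB buffer: allocate this statically or on the heap, not on the stack. */
    struct savefile_reader {
        int    fd;
        char   buf[CHUNK_SIZE];
        size_t len;     /* valid bytes in buf */
        size_t off;     /* consumption offset */
    };

    /* Return a pointer to the next 4k page, refilling the buffer with one
     * large read() when it runs dry; NULL on EOF or error. */
    static const char *next_page(struct savefile_reader *r)
    {
        if (r->off + PAGE_SIZE > r->len) {
            size_t tail = r->len - r->off;             /* partial page left over */
            memmove(r->buf, r->buf + r->off, tail);
            ssize_t n = read(r->fd, r->buf + tail, CHUNK_SIZE - tail);
            if (n < 0)
                return NULL;
            r->len = tail + (size_t)n;
            r->off = 0;
            if (r->len < PAGE_SIZE)
                return NULL;                           /* EOF or truncated file */
        }
        r->off += PAGE_SIZE;
        return r->buf + r->off - PAGE_SIZE;
    }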
On 25/05/2010 13:50, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:

>> There is no guarantee that the memory will be zeroed.
> Interesting.
> For my education, could you explain who is responsible for clearing memory
> of a newborn domain ? Xend ? Could you point me to the relevant code
> fragments ?

New domains are not guaranteed to receive zeroed memory. The only guarantee
Xen provides is that when it frees memory for a *dead* domain, it will scrub
the contents before reallocation (it may not write zeroes however, in a
debug build of Xen for example!). For other memory pages, the domain freeing
the pages must scrub them itself before freeing them back to Xen.

> It looks sensible to clear free memory in hypervisor context in its idle
> cycles; if non-temporal instructions (movnti) were used for this, it would
> not pollute caches, and it must be done anyway ?

Only for that one case (freeing pages of a dead domain). In that one case we
currently do it synchronously. But that is because it was better than my
previous crappy asynchronous scrubbing code. :-)

>>> b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
>>> read syscall per page. Make it read in larger chunks. It looks it is fixed
>>> in xen-4.0.0, is this correct ?
>>
>> It got changed a lot for Remus. I expect performance was on their mind.
>> Normally kernel's file readahead heuristic would get back most of the
>> performance of not reading in larger chunks.
> Yes, readahead would keep the disk request queue full, but I was just
> thinking of lowering the syscall overhead. 1e5 syscalls is a lot :)

Well the code looks like it batches now anyway. If it isn't, it would be
interesting to see if making batches would measurably improve performance.

 -- Keir

> [user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4k count=102400
> 102400+0 records in
> 102400+0 records out
> 419430400 bytes (419 MB) copied, 0.307211 s, 1.4 GB/s
> [user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4M count=100
> 100+0 records in
> 100+0 records out
> 419430400 bytes (419 MB) copied, 0.25347 s, 1.7 GB/s
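For reference, the cache-friendly scrubbing Rafal alludes to (non-temporal
stores, so background scrubbing would not evict useful cache lines) could
look roughly like the sketch below. It uses SSE2 compiler intrinsics for
readability; inside Xen itself this would be inline assembly, and, as Keir
notes, the real scrubber may write a debug pattern rather than zeroes:

    #include <emmintrin.h>   /* SSE2: _mm_stream_si128, _mm_setzero_si128 */
    #include <stddef.h>

    #define PAGE_SIZE 4096

    /* Zero one 4k page with non-temporal stores, bypassing the cache.
     * Assumes the page is 16-byte aligned, which page-aligned memory is. */
    static void scrub_page_nontemporal(void *page)
    {
        __m128i zero = _mm_setzero_si128();
        __m128i *p = page;
        size_t i;

        for (i = 0; i < PAGE_SIZE / sizeof(__m128i); i++)
            _mm_stream_si128(&p[i], zero);

        _mm_sfence();   /* order the non-temporal stores before the page is reused */
    }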
> Other memory pages the domain freeing the
> pages must scrub them itself before freeing them back to Xen.

Is that true for a HVM domain making a decrease_reservation hypercall?
If so I should modify my code accordingly... it also means I need to
know if the page I'm decreasing is an unpopulated PoD page or not too.

James
On 25/05/2010 14:33, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Other memory pages the domain freeing the
>> pages must scrub them itself before freeing them back to Xen.
>
> Is that true for a HVM domain making a decrease_reservation hypercall?
> If so I should modify my code accordingly...

Yes you should.

> it also means I need to
> know if the page I'm decreasing is an unpopulated PoD page or not too.

Certainly you could avoid it in that case. Actually I think the PoD code can
detect and reclaim allocated-but-zeroed pages however. But not sure if you
really have to rely on that or not.

 -- Keir
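The ordering being discussed, as a hedged sketch of the give-back path in a
balloon driver; balloon_release_page() and decrease_reservation() are
hypothetical stand-ins, not the real GPLPV or pvops balloon code:

    #include <string.h>

    #define PAGE_SIZE 4096

    /* Hypothetical wrapper around
     * HYPERVISOR_memory_op(XENMEM_decrease_reservation, ...). */
    int decrease_reservation(unsigned long pfn);

    /* Give one page back to Xen.  The scrub must happen *before* the
     * hypercall: once the page is in Xen's free pool it may be handed to
     * another domain without further zeroing, since Xen only scrubs on
     * whole-domain destruction. */
    static int balloon_release_page(void *vaddr, unsigned long pfn)
    {
        memset(vaddr, 0, PAGE_SIZE);
        return decrease_reservation(pfn);
    }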
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: 25 May 2010 14:40
> To: James Harper; Rafal Wojtczuk
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] scrubbing free'd pages
>
> On 25/05/2010 14:33, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> >> Other memory pages the domain freeing the
> >> pages must scrub them itself before freeing them back to Xen.
> >
> > Is that true for a HVM domain making a decrease_reservation
> hypercall?
> > If so I should modify my code accordingly...
>
> Yes you should.
>
> > it also means I need to
> > know if the page I'm decreasing is an unpopulated PoD page or not
> too.
>
> Certainly you could avoid it in that case. Actually I think the PoD
> code can detect and reclaim allocated-but-zeroed pages however. But not
> sure if you really have to rely on that or not.
>

Yes, that's true, but it would be better if we didn't have to scrub pages
and cause a populate immediately before an invalidate.

  Paul
On 05/25/2010 02:59 PM, Keir Fraser wrote:
> On 25/05/2010 13:50, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:
>
>>> There is no guarantee that the memory will be zeroed.
>> Interesting.
>> For my education, could you explain who is responsible for clearing memory
>> of a newborn domain ? Xend ? Could you point me to the relevant code
>> fragments ?
>
> New domains are not guaranteed to receive zeroed memory. The only guarantee
> Xen provides is that when it frees memory for a *dead* domain, it will scrub
> the contents before reallocation (it may not write zeroes however, in a
> debug build of Xen for example!). Other memory pages the domain freeing the
> pages must scrub them itself before freeing them back to Xen.
>

And what happens when we pause and save a domain? Are the pages zeroed out
by Xen in that case?

joanna.
On 25/05/2010 15:12, "Joanna Rutkowska" <joanna@invisiblethingslab.com> wrote:

>> New domains are not guaranteed to receive zeroed memory. The only guarantee
>> Xen provides is that when it frees memory for a *dead* domain, it will scrub
>> the contents before reallocation (it may not write zeroes however, in a
>> debug build of Xen for example!). Other memory pages the domain freeing the
>> pages must scrub them itself before freeing them back to Xen.
>
> And what happens when we pause and save a domain? Are the pages zero-out
> by xen in that case?

If the original domain is subsequently destroyed then yes, Xen zeroes the
pages.

 -- Keir
On 05/25/2010 04:13 PM, Keir Fraser wrote:
> On 25/05/2010 15:12, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
> wrote:
>
>> And what happens when we pause and save a domain? Are the pages zero-out
>> by xen in that case?
>
> If the original domain is subsequently destroyed then yes, Xen zeroes the
> pages.
>

Let's consider this scenario:

xm save domain1
xm create domain2

Can domain2 get *unscrubbed* pages that were previously used by domain1,
but were not scrubbed properly by domain1?

j.
On 25/05/2010 15:19, "Joanna Rutkowska" <joanna@invisiblethingslab.com> wrote:

> Let's consider this scenario:
>
> xm save domain1
>
> xm create domain2
>
> Can the domain2 get *unscrubbed* pages that were previously used by
> domain1, but were not scrubbed properly by domain1?

Generally speaking a domain loses pages to the free pool in only two ways:
via a decrease_reservation hypercall, and via domain destruction. In the
former case the domain itself is responsible for first scrubbing the page.
In the latter case Xen is responsible. With both avenues covered, domain2
cannot get unscrubbed pages from domain1.

 -- Keir
On 05/25/2010 04:19 PM, Keir Fraser wrote:
> On 25/05/2010 15:19, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
> wrote:
>
>> Can the domain2 get *unscrubbed* pages that were previously used by
>> domain1, but were not scrubbed properly by domain1?
>
> Generally speaking a domain loses pages to the free pool in only two ways:
> via a decrease_reservation hypercall, and via domain destruction. In the
> former case the domain itself is responsible for first scrubbing the page.
> In the latter case Xen is responsible. With both avenues covered, domain2
> cannot get unscrubbed pages from domain1.
>

Makes sense.

Thanks,
j.
Hello,

> I would be grateful for the comments on possible methods to improve domain
> restore performance. Focusing on the PV case, if it matters.

Continuing the topic; thank you to everyone who responded so far.

Focusing on the xen-3.4.3 case for now, dom0/domU still 2.6.32.x pvops
x86_64. Let me just reiterate that for our purposes, the domain save time
(and any related post-processing) is not critical; it is only the restore
time that matters. I did some experiments; they involve:
1) before saving a domain, have domU allocate all free memory in a userland
process, then fill it with some MAGIC_PATTERN. Save domU, then post-process
the savefile, removing all pfns (and their page contents) that refer to a
page containing MAGIC_PATTERN. This reduces the savefile size.
2) instead of executing "xm restore savefile", just poke the xmlrpc request
to the Xend unix socket via socat
3) change /etc/xen/scripts/block so that in the "add file:" case it calls
only 3 processes (xenstore-read, losetup, xenstore-write); assuming the
sharing check can be done elsewhere, this should provide a realistic lower
bound for the execution time

For a domain with 400MB RAM and 4 vbds, with the savefile in the fs cache,
this cuts down the restore real time from 2700 ms to 1153 ms. Some questions:

a) Is method 1) safe? Normally, xc_domain_restore() allocates mfns via
xc_domain_memory_populate_physmap() and then calls
xc_add_mmu_update(MMU_MACHPHYS_UPDATE) on the pfn/mfn pairs. If we remove
some pfns from the savefile, this will not happen. Instead, the mfn for the
removed pfn (referring to memory whose content we don't care about) will be
allocated in uncanonicalize_pagetable(), because there will be a pte entry
for this page. But uncanonicalize_pagetable() does not call
xc_add_mmu_update(). Still, the domain seems to be restored properly
(naturally the buffer previously filled with MAGIC_PATTERN now contains
junk, but that is the whole purpose of it).
Again, is xc_add_mmu_update(MMU_MACHPHYS_UPDATE) really needed in the above
scenario? It basically does
    set_gpfn_from_mfn(mfn, gpfn)
but this should already be taken care of by
xc_domain_memory_populate_physmap()?

b) There still seems to be some discrepancy between the real time (1153ms)
and the CPU time (970ms); considering this is a machine with 2 cores (and at
least the hotplug scripts execute in parallel), it is notable. What can
cause the involved processes to sleep (we read the savefile from the fs
cache, so there should be no disk reads at all)? Is the single-threaded
nature of xenstored a possible cause of the delays?

Generally xenstored seems to be quite busy during the restore. Do you think
some of the queries (from Xend?) are redundant? Is there anything else that
can be removed from the relevant Xend code with no harm? This question may
sound too blunt; but given the fact that "xm restore savefile" wastes 220 ms
of CPU time doing apparently nothing useful, I would assume there is some
overhead in Xend too.
The systemtap trace is in the attachment; it does not contain a line about
the xenstored CPU ticks (259ms, really a lot?), as xenstored does not
terminate any thread.

c)
>> Also, it looks really excessive that basically copying 400MB of memory takes
>> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
> that loop.

Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
mfn_count, mfn* mfn_array, char* pages_content).
Would it make xc_restore faster if, instead of using the
xc_map_foreign_batch() interface, it called the above hypercall? On x86_64
all the physical memory is already mapped in the hypervisor (is this
correct?), so this could be quicker, as no page table setup would be
necessary?

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project
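A minimal sketch of the memory-filling helper from step 1) above.
MAGIC_PATTERN, the chunk size and the stop condition are illustrative
choices, not the actual Qubes tool; with default Linux overcommit the
allocation loop may need vm.overcommit_memory=2, or an explicit check
against MemFree, to stop before the OOM killer does:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK         (16u * 1024 * 1024)        /* allocate in 16 MiB steps */
    #define MAGIC_PATTERN 0xDEADBEEFCAFEBABEULL

    int main(void)
    {
        size_t total = 0;

        for (;;) {
            uint64_t *p = malloc(CHUNK);
            if (p == NULL)
                break;                               /* free memory exhausted */
            /* Touch every word so the pages are really allocated and carry
             * the recognisable pattern into the savefile. */
            for (size_t i = 0; i < CHUNK / sizeof(*p); i++)
                p[i] = MAGIC_PATTERN;
            total += CHUNK;
        }

        fprintf(stderr, "filled ~%zu MiB with the pattern\n", total >> 20);
        pause();        /* hold the memory while the domain is being saved */
        return 0;
    }

The post-processing step would then scan the savefile with the same kind of
per-page check as in the zero-page sketch earlier, matching MAGIC_PATTERN
instead of zeroes.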
On 05/31/2010 02:42 AM, Rafal Wojtczuk wrote:
> Focusing on xen-3.4.3 case for now, dom0/domU still 2.6.32.x pvops x86_64.
> Let me just reiterate that for our purposes, the domain save time (and
> possible related post-processing) is not critical, it
> is only the restore time that matters. I did some experiments; they involve:
> 1) before saving a domain, have domU allocate all free memory in an userland
> process, then fill it with some MAGIC_PATTERN. Save domU, then process the
> savefile, removing all pfns (and their page content) that refer to a page
> containing MAGIC_PATTERN.
> This reduces the savefile size.

Why not just balloon the domain down?

> 2) instead of executing "xm restore savefile", just poke the xmlrpc request
> to Xend unix socket via socat

I would seek alternatives to the xend/xm toolset. I've been doing my bit to
make libxenlight/xl useful, though it still needs a lot of work to get it
to anything remotely production-ready...

> [...]
>
> b) There still seems to be some discrepancy between the real time (1153ms) and
> the CPU time (970ms); considering this is a machine with 2 cores (and at
> least the hotplug scripts execute in parallel), it is notable. What can cause
> the involved processes to sleep (we read the savefile from fs cache, so there
> should be no disk reads at all). Is the single threaded nature of xenstored
> the possible cause for the delays ?

Have you tried oxenstored? It works well for me, and seems to be a lot
faster.

> Generally xenstored seems to be quite busy during the restore. Do you think
> some of the queries (from Xend?) are redundant ? Is there anything else
> that can be removed from the relevant Xend code with no harm ?
>
> c)
>>> Also, it looks really excessive that basically copying 400MB of memory takes
>>> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
>> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
>> that loop.
> Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
> mfn_count, mfn* mfn_array, char * pages_content).
> Would it make xc_restore faster if instead of using the xc_map_foreign_batch()
> interface, it would call the above hypercall ? On x86_64 all the physical
> memory is already mapped in the hypervisor (is this correct?), so this could
> be quicker, as no page table setup would be necessary ?

The main cost of pagetable manipulations is the tlb flush; if you can batch
all your setups together to amortize the cost of the tlb flush, it should
be pretty quick. But if batching is not being used properly, then it could
get very expensive. My own observation of "strace xl restore" is that it
seems to do a *lot* of ioctls on privcmd, but I haven't looked more closely
to see what those calls are, and whether they're being done in an optimal
way.

    J
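To make the batching point concrete, a rough sketch of how a restore loop
amortizes the privcmd/TLB cost: one xc_map_foreign_batch() call (one
IOCTL_PRIVCMD_MMAPBATCH, one set of page-table updates) maps a whole batch
of frames, and the page contents are then copied with plain memcpy(). The
function is illustrative, not the actual xc_domain_restore() code, and the
xc_map_foreign_batch() signature is as I recall it from the 3.4/4.0-era
xenctrl.h:

    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <xenctrl.h>

    #define BATCH_PAGE_SIZE 4096   /* x86 page size; libxc has XC_PAGE_SIZE for this */

    /* Map and fill one batch of n guest frames (n up to ~1024) with a
     * single privcmd ioctl, instead of one mapping operation per page. */
    static int restore_one_batch(int xc_handle, uint32_t dom,
                                 xen_pfn_t *mfns, const char *page_data, int n)
    {
        char *region = xc_map_foreign_batch(xc_handle, dom,
                                            PROT_READ | PROT_WRITE, mfns, n);
        if (region == NULL)
            return -1;

        /* The copies themselves need no further hypercalls or ioctls. */
        for (int i = 0; i < n; i++)
            memcpy(region + (size_t)i * BATCH_PAGE_SIZE,
                   page_data + (size_t)i * BATCH_PAGE_SIZE,
                   BATCH_PAGE_SIZE);

        munmap(region, (size_t)n * BATCH_PAGE_SIZE);
        return 0;
    }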
On Tue, Jun 01, 2010 at 10:00:09AM -0700, Jeremy Fitzhardinge wrote:
> On 05/31/2010 02:42 AM, Rafal Wojtczuk wrote:
> > [...]
> > 1) before saving a domain, have domU allocate all free memory in an userland
> > process, then fill it with some MAGIC_PATTERN. Save domU, then process the
> > savefile, removing all pfns (and their page content) that refer to a page
> > containing MAGIC_PATTERN.
> > This reduces the savefile size.
> Why not just balloon the domain down?

I thought it (well, rather the matching balloon-up after restore) would
cost quite some CPU time; it used to, AFAIR. But nowadays it looks
sensible, in the 90ms range. Yes, that is much cleaner, thank you for the
hint.

> > should be no disk reads at all). Is the single threaded nature of xenstored
> > the possible cause for the delays ?
> Have you tried oxenstored? It works well for me, and seems to be a lot
> faster.

Do you mean
http://xenbits.xensource.com/ext/xen-ocaml-tools.hg
? After some tweaks to the Makefiles (-fPIC is required on x86_64 for the
libs sources) it compiles, but then it bails during startup with
fatal error: exception Failure("ioctl bind_interdomain failed")
This happens under xen-3.4.3; does it require 4.0.0?

> >> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
> >> that loop.
> > Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
> > mfn_count, mfn* mfn_array, char * pages_content).
> The main cost of pagetable manipulations is the tlb flush; if you can
> batch all your setups together to amortize the cost of the tlb flush, it
> should be pretty quick. But if batching is not being used properly,
> then it could get very expensive. My own observation of "strace xl
> restore" is that it seems to do a *lot* of ioctls on privcmd, but I
> haven't looked more closely to see what those calls are, and whether
> they're being done in an optimal way.

Well, it looks like xc_restore should _usually_ call xc_map_foreign_batch
once per pages batch (once per 1024 read pages), which looks sensible.
xc_add_mmu_update also tries to batch requests. There are 432 occurrences
of the ioctl syscall in the xc_restore strace output; I am not sure whether
that is damagingly numerous.

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project
On 06/02/2010 09:24 AM, Rafal Wojtczuk wrote:
>> Why not just balloon the domain down?
>
> I thought it (well, rather the matching balloon up after restore) would cost
> quite some CPU time; it used to AFAIR. But nowadays it looks sensible, in 90ms
> range. Yes, that is much cleaner, thank you for the hint.

Aside from the cost of the hypercalls to actually give up the pages,
ballooning is just the same as memory allocation from the system's
perspective.

>>> should be no disk reads at all). Is the single threaded nature of xenstored
>>> the possible cause for the delays ?
>>
>> Have you tried oxenstored? It works well for me, and seems to be a lot
>> faster.
>
> Do you mean
> http://xenbits.xensource.com/ext/xen-ocaml-tools.hg
> ?
> After some tweaks to Makefiles (-fPIC is required on x86_64 for libs sources)
> it compiles,

It builds out of the box for me on my x86-64 machine.

> but then it bails during startup with
> fatal error: exception Failure("ioctl bind_interdomain failed")
> This happens under xen-3.4.3; does it require 4.0.0 ?

No, I don't think so, but it does have to be the first xenstore you run
after boot. Ah, but Xen 4 probably has oxenstored build and other fixes
which aren't in 3.4.3. In particular, I think it has been brought into the
main xen-unstable repo, rather than living off to the side. But it is much
quicker than the C one, I think primarily because it is entirely memory
resident.

> Well, it looks like xc_restore should _usually_ call
> xc_map_foreign_batch once per pages batch (once per 1024 read pages), which
> looks sensible. xc_add_mmu_update also tries to batch requests. There are
> 432 occurences of ioctl syscall in the xc_restore strace output; I am not
> sure if it is damagingly numerous.

Time for some profiling to see where the time is going then.

    J