A user (Joshua) is reporting that 'xm restore' isn't working when GPLPV is involved. I've checked the logs generated by GPLPV and there are no problems on the save side of things that I can see. Is there anything extra that the suspend or restore needs to do since 3.4.x?

Joshua has captured the following:

On the dom0 I initiated an "xm save" of the VM. No problems here, but when I initiate an "xm restore", I receive the following error:

Error: /usr/lib64/xen/bin/xc_restore 56 103 2 3 1 1 1 failed

And in /var/log/xen/xend.log, I see (pertaining to this event):

[2009-08-02 15:12:44 4839] INFO (image:745) Need to create platform device. [domid:103]
[2009-08-02 15:12:44 4839] DEBUG (XendCheckpoint:261) restore: shadow=0x9, _static_max=0x40000000, _static_min=0x0
[2009-08-02 15:12:44 4839] DEBUG (balloon:166) Balloon: 31589116 KiB free; need 1061888; done.
[2009-08-02 15:12:44 4839] DEBUG (XendCheckpoint:278) [xc_restore]: /usr/lib64/xen/bin/xc_restore 56 103 2 3 1 1 1
[2009-08-02 15:12:44 4839] INFO (XendCheckpoint:417) xc_domain_restore start: p2m_size = 100000
[2009-08-02 15:12:44 4839] INFO (XendCheckpoint:417) Reloading memory pages: 0%
[2009-08-02 15:12:52 4839] INFO (XendCheckpoint:417) Failed allocation for dom 103: 1024 extents of order 0
[2009-08-02 15:12:52 4839] INFO (XendCheckpoint:417) ERROR Internal error: Failed to allocate memory for batch.!
[2009-08-02 15:12:52 4839] INFO (XendCheckpoint:417)
[2009-08-02 15:12:52 4839] INFO (XendCheckpoint:417) Restore exit with rc=1
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:2724) XendDomainInfo.destroy: domid=103
[2009-08-02 15:12:52 4839] ERROR (XendDomainInfo:2738) XendDomainInfo.destroy: domain destruction failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2731, in destroy
    xc.domain_pause(self.domid)
Error: (3, 'No such process')
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:2204) No device model
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:2206) Releasing devices
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:2219) Removing vbd/768
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:1134) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:2219) Removing vfb/0
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:1134) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:2219) Removing console/0
[2009-08-02 15:12:52 4839] DEBUG (XendDomainInfo:1134) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2009-08-02 15:12:52 4839] ERROR (XendDomain:1149) Restore failed
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 1147, in domain_restore_fd
    return XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 282, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 405, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_restore 56 103 2 3 1 1 1 failed
It seems that somewhere along the line Xen started using an event channel to trigger a suspend, as opposed to the 'shutdown' xenstore value. Is there anything else there I need to know about?

Thanks

James

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Harper
> Sent: Tuesday, 4 August 2009 11:23
> To: xen-devel@lists.xensource.com
> Cc: Joshua West
> Subject: [Xen-devel] Error restoring DomU when using GPLPV
>
> A user (Joshua) is reporting that 'xm restore' isn't working when GPLPV
> is involved. I've checked the logs generated by GPLPV and there are no
> problems on the save side of things that I can see. Is there anything
> extra that the suspend or restore needs to do since 3.4.x?
>
> [...]
> It seems that somewhere along the line Xen started using an event
> channel to trigger a suspend, as opposed to the 'shutdown' xenstore
> value. Is there anything else there I need to know about?

Actually that seems to be unrelated to the problem (I found this out after adding suspend evtchn support to gplpv...)

The actual error is that the call to xc_memory_op passes 33 as nr_extents, but the return value is 32, which is an error condition. Is it not counting an already allocated page in the PVonHVM case or something?

XendCheckpoint:417 calls xc_domain_restore which calls xc_domain_memory_increase_reservation.

Does XENMEM_increase_reservation mean "increase reservation _by_ X" or "increase reservation _to_ X"?

James
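For reference, the argument structure behind these memory_op calls looks roughly like this (paraphrased from xen/include/public/memory.h of the 3.x era; exact field names may vary slightly between releases). As far as I know the semantics are "increase by": nr_extents new extents are allocated, and the return value is the number actually allocated, which is why 32 back against nr_extents == 33 is an error.

/* Sketch based on xen/include/public/memory.h (3.x era; field names
 * may vary slightly by release).  XENMEM_increase_reservation and
 * XENMEM_populate_physmap allocate nr_extents *new* extents for the
 * domain -- "increase by", not "increase to" -- and return how many
 * extents they actually managed to allocate. */
struct xen_memory_reservation {
    XEN_GUEST_HANDLE(xen_pfn_t) extent_start; /* in/out: GPFNs/MFNs     */
    xen_ulong_t  nr_extents;    /* in: how many extents to allocate     */
    unsigned int extent_order;  /* in: 0 for 4kB pages, 9 for 2MB, ...  */
    unsigned int mem_flags;     /* in: XENMEMF_* flags                  */
    domid_t      domid;         /* in: target domain                    */
};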
> > It seems that somewhere along the line Xen started using an event
> > channel to trigger a suspend, as opposed to the 'shutdown' xenstore
> > value. Is there anything else there I need to know about?
>
> Actually that seems to be unrelated to the problem (I found this out
> after adding suspend evtchn support to gplpv...)
>
> The actual error is that the call to xc_memory_op passes 33 as
> nr_extents, but the return value is 32, which is an error condition.
> Is it not counting an already allocated page in the PVonHVM case or
> something?
>
> XendCheckpoint:417 calls xc_domain_restore which calls
> xc_domain_memory_increase_reservation.
>
> Does XENMEM_increase_reservation mean "increase reservation _by_ X" or
> "increase reservation _to_ X"?

Actually I was looking at the wrong thing. The error is actually in xc_domain_memory_populate_physmap.

James
When the DomU is running, 'xm debug q' looks like:

(XEN) General information for domain 23:
(XEN)     refcnt=3 dying=0 nr_pages=197611 xenheap_pages=33 dirty_cpus={1} max_pages=197632

During restore, it looks like this:

(XEN) General information for domain 22:
(XEN)     refcnt=3 dying=0 nr_pages=196576 xenheap_pages=5 dirty_cpus={} max_pages=197632

(last capture before the domain ran out of pages)

I added some debugging to libxc and it allocates bunches of 1024 pages over and over quite successfully, but then fails at 33. Presumably something is counting pages incorrectly somewhere?

James
On 04/08/2009 08:58, "James Harper" <james.harper@bendigoit.com.au> wrote:

> When the DomU is running, 'xm debug q' looks like:
>
> (XEN) General information for domain 23:
> (XEN)     refcnt=3 dying=0 nr_pages=197611 xenheap_pages=33 dirty_cpus={1} max_pages=197632
>
> During restore, it looks like this:
>
> (XEN) General information for domain 22:
> (XEN)     refcnt=3 dying=0 nr_pages=196576 xenheap_pages=5 dirty_cpus={} max_pages=197632

Is the host simply out of memory? If dom22 above has 196576 pages and max_pages=197632 then an allocation of 33 order-0 extents should not fail due to over-commitment to the guest. The only reason for such a failure is inadequate memory available in the host free pools.

Perhaps xend auto-ballooning is involved? I'd turn it off if so, as it blows. It could have freed up one-page-too-few or somesuch.

 -- Keir
> On 04/08/2009 08:58, "James Harper" <james.harper@bendigoit.com.au> wrote:
> [...]
> Is the host simply out of memory?

No. There is 5G of physical memory free and only 768MB assigned to the DomU. I can start the guest again, I just can't restore it.

> If dom22 above has 196576 pages and
> max_pages=197632 then an allocation of 33 order-0 extents should not fail
> due to over-commitment to the guest.

196576 is just where it happened to be when I took the last 'xm debug q', before 'xm restore' failed and deleted it. The allocation of '33' returns '32', so it does appear to be an off-by-one error.

> The only reason for such a failure is
> inadequate memory available in the host free pools. Perhaps xend
> auto-ballooning is involved? I'd turn it off if so, as it blows. It could
> have freed up one-page-too-few or somesuch.

I assume that what happens is that the memory continues to grow until it hits max_pages, for some reason. Is there a way to tell 'xm restore' not to delete the domain when the restore fails, so I can see if nr_pages really does equal max_pages at the time that it dies?

The curious thing is that this only happens when GPLPV is running. A PV domU or a pure HVM DomU doesn't have this problem (presumably that would have been noticed during regression testing). It would be interesting to try a PVonHVM Linux DomU and see how that goes... hopefully someone who is having the problem with GPLPV also has PVonHVM domains they could test.

James
> Is the host simply out of memory? If dom22 above has 196576 pages and
> max_pages=197632 then an allocation of 33 order-0 extents should not fail
> due to over-commitment to the guest.

I added some more debugging...

batch 1024           [1]
Allocating 1024 mfns [2]
197600 allocated     [3]
batch 1024
Allocating 33 mfns
Failed allocation for dom 24: 33 extents of order 0 (err = 32) [4]

[1] is just after 'j' is read in xc_domain_restore
[2] is just before the call to populate_physmap
[3] is just after the call to populate_physmap
[4] is the error message in the memory_op function in libxc, modified to give the value of err

According to that, a total of 197632 pages are being allocated and the last page cannot be (more pages could be required the next time around the loop too...)

James
On 04/08/2009 10:01, "James Harper" <james.harper@bendigoit.com.au> wrote:

> I assume that what happens is that the memory continues to grow until it
> hits max_pages, for some reason. Is there a way to tell 'xm restore'
> not to delete the domain when the restore fails, so I can see if nr_pages
> really does equal max_pages at the time that it dies?
>
> The curious thing is that this only happens when GPLPV is running. A PV
> domU or a pure HVM DomU doesn't have this problem (presumably that would
> have been noticed during regression testing). It would be interesting to
> try a PVonHVM Linux DomU and see how that goes... hopefully someone who is
> having the problem with GPLPV also has PVonHVM domains they could test.

Okay, also this is a normal save/restore (no live migration of pages)?

Could the grant-table/shinfo Xenheap pages be confusing matters, I wonder. The save process may save those pages out - since dom0 can map them it will also save them - and then they get mistakenly restored as domheap pages at the far end. All would work out okay in the end when you remap those special pages during GPLPV restore, as the domheap pages would get implicitly freed. But maybe there is no allocation headroom for the guest in the meantime, so the restore fails.

Just a theory. Maybe you could try unmapping grant/shinfo pages in the suspend callback? This may not help for live migration though, where pages get transmitted before the callback. It may be necessary to allow dom0 to specify 'map me a page but not if it's special' and plumb that up to xc_domain_save. It'd be good to have the theory proved first.

Cheers,
Keir
> Just a theory. Maybe you could try unmapping grant/shinfo pages in the
> suspend callback? This may not help for live migration though, where pages
> get transmitted before the callback. It may be necessary to allow dom0 to
> specify 'map me a page but not if it's special' and plumb that up to
> xc_domain_save. It'd be good to have the theory proved first.

Can do. What is the opposite of 'XENMEM_add_to_physmap', which I assume I'll need to unmap the grant pages? Is it XENMEM_decrease_reservation?

James
On 04/08/2009 10:34, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Just a theory. Maybe you could try unmapping grant/shinfo pages in the
>> suspend callback? This may not help for live migration though, where pages
>> get transmitted before the callback. It may be necessary to allow dom0 to
>> specify 'map me a page but not if it's special' and plumb that up to
>> xc_domain_save. It'd be good to have the theory proved first.
>
> Can do. What is the opposite of 'XENMEM_add_to_physmap', which I assume
> I'll need to unmap the grant pages? Is it XENMEM_decrease_reservation?

Oh yes, there is no direct opposite of add_to_physmap... But I think decrease_reservation will work okay in this case, fortunately.

 -- Keir
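In concrete terms, the suggestion amounts to something like the following in the guest driver's suspend path. This is a minimal sketch, not actual GPLPV code: it assumes the usual HYPERVISOR_memory_op wrapper and guest-handle macros are available to the driver, and gnttab_gpfns/nr_frames are hypothetical names for wherever the driver recorded the gpfns it earlier passed to XENMEM_add_to_physmap.

/* Minimal sketch, assuming HYPERVISOR_memory_op and the standard
 * public headers: hand the gnttab (and shinfo) frames back with
 * XENMEM_decrease_reservation during the suspend callback, so the
 * save side no longer sees them mapped in the guest physmap. */
static long unmap_xenheap_pages_on_suspend(xen_pfn_t *gnttab_gpfns,
                                           unsigned int nr_frames)
{
    struct xen_memory_reservation reservation = {
        .nr_extents   = nr_frames,
        .extent_order = 0,          /* order-0 (4kB) extents */
        .domid        = DOMID_SELF,
    };

    set_xen_guest_handle(reservation.extent_start, gnttab_gpfns);

    /* Returns the number of extents actually released; anything less
     * than nr_frames is worth logging, though the domain is about to
     * be suspended anyway. */
    return HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
}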
> Could the grant-table/shinfo Xenheap pages be confusing matters, I wonder.
> The save process may save those pages out - since dom0 can map them it will
> also save them - and then they get mistakenly restored as domheap pages at
> the far end. All would work out okay in the end when you remap those special
> pages during GPLPV restore, as the domheap pages would get implicitly freed.
> But maybe there is no allocation headroom for the guest in the meantime, so
> the restore fails.
>
> Just a theory. Maybe you could try unmapping grant/shinfo pages in the
> suspend callback? This may not help for live migration though, where pages
> get transmitted before the callback. It may be necessary to allow dom0 to
> specify 'map me a page but not if it's special' and plumb that up to
> xc_domain_save. It'd be good to have the theory proved first.

I took the easier path and told my grant code to map 2 fewer pages, and sure enough it tries to allocate 31 pages (which succeeds) but then tries to allocate 7 (which fails). So the 31 (was 33) must contain the grant table pages etc. I'll attempt to add the unmap code you requested and see if it makes a difference...

So why doesn't PV have this problem? Does it not send the pages first? And do you think that a Linux HVM domain with PV drivers would suffer the same fate?

Thanks

James
> On 04/08/2009 10:34, "James Harper" <james.harper@bendigoit.com.au> wrote:
>
>>> Just a theory. Maybe you could try unmapping grant/shinfo pages in the
>>> suspend callback? [...]
>>
>> Can do. What is the opposite of 'XENMEM_add_to_physmap', which I assume
>> I'll need to unmap the grant pages? Is it XENMEM_decrease_reservation?
>
> Oh yes, there is no direct opposite of add_to_physmap... But I think
> decrease_reservation will work okay in this case, fortunately.

Given that I'm not going to use the grant table subsequent to unmapping them I'll probably get away with it, but does XENMEM_decrease_reservation actually tell xen that the pages are no longer actually part of the grant table?

James
On 04/08/2009 11:40, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Oh yes, there is no direct opposite of add_to_physmap... But I think
>> decrease_reservation will work okay in this case, fortunately.
>
> Given that I'm not going to use the grant table subsequent to unmapping
> them I'll probably get away with it, but does
> XENMEM_decrease_reservation actually tell xen that the pages are no
> longer actually part of the grant table?

No, for a xenheap page the page won't actually get freed. Xen keeps a reference to them until the domain is finally destroyed.

Regarding the Linux PV-on-HVM drivers - they may have the same issue. Full PV guests do not, as they have a gnttab_suspend() function called during the suspend callback (and for subtle reasons xc_domain_save can detect and not save Xenheap pages for a full PV guest anyway - because it can see the P2M table in that case).

Like I said before -- unmapping the gnttab pages I think will not help you for live migration, but I suppose it is a reasonable thing to do anyway. For live migration I think xc_domain_save needs to get a bit smarter about Xenheap pages in HVM guests.

 -- Keir
> Regarding the Linux PV-on-HVM drivers - they may have the same issue. Full
> PV guests do not, as they have a gnttab_suspend() function called during
> the suspend callback (and for subtle reasons xc_domain_save can detect and
> not save Xenheap pages for a full PV guest anyway - because it can see the
> P2M table in that case).
>
> Like I said before -- unmapping the gnttab pages I think will not help you
> for live migration, but I suppose it is a reasonable thing to do anyway. For
> live migration I think xc_domain_save needs to get a bit smarter about
> Xenheap pages in HVM guests.

Understood. Do you have any idea about why it worked fine under 3.3.x but not 3.4.x?

Thanks

James
On 04/08/2009 12:34, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Like I said before -- unmapping the gnttab pages I think will not help you
>> for live migration, but I suppose it is a reasonable thing to do anyway. For
>> live migration I think xc_domain_save needs to get a bit smarter about
>> Xenheap pages in HVM guests.
>
> Understood. Do you have any idea about why it worked fine under 3.3.x
> but not 3.4.x?

The bit of code in 3.3's xc_domain_save.c that is commented "Skip PFNs that aren't really there" is removed in 3.4. That will be the reason.

 -- Keir
From: Pasi Kärkkäinen
Date: 2009-Aug-18 08:17 UTC
Subject: Re: [Xen-devel] Error restoring DomU when using GPLPV
On Tue, Aug 04, 2009 at 02:12:48PM +0100, Keir Fraser wrote:
> On 04/08/2009 12:34, "James Harper" <james.harper@bendigoit.com.au> wrote:
>
>>> Like I said before -- unmapping the gnttab pages I think will not help
>>> you for live migration, but I suppose it is a reasonable thing to do
>>> anyway. For live migration I think xc_domain_save needs to get a bit
>>> smarter about Xenheap pages in HVM guests.
>>
>> Understood. Do you have any idea about why it worked fine under 3.3.x
>> but not 3.4.x?
>
> The bit of code in 3.3's xc_domain_save.c that is commented "Skip PFNs that
> aren't really there" is removed in 3.4. That will be the reason.

James: Did you figure out how to fix this problem in the gplpv drivers? Just asking because it seems people hit this save/restore/migration problem pretty often..

-- Pasi
> > The bit of code in 3.3's xc_domain_save.c that is commented "Skip PFNs that
> > aren't really there" is removed in 3.4. That will be the reason.
>
> James: Did you figure out how to fix this problem in the gplpv drivers? Just
> asking because it seems people hit this save/restore/migration problem
> pretty often..

It's a problem that affects any PVonHVM domain afaict, so I'd rather defer to the person who made the change originally.

James
Hi James,

> It's a problem that affects any PVonHVM domain afaict, so I'd rather defer
> to the person who made the change originally.

I did some tests. Linux PVM live migration works well on Xen 3.4. As you said above, PVHVM fails live migration and gets the error "Error: /usr/lib/xen/bin/xc_save 32 8 0 0 5 failed", but save/restore is OK. So should this problem be fixed on the xen side instead of the pv driver side?

Following is the log of the Linux PVHVM live migration:

[2009-08-19 10:52:08 2832] DEBUG (XendCheckpoint:110) [xc_save]: /usr/lib/xen/bin/xc_save 32 8 0 0 5
[2009-08-19 10:52:08 2832] INFO (XendCheckpoint:418) xc_save: failed to get the suspend evtchn port
[2009-08-19 10:52:08 2832] INFO (XendCheckpoint:418)
[2009-08-19 10:52:08 2832] INFO (XendCheckpoint:418) Saving memory pages: iter 1   0%ERROR Internal error: Error when writing to state file (4a) (errno 104)
[2009-08-19 10:52:08 2832] INFO (XendCheckpoint:418) Save exit rc=1
[2009-08-19 10:52:08 2832] ERROR (XendCheckpoint:164) Save failed on domain OVM_EL5U3_X86_PVHVM_4GB (8) - resuming.
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 132, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 406, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_save 32 8 0 0 5 failed
[2009-08-19 10:52:08 2832] DEBUG (XendDomainInfo:2806) XendDomainInfo.resumeDomain(8)

Thanks

Annie.
On 19/08/2009 08:39, "ANNIE LI" <annie.li@oracle.com> wrote:

> Following is the log of the Linux PVHVM live migration:
>
> [2009-08-19 10:52:08 2832] DEBUG (XendCheckpoint:110) [xc_save]:
> /usr/lib/xen/bin/xc_save 32 8 0 0 5
> [2009-08-19 10:52:08 2832] INFO (XendCheckpoint:418) xc_save: failed to get
> the suspend evtchn port
> [2009-08-19 10:52:08 2832] INFO (XendCheckpoint:418)
> [2009-08-19 10:52:08 2832] INFO (XendCheckpoint:418) Saving memory pages:
> iter 1   0%ERROR Internal error: Error when writing to state file (4a)
> (errno 104)

The original error will be on the receive side. The sender failed because the receiver closed down.

 -- Keir
Hi

Keir Fraser wrote:
> On 19/08/2009 08:39, "ANNIE LI" <annie.li@oracle.com> wrote:
>
>> Following is the log of the Linux PVHVM live migration:
>> [...]
>
> The original error will be on the receive side. The sender failed because
> the receiver closed down.

Please ignore my last post; it was wrong because of a kernel version problem. I tested again after updating my test environment. Live migration/save/restore works well with PVHVM on Xen 3.4. So this should not be a problem for any PVonHVM domain; just Windows domains with pv drivers have this problem.

Thanks

Annie.
Hi

> Regarding the Linux PV-on-HVM drivers - they may have the same issue. Full
> PV guests do not, as they have a gnttab_suspend() function called during
> the suspend callback (and for subtle reasons xc_domain_save can detect and
> not save Xenheap pages for a full PV guest anyway - because it can see the
> P2M table in that case).

Live migration of Linux PVonHVM passed on Xen 3.4 successfully, but Windows os with the pv driver failed.

> Like I said before -- unmapping the gnttab pages I think will not help you
> for live migration, but I suppose it is a reasonable thing to do anyway. For
> live migration I think xc_domain_save needs to get a bit smarter about
> Xenheap pages in HVM guests.

Yes. Live migration of a Windows domU with the pv driver failed again after we added unmapping of the gnttab and shinfo pages to the suspend process. The save process is OK, but the restore hits a similar problem. So what should we do in the Windows pv driver to avoid this problem? Any suggestion is appreciated.

Following is the log of the restore process:

[2009-08-21 00:17:34 2918] DEBUG (XendCheckpoint:278) [xc_restore]: /usr/lib/xen/bin/xc_restore 31 33 2 3 1 1 1
[2009-08-21 00:17:34 2918] INFO (XendCheckpoint:418) xc_domain_restore start: p2m_size = 100000
[2009-08-21 00:17:34 2918] INFO (XendCheckpoint:418) Reloading memory pages: 0%
[2009-08-21 00:17:44 2918] INFO (XendCheckpoint:418) Failed allocation for dom 33: 7 extents of order 0
[2009-08-21 00:17:44 2918] INFO (XendCheckpoint:418) ERROR Internal error: Failed to allocate memory for batch.!
[2009-08-21 00:17:44 2918] INFO (XendCheckpoint:418)
[2009-08-21 00:17:44 2918] INFO (XendCheckpoint:418) Restore exit with rc=1
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:2750) XendDomainInfo.destroy: domid=33
[2009-08-21 00:17:44 2918] ERROR (XendDomainInfo:2764) XendDomainInfo.destroy: domain destruction failed.
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2757, in destroy
    xc.domain_pause(self.domid)
Error: (3, 'No such process')
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:2225) No device model
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:2227) Releasing devices
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:2240) Removing vif/0
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:1142) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:2240) Removing vbd/768
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:1142) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:2240) Removing vfb/0
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:1142) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:2240) Removing console/0
[2009-08-21 00:17:44 2918] DEBUG (XendDomainInfo:1142) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2009-08-21 00:17:45 2918] ERROR (XendDomain:1149) Restore failed
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 1147, in domain_restore_fd
    return XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 282, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 406, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_restore 31 33 2 3 1 1 1 failed

Thanks

Annie.
On 20/08/2009 09:17, "ANNIE LI" <annie.li@oracle.com> wrote:

>> Like I said before -- unmapping the gnttab pages I think will not help you
>> for live migration, but I suppose it is a reasonable thing to do anyway. For
>> live migration I think xc_domain_save needs to get a bit smarter about
>> Xenheap pages in HVM guests.
>
> Yes. Live migration of a Windows domU with the pv driver failed again
> after we added unmapping of the gnttab and shinfo pages to the suspend
> process. The save process is OK, but the restore hits a similar problem.
> So what should we do in the Windows pv driver to avoid this problem? Any
> suggestion for this issue?

Balloon down by (#gnttab+#shinfo) pages when the PV drivers first load. You could do this instead of unmapping the gnttab+shinfo pages on suspend, or as well as.

Ultimately the 'right' fix will need to be implemented in Xen and the dom0 tools. But the above kludge would work perfectly well I believe.

 -- Keir
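A minimal sketch of what that kludge might look like in the guest driver, again not actual GPLPV code: alloc_guest_page() and gpfn_of_page() are hypothetical helper names, and 32+1 frames is an assumption matching the 33-page figure seen earlier in the thread.

/* Hedged sketch of the suggested workaround: at driver load, hand
 * back one plain RAM page for every gnttab and shinfo frame the
 * driver maps, so the Xen-heap pages that xc_domain_save picks up
 * still fit under max_pages at restore time. */
#define NR_GNTTAB_FRAMES 32   /* assumption: driver maps all 32 gnttab pages */
#define NR_SHINFO_FRAMES 1

static long balloon_down_for_save_restore(void)
{
    static xen_pfn_t gpfns[NR_GNTTAB_FRAMES + NR_SHINFO_FRAMES];
    struct xen_memory_reservation reservation = {
        .nr_extents   = NR_GNTTAB_FRAMES + NR_SHINFO_FRAMES,
        .extent_order = 0,          /* order-0 (4kB) extents */
        .domid        = DOMID_SELF,
    };
    unsigned int i;

    /* alloc_guest_page()/gpfn_of_page() are hypothetical helpers for
     * grabbing guest RAM pages and finding their gpfns. */
    for (i = 0; i < NR_GNTTAB_FRAMES + NR_SHINFO_FRAMES; i++)
        gpfns[i] = gpfn_of_page(alloc_guest_page());

    set_xen_guest_handle(reservation.extent_start, gpfns);

    /* Returns the number of extents actually released. */
    return HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
}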
> Balloon down by (#gnttab+#shinfo) pages when the PV drivers first load. You
> could do this instead of unmapping the gnttab+shinfo pages on suspend, or as
> well as.
>
> Ultimately the 'right' fix will need to be implemented in Xen and the dom0
> tools. But the above kludge would work perfectly well I believe.

Kludgy, as you say, but if it works then so be it.

Is there a future-proof way for the drivers to know if they are running under a future version of xen that isn't broken in this way? I'm guessing not...

James
Hi

> Kludgy, as you say, but if it works then so be it.

I just finished testing; live migration works well on a winpv domu after ballooning down those pages when the pv driver first loads.

Another question: why doesn't Linux PVonHVM have this issue? Does the Linux pv driver do the same thing as above?

Thanks

Annie.
On 20/08/2009 10:42, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Ultimately the 'right' fix will need to be implemented in Xen and the dom0
>> tools. But the above kludge would work perfectly well I believe.
>
> Kludgy, as you say, but if it works then so be it.
>
> Is there a future-proof way for the drivers to know if they are running
> under a future version of xen that isn't broken in this way? I'm
> guessing not...

No, not really. We could add a xenstore flag or something I suppose. But really losing a few memory pages is not the end of the world. I suppose the major pain might be if it shatters a physical superpage, and hence makes VT-d/EPT type stuff more expensive. If you could at least arrange for the pages to come from the same aligned 2MB region, or even from the bottom 2MB of memory (which can never be allocated as a superpage because of the VGA area), that might be nice.

Thinking about how to fix this nicely in the tools, it seems pretty tricky if I don't want to have to change the dom0 kernel too. The kernel is quite involved in mapping foreign pages up to user space and gets in the way of hacking in a flag between tools and Xen... And we do want to be able to map foreign Xen-heap pages in some cases. It's only a nuisance for xc_domain_save.

 -- Keir
On 20/08/2009 11:05, "ANNIE LI" <annie.li@oracle.com> wrote:

>> Kludgy, as you say, but if it works then so be it.
>
> I just finished testing; live migration works well on a winpv domu after
> ballooning down those pages when the pv driver first loads.
>
> Another question:
> Why doesn't Linux PVonHVM have this issue? Does the Linux pv driver do the
> same thing as above?

It's working more by luck than design, I fear.

 -- Keir
On 20/08/2009 11:19, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> No, not really. We could add a xenstore flag or something I suppose. But
> really losing a few memory pages is not the end of the world. [...]
>
> Thinking about how to fix this nicely in the tools, it seems pretty tricky
> if I don't want to have to change the dom0 kernel too. The kernel is quite
> involved in mapping foreign pages up to user space and gets in the way of
> hacking in a flag between tools and Xen... And we do want to be able to map
> foreign Xen-heap pages in some cases. It's only a nuisance for
> xc_domain_save.

Another method would be for the PV-on-HVM drivers to map shinfo and gnttab pages in a restricted guest-physical address range and then advertise the range via, say, a new HVMPARAM. Tools could also indicate to PV-on-HVM drivers that this method is supported via the same HVMPARAM. How does that sound? It would give the ability for a general exclusion range for save/restore.

 -- Keir
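To make the proposal concrete, a purely hypothetical sketch: no such parameter exists at this point, and the names and numbers below are invented. Only HVMOP_set_param and xc_get_hvm_param are existing plumbing that such a scheme would likely reuse.

/* HYPOTHETICAL -- sketch of the proposal only.  The two parameter
 * names/numbers are invented. */
#define HVM_PARAM_PV_EXCLUDE_BASE  32  /* first gpfn of the special window */
#define HVM_PARAM_PV_EXCLUDE_NR    33  /* number of gpfns in the window    */

/* Guest side: after placing shinfo/gnttab in a fixed gpfn window,
 * advertise it so the tools can skip those frames. */
static void advertise_exclusion_range(unsigned long base, unsigned long nr)
{
    struct xen_hvm_param p = { .domid = DOMID_SELF };

    p.index = HVM_PARAM_PV_EXCLUDE_BASE; p.value = base;
    HYPERVISOR_hvm_op(HVMOP_set_param, &p);

    p.index = HVM_PARAM_PV_EXCLUDE_NR;   p.value = nr;
    HYPERVISOR_hvm_op(HVMOP_set_param, &p);
}

/* xc_domain_save side would then read the window with
 * xc_get_hvm_param() and skip any pfn falling inside it, i.e.
 * "if (pfn >= base && pfn < base + nr) continue;" in the batch loop. */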
Hi

> I just finished testing; live migration works well on a winpv domu after
> ballooning down those pages when the pv driver first loads.

After more testing, I find there are some problems with this method of ballooning down those pages when the pv driver first loads.

If I balloon down those pages only when the driver first loads, save/restore works only once, and migration does not work at all. It seems that a restored vm has lost the pages that were ballooned down; for migration, the destination does not have the pages that were ballooned down on the source.

But if I balloon down those pages every time (not just at driver first load), I tested save/restore/migration several times and all work fine. However, the domu wastes lots of memory in this situation.

I will do more testing on this and update here.

Thanks

Annie.
On 20/08/2009 12:55, "ANNIE LI" <annie.li@oracle.com> wrote:

>> I just finished testing; live migration works well on a winpv domu after
>> ballooning down those pages when the pv driver first loads.
>
> After more testing, I find there are some problems with this method of
> ballooning down those pages when the pv driver first loads.
>
> If I balloon down those pages only when the driver first loads,
> save/restore works only once, and migration does not work at all.

Oh dear.

> It seems that a restored vm has lost the pages that were ballooned down;
> for migration, the destination does not have the pages that were ballooned
> down on the source.

Right, that's the correct behaviour isn't it? Pages freed on the source VM do not magically reappear on the destination VM?

> But if I balloon down those pages every time (not just at driver first
> load), I tested save/restore/migration several times and all work fine.
> However, the domu wastes lots of memory in this situation.

Yes, that's weird. Do you know what condition causes the guest memory allocation failure in xc_domain_restore? Is it due to hitting the guest maxmem limit in Xen? If so, is maxmem the same value across multiple iterations of save/restore or migration?

> I will do more testing on this and update here.

Thanks.

 -- Keir
Hi>> It seems that a >> restored vm lost those pages ballooned down. For migration, destination >> does not have those pages which ballooned down on source. >> > > Right, that''s the correct behaviour isn''t it? Pages freed on source VM do > not magically reappear on destination VM? > >Yes, so this method can not fix this problem.>> But if i balloon down those pages every time(not driver first load), i >> tested save/restore/migration for several times, and all work fine. But >> the domu will waste lots of memory in this situation. >> > > Yes, that''s weird. Do you know what condition causes guest memory allocation > failure on xc_domain_restore? Is it due to hitting the guest maxmem limit in > Xen? If so, is maxmem the same value across multiple iterations of > save/restore or migration? >Sorry, i have no idea about it. Maybe I need to print more log in for(;;) in xc_domain_restore to see what is the difference between without and with balooning down pages. Thanks Annie. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
James Harper wrote:
> I added some more debugging...
>
> batch 1024           [1]
> Allocating 1024 mfns [2]
> 197600 allocated     [3]
> batch 1024
> Allocating 33 mfns
> Failed allocation for dom 24: 33 extents of order 0 (err = 32) [4]
>
> [1] is just after 'j' is read in xc_domain_restore
> [2] is just before the call to populate_physmap
> [3] is just after the call to populate_physmap
> [4] is the error message in the memory_op function in libxc, modified to
> give the value of err
>
> According to that, a total of 197632 pages are being allocated and the last
> page cannot be (more pages could be required the next time around the loop
> too...)

I've also added some debugging in xc_domain_save.c and xc_domain_restore.c. The attachment has those two files, which I am using, plus xend.log. The two files are modified based on Xen 3.4.0-xx.

What I've done is:

1. Create a Windows Server 2008 32bit VM with PV drivers which can support migration on Xen 3.1.x.
2. Save the Windows DomU, then restore it. The error log is the same as in the first mail of this thread.
3. Create a fresh-install Windows 2008 32bit VM with almost the same vm.cfg file.
4. Save and restore the Windows DomU.

Here are some xend logs. I am not sure whether they look right, as I don't know the whole save/restore process in Xen. If you need any more information, please let me know.

Line 100: [2009-08-26 01:34:24 2883] INFO (XendCheckpoint:418) shared_info_frame: 0x41ec
Line 167: [2009-08-26 01:34:39 2883] INFO (XendCheckpoint:418) shared_info_frame: 0xffffffff
Line 581: [2009-08-26 02:10:47 2883] INFO (XendCheckpoint:418) shared_info_frame: 0xfffff
Line 649: [2009-08-26 02:11:01 2883] INFO (XendCheckpoint:418) shared_info_frame: 0xffffffff   <-- is this OK for a PVHVM guest when restoring?

Line 169: [2009-08-26 01:34:39 2883] INFO (XendCheckpoint:418) Reloading memory pages: 0%
Line 170: [2009-08-26 01:34:39 2883] INFO (XendCheckpoint:418) batch 1024
Line 171: [2009-08-26 01:34:39 2883] INFO (XendCheckpoint:418) nr_mfns: 991   <----

Line 651: [2009-08-26 02:11:01 2883] INFO (XendCheckpoint:418) Reloading memory pages: 0%
Line 652: [2009-08-26 02:11:01 2883] INFO (XendCheckpoint:418) batch 1024
Line 653: [2009-08-26 02:11:01 2883] INFO (XendCheckpoint:418) nr_mfns: 992   <---- nr_mfns of HVM is one larger than PVHVM's

Line 472: [2009-08-26 01:34:46 2883] INFO (XendCheckpoint:418) batch 1024
Line 473: [2009-08-26 01:34:46 2883] INFO (XendCheckpoint:418) nr_mfns: 31   <---- restoring a PVHVM guest allocates 31 more nr_mfns than HVM

thanks
wayne
Hi,

>> Yes, that's weird. Do you know what condition causes the guest memory
>> allocation failure in xc_domain_restore? Is it due to hitting the guest
>> maxmem limit in Xen? If so, is maxmem the same value across multiple
>> iterations of save/restore or migration?
>
> Sorry, I have no idea about it. Maybe I need to print more logging in the
> for(;;) loop in xc_domain_restore to see what the difference is between
> ballooning down pages and not.

I did some migration tests on Linux/Windows PVHVM on Xen 3.4.

* I printed the value of "pfn = region_pfn_type[i] & ~XEN_DOMCTL_PFINFO_LTAB_MASK;" in xc_domain_restore.c. When the restore fails with the error "Failed allocation for dom 2: 33 extents of order 0", the value of pfn is less than that of a successful restore. So I think it should not be due to hitting the guest maxmem limit in Xen. Is that correct?

* After comparing the difference between ballooning down the (gnttab+shinfo) pages and not, I find that: if the Windows pv driver balloons down those pages, there are more pages with the XEN_DOMCTL_PFINFO_XTAB type in the saving process, and consequently more bogus/unmapped pages are skipped in the restoring process. If the winpv driver does not balloon down those pages, there are only a few pages with the XEN_DOMCTL_PFINFO_XTAB type to be processed during save/restore.

* Another result about the winpv driver with those pages ballooned down: when doing save/restore for the second time, I find that p2m_size in the restoring process becomes 0xfefff, which is less than the normal size 0x100000.

Any suggestions about those test results? Or any idea how to resolve this problem in winpv or xen?

Thanks

Annie.
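For readers following along, XEN_DOMCTL_PFINFO_XTAB is the type the save side assigns to pfns it could not map, and the restore side skips such entries so they consume no allocation. A self-contained sketch of the restore-side skip (the constants are from xen/include/public/domctl.h; the loop paraphrases the 3.4-era libxc behaviour from memory rather than quoting it, and uint32_t entries are shown for simplicity although the entry width follows the guest width):

#include <stdint.h>

/* XEN_DOMCTL_PFINFO_* values from xen/include/public/domctl.h: the
 * page type lives in the top nibble of each entry. */
#define XEN_DOMCTL_PFINFO_LTAB_SHIFT 28
#define XEN_DOMCTL_PFINFO_LTAB_MASK  (0xfU << XEN_DOMCTL_PFINFO_LTAB_SHIFT)
#define XEN_DOMCTL_PFINFO_XTAB       (0xfU << XEN_DOMCTL_PFINFO_LTAB_SHIFT)

/* Count how many entries of a restore batch are real pages needing
 * allocation, i.e. everything except XTAB (bogus/unmapped) entries.
 * This mirrors the skip Annie describes: more XTAB entries on the
 * save side means fewer pages allocated on the restore side. */
static unsigned int count_pages_to_allocate(const uint32_t *region_pfn_type,
                                            unsigned int batch)
{
    unsigned int i, n = 0;

    for (i = 0; i < batch; i++) {
        uint32_t pagetype = region_pfn_type[i] & XEN_DOMCTL_PFINFO_LTAB_MASK;
        if (pagetype == XEN_DOMCTL_PFINFO_XTAB)
            continue;   /* tagged invalid on the save side: skip */
        n++;
    }
    return n;
}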
Hi,

> I did some migration tests on Linux/Windows PVHVM on Xen 3.4.
>
> * I printed the value of "pfn = region_pfn_type[i] &
> ~XEN_DOMCTL_PFINFO_LTAB_MASK;" in xc_domain_restore.c. When the restore
> fails with the error "Failed allocation for dom 2: 33 extents of order 0",
> the value of pfn is less than that of a successful restore. So I think it
> should not be due to hitting the guest maxmem limit in Xen. Is that
> correct?
>
> [...]
>
> Any suggestions about those test results? Or any idea how to resolve this
> problem in winpv or xen?

I did more save/restore tests and compared the logs between Linux and Windows PVHVM. The two vms have the same memory size. Most of their logs are identical; the only difference between them is again connected with XEN_DOMCTL_PFINFO_XTAB type pages. From the comments in the code, the XEN_DOMCTL_PFINFO_XTAB type means an invalid page.

During the saving process, Linux PVHVM has 31 more invalid pages than Windows PVHVM. In "for ( j = 0; j < batch; j++ )" of xc_domain_save.c, Linux PVHVM treats the pages with pfn values between f2003 and f2021 as invalid pages, but Windows PVHVM treats those pages as normal pages. Then, in the restoring process, more memory is allocated for Windows PVHVM than for Linux PVHVM.

For example: when Windows PVHVM hits the issue "Failed allocation for dom 2: 33 extents of order 0", the log shows that nr_mfns before "xc_domain_memory_populate_physmap" is 33, whereas it is only 14 at the same point when restoring Linux PVHVM.

It seems there should be more invalid pages in the saving process of Windows PVHVM, but I have failed to find the root cause of it. Any suggestions?

Thanks

Annie.
Hi

As we discussed in this thread before, all Windows PVHVM guests should fail to migrate on Xen 3.4. Can anyone tell me whether the Citrix Windows pv driver's save/restore/migration works properly on Xen 3.4 or not?

Thanks

Annie.
Hi

It seems this problem is connected with the gnttab, not shareinfo. I changed some code about the grant table in the winpv driver (not using the balloon-down-shinfo+gnttab method), and save/restore/migration can work properly on Xen 3.4 now.

What I changed is that the winpv driver uses the hypercall XENMEM_add_to_physmap to map only the grant-table frames that devices require, instead of mapping all 32 pages of the grant table during initialization. It seems those extra grant-table mappings cause this problem.

Thanks

Annie.
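A sketch of the on-demand mapping Annie describes, using the real XENMEM_add_to_physmap / XENMAPSPACE_grant_table interface from the public headers; pick_free_gpfn() is a hypothetical helper for choosing where in the guest physmap to place the frame, and the error handling is elided.

/* Sketch of on-demand grant-table mapping, assuming the standard
 * public headers.  XENMAPSPACE_grant_table with idx = N asks Xen to
 * place the Nth grant-table frame at the given gpfn; mapping only the
 * frames actually needed avoids the extra Xen-heap pages that confuse
 * xc_domain_save. */
static int map_grant_frame(unsigned int frame_idx)
{
    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_grant_table,
        .idx   = frame_idx,            /* which grant-table frame */
        .gpfn  = pick_free_gpfn(),     /* hypothetical placement helper */
    };

    return HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
}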
> It seems this problem is connected with the gnttab, not shareinfo.
> I changed some code about the grant table in the winpv driver (not using
> the balloon-down-shinfo+gnttab method), and save/restore/migration can work
> properly on Xen 3.4 now.
>
> What I changed is that the winpv driver uses the hypercall
> XENMEM_add_to_physmap to map only the grant-table frames that devices
> require, instead of mapping all 32 pages of the grant table during
> initialization. It seems those extra grant-table mappings cause this
> problem.

I am wondering whether those extra grant-table mappings are the root cause of the migration problem, or whether it now works by luck, as with Linux PVHVM?

Thanks

Annie.
From: Dan Magenheimer
Date: 2009-Sep-04 21:28 UTC
Subject: RE: [Xen-devel] Error restoring DomU when using GPLPV
I think I've tracked down the cause of this problem in the hypervisor, but am unsure how best to fix it.

In tools/libxc/xc_domain_save.c, the static variable p2m_size is said to be the "number of pfns this guest has (i.e. number of entries in the P2M)". But apparently p2m_size is getting set to a very large number (0x100000) regardless of the maximum pseudophysical memory for the hvm guest. As a result, some "magic" pages in the 0xf0000-0xfefff range are getting placed in the save file. But since they are not "real" pages, the restore process runs beyond the maximum number of physical pages allowed for the domain and fails. (The gpfns of the last 24 pages saved are f2020, fc000-fc012, feffb, feffc, feffd, feffe.)

p2m_size is set in "save" with a call to a memory_op hypercall (XENMEM_maximum_gpfn), which for an hvm domain returns d->arch.p2m->max_mapped_pfn. I suspect that the meaning of max_mapped_pfn changed at some point to better match its name, but this changed the semantics of the hypercall as used by xc_domain_restore, resulting in this curious problem.

Any thoughts on how to fix this?

> -----Original Message-----
> From: Annie Li
> Sent: Tuesday, September 01, 2009 10:27 PM
> To: Keir Fraser
> Cc: Joshua West; James Harper; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Error restoring DomU when using GPLPV
>
> > It seems this problem is connected with the gnttab, not shareinfo.
> > [...]
>
> I am wondering whether those extra grant-table mappings are the root
> cause of the migration problem, or whether it now works by luck, as
> with Linux PVHVM?
>
> Thanks
> Annie.
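For context, the sizing in question looks roughly like this (paraphrased from memory of the 3.4-era tools/libxc, not an exact quote), which is why p2m_size tracks the highest mapped gpfn rather than the guest's RAM size:

#include <xenctrl.h>   /* xc_memory_op(), XENMEM_maximum_gpfn */

/* Paraphrase of how 3.4-era xc_domain_save sizes the P2M for an HVM
 * guest.  The hypercall returns d->arch.p2m->max_mapped_pfn, so a
 * frame mapped high in the physmap (e.g. gnttab/shinfo near 0xfefxx)
 * inflates p2m_size far beyond the guest's real RAM size. */
static unsigned long hvm_p2m_size(int xc_handle, domid_t dom)
{
    return xc_memory_op(xc_handle, XENMEM_maximum_gpfn, &dom) + 1;
}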
From: Dan Magenheimer
Date: 2009-Sep-04 23:02 UTC
Subject: RE: [Xen-devel] Error restoring DomU when using GPLPV
On further debugging, it appears that the p2m_size may be OK, but there's something about those 24 "magic" gpfns that isn't quite right.

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Friday, September 04, 2009 3:29 PM
> To: Wayne Gong; Annie Li; Keir Fraser
> Cc: Joshua West; James Harper; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] Error restoring DomU when using GPLPV
>
> I think I've tracked down the cause of this problem
> in the hypervisor, but am unsure how best to fix it.
>
> [...]
Keir Fraser wrote:
> On 04/08/2009 12:34, "James Harper" <james.harper@bendigoit.com.au> wrote:
>
>> Understood. Do you have any idea about why it worked fine under 3.3.x
>> but not 3.4.x?
>
> The bit of code in 3.3's xc_domain_save.c that is commented "Skip PFNs that
> aren't really there" is removed in 3.4. That will be the reason.
>
> -- Keir

Hi,

I started looking at this a couple of days ago, and finally understand what's going on. In our case, win migration/save-restore just fails, as Annie/Wayne had posted.

In the short run, since frames for vga etc are skipped anyways, can we just put the above change back in libxc (xen 3.4) and be ok?

thanks,
Mukesh

changeset:   18383:dade7f0bdc8d
user:        Keir Fraser <keir.fraser@citrix.com>
date:        Wed Aug 27 14:53:39 2008 +0100
summary:     hvm: Use main memory for video memory.

diff -r 2397555ebcc2 -r dade7f0bdc8d tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c      Wed Aug 27 13:31:01 2008 +0100
+++ b/tools/libxc/xc_domain_save.c      Wed Aug 27 14:53:39 2008 +0100
@@ -1111,12 +1111,6 @@
              (test_bit(n, to_fix) && last_iter)) )
             continue;

-        /* Skip PFNs that aren't really there */
-        if ( hvm && ((n >= 0xa0 && n < 0xc0) /* VGA hole */
-                     || (n >= (HVM_BELOW_4G_MMIO_START >> PAGE_SHIFT)
-                         && n < (1ULL<<32) >> PAGE_SHIFT)) /* MMIO */ )
-            continue;
-
On 05/09/2009 05:02, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

>> The bit of code in 3.3's xc_domain_save.c that is commented "Skip PFNs
>> that aren't really there" was removed in 3.4. That will be the reason.
>
> I started looking at this a couple of days ago, and finally understand
> what's going on. In our case, Windows migration/save-restore just fails,
> as Annie and Wayne had posted.
>
> In the short run, since frames for VGA etc. are skipped anyway, can we
> just put the above change back into libxc (Xen 3.4) and be OK?

I don't think VGA frames are skipped any more. VGA memory now gets saved
by xc_domain_save. Also some real RAM is up in that area these days --
the ACPI tables, for example.

 -- Keir
Not all those pages are special. Frames fc0xx will be ACPI tables, resident
in ordinary guest memory pages, for example. Only the Xen-heap pages are
special, and they need to be (1) skipped; or (2) unmapped by the HVMPV
drivers on suspend; or (3) accounted for by the HVMPV drivers by unmapping
and freeing an equal number of domain-heap pages. (1) is 'nicest' but
actually a bit of a pain to implement; (2) won't work well for live
migration, where the pages wouldn't get unmapped by the drivers until the
last round of page copying; and (3) was apparently tried by Annie but
didn't work? I'm curious why (3) didn't work - I can't explain that.

 -- Keir

On 05/09/2009 00:02, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> On further debugging, it appears that the p2m_size may be OK, but
> there's something about those 24 "magic" gpfns that isn't quite right.
>
> <snip>
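To make option (3) concrete: each time the driver maps a Xenheap page, it can hand one of its own domheap pages back to Xen. Below is a minimal sketch using the standard balloon hypercall, with Linux-style names; GPLPV's actual wrappers will differ, and the helper name is invented for illustration:

    /* Hand one guest-owned page (gpfn) back to Xen so that the mapped
     * Xenheap page is accounted for: tot_pages drops by one, leaving
     * headroom under max_pages for the restore-time copy. */
    static void balloon_out_one_page(xen_pfn_t gpfn)
    {
        struct xen_memory_reservation reservation = {
            .nr_extents   = 1,
            .extent_order = 0,
            .domid        = DOMID_SELF,
        };
        set_xen_guest_handle(reservation.extent_start, &gpfn);
        HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
    }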
Yes. About (3): my test result is that save/restore works only once if the
pages are ballooned down only on the driver's first load, but it works
repeatedly if they are ballooned down on every driver load.

Thanks
Annie.

Keir Fraser wrote:
> Not all those pages are special. Frames fc0xx will be ACPI tables,
> resident in ordinary guest memory pages, for example. Only the Xen-heap
> pages are special, and they need to be (1) skipped; or (2) unmapped by
> the HVMPV drivers on suspend; or (3) accounted for by the HVMPV drivers
> by unmapping and freeing an equal number of domain-heap pages.
>
> <snip>
Ok, I've been looking at this and figured out what's going on. Annie's
problem lies in not remapping the grant frames post-migration; hence the
leak -- tot_pages goes up every time until migration fails. On Linux, I
found that the remap is where the frames created by restore (for the
Xenheap pfns) get freed back to the domain heap. So that's a fix to be
made on the win PV driver side.

Now back to the original problem. As you already know, because libxc is
not skipping heap pages, tot_pages in struct domain{} temporarily goes up
by (shared-info frame + grant frames) until the guest remaps those pages.
Hence, migration fails if (max_pages - tot_pages) < (shared-info frame +
grant frames). Occasionally I see tot_pages nearly the same as max_pages,
and I don't know all the ways that may happen or what causes it (by
default I see tot_pages short by 21). Anyway, of the two solutions:

1. Always balloon down shinfo+gnttab frames: this needs to be done just
   once during load, right? I'm not sure how it would work, though, if
   memory gets ballooned up subsequently. I suppose the driver will have
   to intercept every increase in reservation and balloon down every
   time? Also, ballooning down during the suspend call would probably be
   too late, right?

2. libxc fix: I wonder how much work this would be. The good thing here
   is that it would take care of both Linux and PV HVM guests, avoiding
   driver updates across many versions, and is hence appealing to us. Can
   we somehow mark the frames as special, to be skipped? Looking at the
   big xc_domain_save function, I'm not sure how pfn_type gets set in the
   HVM case. Maybe before the outer loop it could ask the hypervisor for
   the full Xen-heap page list -- but then what if a new page gets added
   to the list in between?

Also, unfortunately, the failure case is not always handled properly. If
migration fails after suspend, there is no way to get the guest back. I
even noticed the guest disappear totally from both source and target on
failure, a couple of times out of the several dozen migrations I did.

thanks,
Mukesh

Keir Fraser wrote:
> Not all those pages are special. Frames fc0xx will be ACPI tables,
> resident in ordinary guest memory pages, for example. Only the Xen-heap
> pages are special, and they need to be (1) skipped; or (2) unmapped by
> the HVMPV drivers on suspend; or (3) accounted for by the HVMPV drivers
> by unmapping and freeing an equal number of domain-heap pages.
>
> <snip>
On 15/09/2009 03:25, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> Ok, I've been looking at this and figured out what's going on. Annie's
> problem lies in not remapping the grant frames post-migration; hence the
> leak -- tot_pages goes up every time until migration fails. On Linux, I
> found that the remap is where the frames created by restore (for the
> Xenheap pfns) get freed back to the domain heap. So that's a fix to be
> made on the win PV driver side.

Although obviously that is a bug, I'm not sure why it would cause this
particular issue. The domheap pages do not get freed and replaced with
Xenheap pages, but why does that affect the next save/restore cycle?
After all, xc_domain_save does not distinguish between Xenheap and
domheap pages?

> 1. Always balloon down shinfo+gnttab frames: this needs to be done just
> once during load, right? I'm not sure how it would work, though, if
> memory gets ballooned up subsequently. I suppose the driver will have to
> intercept every increase in reservation and balloon down every time?

Well, it is the same driver that is doing the ballooning, so it's kind of
easy to intercept, right? Just need to track how many Xenheap pages are
mapped and maintain that amount of 'balloon down'.

> Also, ballooning down during the suspend call would probably be too
> late, right?

Indeed it would. It needs to be done during boot. It's only a few pages
though, so no one will miss them.

> 2. libxc fix: I wonder how much work this would be. The good thing here
> is that it would take care of both Linux and PV HVM guests, avoiding
> driver updates across many versions, and is hence appealing to us. Can
> we somehow mark the frames as special, to be skipped? Looking at the big
> xc_domain_save function, I'm not sure how pfn_type gets set in the HVM
> case. Maybe before the outer loop it could ask the hypervisor for the
> full Xen-heap page list -- but then what if a new page gets added to the
> list in between?

It's a pain. Pfn_type[] I think doesn't really get used. Xc_domain_save()
just tries to map PFNs and saves all the ones it successfully maps. So the
problem is that it is allowed to map Xenheap pages. But we can't always
disallow that, because sometimes the tools have good reason to map Xenheap
pages. So we'd need a new hypercall, or a flag, or something, and that
would need dom0 kernel changes as well as Xen and toolstack changes. So
it's rather a pain.

> Also, unfortunately, the failure case is not always handled properly. If
> migration fails after suspend, there is no way to get the guest back. I
> even noticed the guest disappear totally from both source and target on
> failure, a couple of times out of the several dozen migrations I did.

That shouldn't happen, since there is a mechanism to cancel the suspension
of a suspended guest. Possibly xend doesn't get it right every time, as
its error handling is pretty poor in general. I trust the underlying
mechanisms below xend pretty well, however.

 -- Keir
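The interception Keir describes could be as small as a counter maintained inside the same PV driver, plus a clamp on balloon-up targets. The sketch below is illustrative only; both helper names are invented, and balloon_out_one_page() refers to the earlier decrease-reservation sketch:

    static unsigned long xenheap_mappings;   /* shinfo + grant frames */

    /* Call whenever another Xenheap page is mapped into the physmap. */
    static void on_xenheap_page_mapped(xen_pfn_t sacrificial_gpfn)
    {
        xenheap_mappings++;
        balloon_out_one_page(sacrificial_gpfn);
    }

    /* Call on every balloon-up request, so the reserve of
     * xenheap_mappings pages is never handed back to the guest. */
    static unsigned long clamp_balloon_target(unsigned long target,
                                              unsigned long max_pages)
    {
        if (target > max_pages - xenheap_mappings)
            target = max_pages - xenheap_mappings;
        return target;
    }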
Keir Fraser wrote:
> On 15/09/2009 03:25, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
>> Ok, I've been looking at this and figured out what's going on. Annie's
>> problem lies in not remapping the grant frames post-migration; hence
>> the leak -- tot_pages goes up every time until migration fails.
>>
>> <snip>
>
> Although obviously that is a bug, I'm not sure why it would cause this
> particular issue. The domheap pages do not get freed and replaced with
> Xenheap pages, but why does that affect the next save/restore cycle?
> After all, xc_domain_save does not distinguish between Xenheap and
> domheap pages?

That xc_domain_save doesn't distinguish is actually the problem, as
xc_domain_restore then backs the Xenheap pfns (for the shinfo/grant
frames) with domheap pages. Those domheap pages do get freed and replaced
by Xenheap pages on the target host, upon the guest remap in
gnttab_map(), in the following code:

arch_memory_op():

        /* Remove previously mapped page if it was present. */
        prev_mfn = gmfn_to_mfn(d, xatp.gpfn);
        if ( mfn_valid(prev_mfn) )
        {
            .....
            guest_remove_page(d, xatp.gpfn);   <====== frees the domheap page
        }

E.g. my guest with 128M gets created with tot_pages=0x83eb and
max_pages=0x8400. Now xc_domain_save saves everything --
0x83eb + shinfo + grant frames (2) -- so I see tot_pages on the target go
up to 0x83ee. Then the guest remaps the shinfo and grant frames, the
domheap pages are returned in guest_remove_page(), and tot_pages goes back
to 0x83eb. In Annie's case the driver forgets to remap the 2 grant frames,
so domheap pages stay wrongly mapped and tot_pages remains at 0x83ed;
after a few more migrations it reaches 0x83ff, and migration fails because
the restore side cannot temporarily allocate 0x83ff + shinfo + grant
frames, max_pages being 0x8400.

Hope that makes sense.

> Well, it is the same driver that is doing the ballooning, so it's kind
> of easy to intercept, right? Just need to track how many Xenheap pages
> are mapped and maintain that amount of 'balloon down'.

Yup, that's what I thought, but just wanted to make sure.

>> Also, ballooning down during the suspend call would probably be too
>> late, right?
>
> Indeed it would. It needs to be done during boot. It's only a few pages
> though, so no one will miss them.
>
>> 2. libxc fix: <snip>
>
> It's a pain. Pfn_type[] I think doesn't really get used.
> Xc_domain_save() just tries to map PFNs and saves all the ones it
> successfully maps. So the problem is that it is allowed to map Xenheap
> pages. But we can't always disallow that, because sometimes the tools
> have good reason to map Xenheap pages. So we'd need a new hypercall, or
> a flag, or something, and that would need dom0 kernel changes as well as
> Xen and toolstack changes. So it's rather a pain.

Ok, got it -- I think the driver change is the way to go.

>> Also, unfortunately, the failure case is not always handled properly.
>> If migration fails after suspend, there is no way to get the guest
>> back.
>>
>> <snip>
>
> That shouldn't happen, since there is a mechanism to cancel the
> suspension of a suspended guest. Possibly xend doesn't get it right
> every time, as its error handling is pretty poor in general. I trust the
> underlying mechanisms below xend pretty well, however.

thanks a lot,
Mukesh
On 15/09/2009 20:14, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> E.g. my guest with 128M gets created with tot_pages=0x83eb and
> max_pages=0x8400. Now xc_domain_save saves everything --
> 0x83eb + shinfo + grant frames (2) -- so I see tot_pages on the target
> go up to 0x83ee. Then the guest remaps the shinfo and grant frames, the
> domheap pages are returned in guest_remove_page(), and tot_pages goes
> back to 0x83eb. In Annie's case the driver forgets to remap the 2 grant
> frames, so domheap pages stay wrongly mapped and tot_pages remains at
> 0x83ed; after a few more migrations it reaches 0x83ff, and migration
> fails because the restore side cannot temporarily allocate
> 0x83ff + shinfo + grant frames, max_pages being 0x8400.
>
> Hope that makes sense.

No, it doesn't. I agree that after the first migration tot_pages will
have increased to 0x83ed. But I do not agree that it will continue to
increase by three pages on each future migration. Look at it this way --
three GPFNs (guest-physical pages) have changed from Xenheap pages to
domheap pages across that first migration. On future migrations they will
be migrated just like any other ordinary domheap page, since that's what
they now are. And tot_pages will therefore not change. Right?

This is why I still cannot understand or explain Annie's experimental
result.

 -- Keir
On 15/09/2009 22:25, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> No, it doesn't. I agree that after the first migration tot_pages will
> have increased to 0x83ed. But I do not agree that it will continue to
> increase by three pages on each future migration.
>
> <snip>

Actually, of course you do the right thing with the shinfo page, so one
page per migration does get switched back to being a Xenheap page (the
shinfo page). So tot_pages actually increases by 3 on the first migration,
then decreases by 1 when shinfo gets remapped by the PV drivers. It then
increases by 1 on every future migration (the shinfo Xenheap page getting
changed into a domheap page), and then decreases by 1 when shinfo gets
remapped by the PV drivers.

But even setting things out exactly right as above, the end result is the
same: I *still* cannot explain Annie's result.

 -- Keir
Keir Fraser wrote:
> Actually, of course you do the right thing with the shinfo page, so one
> page per migration does get switched back to being a Xenheap page (the
> shinfo page).
>
> But even setting things out exactly right as above, the end result is
> the same: I *still* cannot explain Annie's result.

The bug in her driver is that it is only remapping the shinfo page, and
NOT the 2 shared grant frames; tot_pages hence increases by 2 on every
migration. I can see it all in kdb: tot_pages goes up by 3, then down by 1
as the shared-info frame is remapped, and stays there. Next migration, it
goes up by 3 and down by 1 again. So each migration leaks 2 frames. The
initial difference between tot_pages and max_pages is 21 frames, hence it
fails after 10 migrations. (BTW, no maxmem is specified in the config
file; I'm told that means no PoD.)

On the Linux side, the driver remaps the shinfo page plus both grant
frames, so tot_pages goes up by 3 for a moment, then comes the remap and
it drops by 3, back to where it was.

If tot_pages == max_pages, then migration will fail. Which brings me to a
question: to test the balloon changes, what would be the best way to get
tot_pages equal to max_pages? xm mem-set doesn't quite get me there.
Occasionally I see the two the same after starting a guest, but I haven't
figured out what causes that to happen.

thanks
Mukesh
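Working through the arithmetic behind "fails after 10 migrations" (assuming the default of 2 grant frames plus the shinfo frame):

    headroom at boot:   max_pages - tot_pages = 21 frames
    leak per migration: 2 frames (the two unremapped grant frames)
    needed per restore: 3 frames (shinfo + 2 grant frames, temporarily)

    before migration n: headroom = 21 - 2*(n-1)
    n = 10: 21 - 18 = 3   (just enough)
    n = 11: 21 - 20 = 1   (allocation fails)

So the tenth migration just squeaks through and the next one fails, consistent with the observed behaviour.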
Hi,

> Actually, of course you do the right thing with the shinfo page, so one
> page per migration does get switched back to being a Xenheap page...
>
> But even setting things out exactly right as above, the end result is
> the same: I *still* cannot explain Annie's result.

The root cause is that the winpv driver did not re-map the gnttab frames
during resume. Thanks very much, Mukesh.

My initial implementation mapped all 32 grant-table pages during
initialization and then ballooned those pages down on the driver's first
load. However, I leaked those 32 grant pages because I did not re-map them
during resume. This is why save/restore could work only once.

My second implementation mapped only the grant frames each device needs,
instead of all 32 grant-table pages. But it leaked 2 frames every
migration, again because of the missing re-mapping of the grant tables.

I then tried re-mapping the grant tables during resume, and ballooning
down shinfo+gnttab on the driver's first load. I did save/restore several
times and did not hit any problem. Furthermore, I also tried mapping 64
grant-table pages during initialization and ballooning those pages down;
all worked fine.

I will do more testing to make sure of it and update here.

Thanks
Annie.
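For reference, the resume-time re-map Annie describes is the same XENMEM_add_to_physmap hypercall used at initialization, re-issued for each grant frame after restore. A sketch with Linux-style names follows; the saved-gpfn array is hypothetical, and GPLPV's own wrappers will differ:

    /* Re-insert grant-table frame i into the physmap at the gpfn it
     * occupied before the save. On the restore target this also frees
     * the placeholder domheap page via guest_remove_page(), which is
     * what stops tot_pages creeping up on every migration. */
    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_grant_table,
        .idx   = i,
        .gpfn  = saved_grant_frame_gpfn[i],
    };
    HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);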
> I will do more testing to make sure of it and update here.

I tried mapping 256 grant frames during initialization and ballooning down
256+1 (grant frames + shinfo) pages on the driver's first load. Then I did
save/restore 50 times and live migration 10 times. No errors occurred.

Thanks
Annie.
On 16/09/2009 12:10, "ANNIE LI" <annie.li@oracle.com> wrote:

>> I will do more testing to make sure of it and update here.
>
> I tried mapping 256 grant frames during initialization and ballooning
> down 256+1 (grant frames + shinfo) pages on the driver's first load.
> Then I did save/restore 50 times and live migration 10 times. No errors
> occurred.

Okay, well I still can't explain why that fixes it, but clearly it does.
So that's good. :-)

 -- Keir
Dan Magenheimer
2009-Sep-16 18:09 UTC
RE: [Xen-devel] Error restoring DomU when using GPLPV
Before we close down this thread, I have a concern.

According to Mukesh, the fix for this bug depends on the PV drivers
tracking tot_pages for a domain and ballooning to ensure tot_pages+3 does
not exceed max_pages for the domain.

Well, tmem can affect tot_pages for a domain inside the hypervisor without
any notification to the PV drivers or the balloon driver. And I'd imagine
that PoD and future memory-optimization mechanisms such as swapping and
page sharing may do the same.

So this solution seems very fragile.

Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, September 16, 2009 6:28 AM
> To: Annie Li
> Cc: Joshua West; Dan Magenheimer; xen-devel; Kurt Hackel; James Harper;
> Wayne Gong
> Subject: Re: [Xen-devel] Error restoring DomU when using GPLPV
>
> Okay, well I still can't explain why that fixes it, but clearly it
> does. So that's good. :-)
>
>  -- Keir
Just in case someone missed it earlier in the thread:

3 = 1 shinfo + 2 grant frames (the default).

So the check is tot_pages + shinfo + number of grant frames against
max_pages.

Mukesh

Dan Magenheimer wrote:
> Before we close down this thread, I have a concern.
>
> According to Mukesh, the fix for this bug depends on the PV drivers
> tracking tot_pages for a domain and ballooning to ensure tot_pages+3
> does not exceed max_pages for the domain.
>
> <snip>
Yeah, all the PV drivers have to do is balloon down one page for every
Xenheap page they map. There's no further complexity than that, so let's
not make a mountain out of a molehill. The approach as discussed and now
implemented should work fine with tmem, I think.

 -- Keir

On 16/09/2009 21:50, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> Just in case someone missed it earlier in the thread:
>
> 3 = 1 shinfo + 2 grant frames (the default).
>
> <snip>
Dan Magenheimer
2009-Sep-17 15:41 UTC
RE: [Xen-devel] Error restoring DomU when using GPLPV
The problem is that every page that is ballooned down by the balloon
driver can be slurped up as a private-persistent ("preswap") page by tmem.
Private-persistent pages contain indirectly-accessible domain data, are
counted against the domain's tot_pages, and are migrated along with the
domain's directly-accessible pages. So any temporary mapping of Xenheap
pages into domheap, such as occurs during restore/migration, can cause
max_pages to be exceeded.

This isn't a problem for tmem today, because tmem only runs in PV domains
today, but I suspect the fragility of this approach will come back and
bite us. It reminds me of the classic "shell game".

Is there a per-domain counter of these special pages somewhere? If so, a
MEMF flag could subtract it from max_pages in the limit check in
assign_pages(), e.g.:

    max = d->max_pages;
    if ( memflags & MEMF_no_special )
        max -= d->special_pages;
    <snip>
    if ( unlikely((d->tot_pages + ... > max ) /* Over-allocation */

(special_pages would count any Xenheap pages that contain domain-specific
data that needs to be retained across a migration.)

Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, September 17, 2009 12:21 AM
> To: Mukesh Rathor; Dan Magenheimer
> Cc: Annie Li; Joshua West; James Harper; xen-devel; Wayne Gong; Kurt
> Hackel
> Subject: Re: [Xen-devel] Error restoring DomU when using GPLPV
>
> Yeah, all the PV drivers have to do is balloon down one page for every
> Xenheap page they map. There's no further complexity than that, so
> let's not make a mountain out of a molehill. The approach as discussed
> and now implemented should work fine with tmem, I think.
>
> <snip>
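One place such a counter could plausibly be maintained is where Xen first shares a heap page with a guest. The sketch below is modelled on the real share_xen_page_with_guest(); the special_pages field is Dan's hypothetical addition, not existing code, and the locking detail is an assumption:

    void share_xen_page_with_guest(
        struct page_info *page, struct domain *d, int readonly)
    {
        spin_lock(&d->page_alloc_lock);
        d->special_pages++;   /* hypothetical per-domain counter that
                                 assign_pages() could subtract from
                                 max_pages under MEMF_no_special */
        /* ... existing sharing logic unchanged ... */
        spin_unlock(&d->page_alloc_lock);
    }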
Pasi Kärkkäinen
2009-Sep-24 20:24 UTC
Re: [Xen-devel] Error restoring DomU when using GPLPV / fix for GPLPV drivers
On Wed, Sep 16, 2009 at 07:10:19PM +0800, ANNIE LI wrote:

>> I will do more testing to make sure of it and update here.
>
> I tried mapping 256 grant frames during initialization and ballooning
> down 256+1 (grant frames + shinfo) pages on the driver's first load.
> Then I did save/restore 50 times and live migration 10 times. No errors
> occurred.

James: I guess this same fix should be applied to the GPLPV drivers?

-- Pasi
Keith Coleman
2009-Oct-27 20:05 UTC
Re: [Xen-devel] Error restoring DomU when using GPLPV / fix for GPLPV drivers
On Thu, Sep 24, 2009 at 4:24 PM, Pasi Kärkkäinen <pasik@iki.fi> wrote:

> James: I guess this same fix should be applied to the GPLPV drivers?

The latest GPLPV drivers still can't restore, but the discussion of this
issue has gone quiet. Is this a lost cause?

I'm using xen-3.4.1, the official 2.6.18-xen kernel, and
gplpv_fre_wnet_x86_0.10.0.130.msi.

2:~# xm save win2 win2.save
2:~# xm restore win2.save
Error: /usr/lib64/xen/bin/xc_restore 4 49 2 3 1 1 1 failed
Usage: xm restore <CheckpointFile> [-p]

Restore a domain from a saved state.
  -p, --paused    Do not unpause domain after restoring it

Keith Coleman