I have been trying to debug a problem live-migrating SAP on Xen-3.0.3 x86-64 (I also tested Xen-unstable changeset 12548) without success.

SAP seems to run fine on a given host; live-migrating it to another host causes the guest to almost immediately panic in the mprotect() call in the change_pte_range() routine in the set_pte_at() macro because the page table page it is trying to update is write-protected.

My attempts at understanding where this is coming from have come to naught. Any help in running this down would be appreciated. I am perfectly willing/able to write some debugging code if I am given a few clues what to look for.

Thanks,

John Byrne

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
John Byrne
2006-Nov-29 00:22 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
I forgot to mention that a very simple test case I wrote using shared memory and the mprotect call didn't fail. So, the only test case I have at the moment is to run SAP.

John Byrne
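John's test program is not shown in the thread. Purely as an illustration of what a simple shared-memory mprotect() test might look like, here is a hypothetical sketch: the function name mprotect_roundtrip and the choice of an anonymous shared mapping are assumptions, not a reconstruction of his code. It fills a shared mapping, revokes all access with PROT_NONE, restores access, and checks that the data survived.

```c
/* Ask glibc for MAP_ANONYMOUS even when compiling in a strict ISO C mode. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a "simple shared memory + mprotect" test.
 * Returns 0 if the data survives a PROT_NONE -> PROT_READ|PROT_WRITE cycle,
 * -1 on any failure.
 */
static int mprotect_roundtrip(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = (size_t)page * 4;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch every page so that page table entries actually exist. */
    memset(p, 0xA5, len);

    int rc = -1;
    if (mprotect(p, len, PROT_NONE) == 0 &&
        mprotect(p, len, PROT_READ | PROT_WRITE) == 0) {
        rc = 0;
        for (size_t i = 0; i < len; i++) {
            if (p[i] != 0xA5) {
                rc = -1;
                break;
            }
        }
    }

    munmap(p, len);
    return rc;
}
```

A test of this shape completes its whole protect/unprotect cycle in a moment, so by itself it says little about what happens when a save or migration falls between the two mprotect() calls.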
Ian Pratt
2006-Nov-29 01:36 UTC
RE: [Xen-devel] Live migration leaves page tables read-only?
What happens if you use non-live relo?

Also, can you repro on 32b?

Thanks,
Ian
John Byrne
2006-Nov-29 02:52 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Ian Pratt wrote:
> What happens if you use non-live relo?

I thought I had tested that way back at the beginning without seeing the problem, but I must not have, because I just retested it to be sure and it died the same way. (Now I am truly confused and I need to go back and re-examine some of my earlier experiments.)

In the meantime, any ideas where to look?

> Also, can you repro on 32b?

I am doing this on behalf of someone else, so I'd have to ask them to do the setup if they have the time. I am reluctant to do so at this point.

Thanks,

John
Keir Fraser
2006-Nov-29 07:42 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
On 29/11/06 2:52 am, "John Byrne" <john.l.byrne@hp.com> wrote:
> In the meantime, any ideas where to look?

This will be very dependent on the guest that is being migrated. What Linux kernel is the domU running?

-- Keir
John Byrne
2006-Nov-29 16:49 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Keir Fraser wrote:
> This will be very dependent on the guest that is being migrated. What Linux
> kernel is the domU running?

Linux 2.6.16.29 (+ the SLES 10 iscsi patches) with approximately the config used by the SLES 10 Xen kernel. I can send the config, if you need it.

John Byrne
John Byrne
2006-Nov-30 23:36 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
After redoing some of my tests and understanding more about how Xen handles page tables, I started looking at ptwr_do_page_fault() and put debugging code into it. (On Xen 3.0.3 x86-64.) The fixup is failing in x86_emulate_memop(). Building a debug version of Xen provided some additional information (the final line is from my debugging, after the ":" is domid, addr, pte, pte flags, type_info, page owner, domain):

(XEN) DOM1: (file=mm.c, line=1682) Bad type (saw 0000000028000001 != exp 00000000e0000000) for mfn c8de3 (pfn 12491)
(XEN) DOM1: (file=mm.c, line=606) Error getting mfn c8de3 (pfn 12491) from L1 entry 00000000c8de3167 for dom1
(XEN) DOM1: (file=mm.c, line=1682) Bad type (saw 0000000028000001 != exp 00000000e0000000) for mfn c8de3 (pfn 12491)
(XEN) DOM1: (file=mm.c, line=606) Error getting mfn c8de3 (pfn 12491) from L1 entry 00000000c8de3067 for dom1
(XEN) DOM1: (file=mm.c, line=3120) ptwr_emulate: could not get_page_from_l1e()
(XEN) ptwr_do_page_fault,3253:1 ffff880011065bc0 80100000ca20f065 801065 28000001 ffff830000fe7080 ffff830000fe7080

I'll keep following this down, but any help would be appreciated.

Thanks,

John Byrne
Ian Pratt
2006-Dec-01 01:13 UTC
RE: [Xen-devel] Live migration leaves page tables read-only?
You say you can repro the problem using non-live relo. In that case, you should also be able to repro it using save/restore, which has almost identical code paths.

Please try and isolate whether the crash happens on save or restore, and further whether a given saved image crashes every time in the same way when you try and restore it (mfns will be different, but pfns may be the same).

Ian
John Byrne
2006-Dec-09 05:40 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
I finally ran down the problem. SAP is protecting the pages PROT_NONE, so the page-present bit in the pte is not set and the canonicalize/uncanonicalize code in save/restore ignores the pte. I've attached a patch. It is possible that this change should be made to the l1e tests in xc_ptrace.c; I'm not sure.

John Byrne

Signed-off-by: John Byrne <john.l.byrne@hp.com>
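To make the failure mode concrete, here is a minimal sketch in C of the kind of canonicalization step being discussed. It is not the attached patch (which is not reproduced in this thread) and not the libxc code: the function name canonicalize_pte, the has_frame_mask parameter, the toy m2p array and the frame-mask constant are all assumptions made for illustration. The 0x80 value for Linux's PROT_NONE marker is the one quoted later in the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define PTE_PRESENT     0x001ULL              /* hardware present bit */
#define PTE_PROTNONE    0x080ULL              /* Linux PROT_NONE marker in a not-present PTE */
#define PTE_FRAME_MASK  0x000FFFFFFFFFF000ULL /* frame number field, bits 12..51 */

/*
 * Illustrative only: rewrite the machine frame number (MFN) in a PTE to a
 * guest pseudo-physical frame number (PFN) using a toy m2p lookup table.
 * has_frame_mask selects which flag bits mean "this entry holds a frame".
 */
static uint64_t canonicalize_pte(uint64_t pte, const uint64_t *m2p,
                                 size_t m2p_len, uint64_t has_frame_mask)
{
    /* Entries that do not match the mask are passed through untouched. */
    if ((pte & has_frame_mask) == 0)
        return pte;

    uint64_t mfn = (pte & PTE_FRAME_MASK) >> PAGE_SHIFT;
    if (mfn >= m2p_len)
        return pte; /* out of range for the toy table; leave as-is */

    uint64_t pfn = m2p[mfn];
    return (pte & ~PTE_FRAME_MASK) | (pfn << PAGE_SHIFT);
}
```

With a mask of PTE_PRESENT alone, a PROT_NONE entry such as 0x20a0 is passed through with its machine frame number intact, and that number is meaningless on the destination host. Adding PTE_PROTNONE to the mask translates it like any other entry.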
John Byrne
2006-Dec-09 05:44 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
The patch is against xen-unstable changeset 12815:1ad7dff99968.

John Byrne
Ian Pratt
2006-Dec-09 08:33 UTC
RE: [Xen-devel] Live migration leaves page tables read-only?
That's a good catch, thanks. Interesting that we hadn't seen this before.

Although your patch works today, it will break when we add PSE (super page) support for PV guests as it will confuse PROT_NONE with PSE. Assuming PROT_NONE only makes sense for L1 entries, we can probably gate the tests on whether the page table page is an L1 or not to fix this.

However, it does point out an issue for other OSes: Taking this patch effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE) part of the Xen API. We need to find out whether this is compatible with *BSD and Solaris' use of flags for not present ptes.

Ian
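Ian's suggestion of gating the test on the page-table level can be sketched as follows. This is an illustration under stated assumptions, not code from Xen or the tools: entry_holds_mfn and the level numbering (1 for an L1 table) are hypothetical, and bit 0x80 is taken from the message above as Linux's PROT_NONE marker in a not-present L1 entry, which is the same bit position x86 uses for PSE in higher-level entries.

```c
#include <stdint.h>

#define PTE_PRESENT 0x001ULL /* hardware present bit */
#define PTE_BIT7    0x080ULL /* PROT_NONE marker at L1 (Linux); PSE at L2 and above */

/*
 * Illustrative only: decide whether a page table entry should be treated
 * as holding a machine frame number. The PROT_NONE special case is applied
 * to L1 entries only, so that bit 7 in a higher-level entry is never read
 * as PROT_NONE.
 */
static int entry_holds_mfn(uint64_t entry, int level)
{
    if (entry & PTE_PRESENT)
        return 1;

    /* Not present: only an L1 entry may carry the PROT_NONE marker. */
    return level == 1 && (entry & PTE_BIT7) != 0;
}
```

The gate keeps the Linux-specific reading of bit 7 confined to the one level where Linux uses it, but it still bakes that convention into the tools, which is the API concern raised above.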
Keir Fraser
2006-Dec-09 09:22 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
On 9/12/06 8:33 am, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> However, it does point out an issue for other OSes: Taking this patch
> effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
> part of the Xen API. We need to find out whether this is compatible with
> *BSD and Solaris' use of flags for not present ptes.

If _PAGE_PRESENT is clear then the other N-1 bits can be assumed available for things like swapcache info. Making assumptions about not-present PTEs is not really tenable.

-- Keir
Keir Fraser
2006-Dec-09 09:34 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Speaking more constructively, we could have a pte_active_mask communicated via elfnote or xenbus (or some other way) which the tools would apply to PTEs to determine if they contain an MFN. Default would be 0x1.

-- Keir
Keir Fraser
2006-Dec-09 09:48 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Or we could apply the special case only for images with the OS elfnote set to 'linux', if all Linux kernels have the same PROT_NONE definition.

With any of these solutions, the problem is how to communicate the flag or mask to xc_linux_save/xc_linux_restore, and how to propagate it across save/restore (i.e., how is it represented in a saved image?).

-- Keir
Joe Bonasera
2006-Dec-11 17:00 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Solaris implements PROT_NONE by entirely invalidating the PTE (i.e. it becomes zero). Hence our PTEs are always either zero or have the PRESENT bit set. The only exception to this was adding some fixage to allow for the old Xen writable page table approach, which temporarily made the upper table non-PRESENT.

So you can make not-present, but non-zero entries mean anything you want. As long as it's the guest OS that creates the entries, we'll just not do it.

Joe
Ian Pratt
2006-Dec-11 18:29 UTC
RE: [Xen-devel] Live migration leaves page tables read-only?
Just to confirm: in Solaris there are no not-present PTEs that contain machine addresses.

This means we need to implement the scheme that Keir suggested to enable the guest OS to tell xen/xc_save/restore about flags in not-present PTEs that should trigger a m2p conversion.

Ian
John Byrne
2006-Dec-11 19:55 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Ian,

Silly me. I thought "xc_linux_save" meant what it said. I haven't paid much attention to BSD or Solaris on Xen and didn't realize that they went through the same path.

I'd really like to see this fixed for 3.0.4, at least for Linux. I don't think I'm the person to implement a new "scheme" quickly, but I'll try if someone wants to give me some advice on how to start.

On the subject of schemes, what about support for other architectures? Is there anything we should be thinking about for supporting guests with different page sizes, for instance?

John Byrne
Joe Bonasera
2006-Dec-11 21:30 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Ian Pratt wrote:
> Just to confirm: in Solaris there are no not-present PTEs that
> contain machine addresses.

yes
John Byrne
2007-Jan-14 04:11 UTC
Re: [Xen-devel] Live migration leaves page tables read-only?
Ian,

I haven't noticed a fix. Is someone working on this bug or should I open a bugzilla for it, so it isn't forgotten?

John Byrne
Ian Pratt
2007-Jan-14 08:21 UTC
RE: [Xen-devel] Live migration leaves page tables read-only?
It's not forgotten, but I'm not aware of anyone actively working on it. It's a bit fiddly to fix properly.

We need to add an elf note that describes how to identify not-present PTEs that contain MFNs. For linux this is easy as testing for the presence of a single bit being set works. In principle, you might need a more complex scheme, but I'm not aware of any OSes that actually require this.

Allowing a mask and value to be specified would be good, something that could be extended into a list of mask:value,mask:value in future if need be, e.g.:

np_pte_contains_mfn_flags=c0:80,c0:40

The elf note would need to be pulled out of kernel by the domain builder, and then we need to figure out how to make the info available to the save/restore code.

Would be good if someone could pick this up.

Thanks,
Ian
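The mask:value list Ian describes could be parsed and applied along these lines. This is only a sketch: the string format is taken from the example in the message above, but struct np_rule, parse_np_rules and pte_contains_mfn are hypothetical names, and nothing here is claimed to exist in Xen or the tools. It also assumes the natural reading of the format, namely that a not-present PTE contains an MFN when (pte & mask) == value for at least one pair.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PTE_PRESENT 0x001ULL /* hardware present bit */

/* One "mask:value" pair from the hypothetical elf note string. */
struct np_rule {
    uint64_t mask;
    uint64_t value;
};

/*
 * Parse a string such as "c0:80,c0:40" (hex, comma-separated pairs).
 * Returns the number of pairs stored in out, or -1 on a malformed string
 * or if more than max pairs are present.
 */
static int parse_np_rules(const char *s, struct np_rule *out, size_t max)
{
    size_t n = 0;

    while (*s != '\0') {
        char *end;

        if (n == max)
            return -1;              /* too many pairs for the caller's array */

        out[n].mask = strtoull(s, &end, 16);
        if (end == s || *end != ':')
            return -1;              /* missing mask or ':' separator */
        s = end + 1;

        out[n].value = strtoull(s, &end, 16);
        if (end == s)
            return -1;              /* missing value */
        n++;

        if (*end == ',') {
            s = end + 1;
            if (*s == '\0')
                return -1;          /* trailing comma */
        } else if (*end == '\0') {
            s = end;
        } else {
            return -1;              /* unexpected character */
        }
    }
    return (int)n;
}

/*
 * A present PTE always holds an MFN; a not-present PTE holds one only if
 * it matches at least one mask:value rule.
 */
static int pte_contains_mfn(uint64_t pte, const struct np_rule *rules, int nrules)
{
    if (pte & PTE_PRESENT)
        return 1;

    for (int i = 0; i < nrules; i++) {
        if ((pte & rules[i].mask) == rules[i].value)
            return 1;
    }
    return 0;
}
```

Under this reading, a Linux guest would need only a single pair testing bit 0x80, and a guest such as Solaris, which never stores machine addresses in not-present PTEs, would supply an empty list.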