Tim Deegan
2009-May-13 16:07 UTC
[Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly on foreign pagetables.
Hypercalls from dom0 can end up doing resyncs on HVM guests'' out-of-sync shadow pagetables. At that point the check against current->domain in get_page_from_l1e() triggers the typecount exemption for foreign mappings and a writeable typecount gets lost. Make the foreign-domain check explicit by having get_page_from_l1e_for(), which understands both the dom whose right are being used and the dom whose pagetables are being updated. Most callers of get_page_from_l1e() have both the same (instead of one hard-coded to current->domain as before). Analysis and fix from David Lively. Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2009-May-14 07:42 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly onforeign pagetables.
>>> Tim Deegan <Tim.Deegan@citrix.com> 13.05.09 18:07 >>> > >Hypercalls from dom0 can end up doing resyncs on HVM guests'' out-of-sync >shadow pagetables. At that point the check against current->domain in >get_page_from_l1e() triggers the typecount exemption for foreign mappings >and a writeable typecount gets lost. > >Make the foreign-domain check explicit by having get_page_from_l1e_for(), >which understands both the dom whose right are being used and the dom >whose pagetables are being updated. Most callers of get_page_from_l1e() >have both the same (instead of one hard-coded to current->domain as before). > >Analysis and fix from David Lively.I have to admit that the change to mod_l1_entry() look suspicious to me - as I understand it, the third parameter of get_page_from_l1e_for() represents the target domain, and that''s what FOREIGNDOM is to be used for. Perhaps the whole thing gets more convoluted because of c/s 19383, which added a vcpu parameter for no apparent reason (current is used for that everywhere afaict). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-May-14 08:22 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly onforeign pagetables.
At 08:42 +0100 on 14 May (1242290576), Jan Beulich wrote:> >>> Tim Deegan <Tim.Deegan@citrix.com> 13.05.09 18:07 >>> > > > >Hypercalls from dom0 can end up doing resyncs on HVM guests'' out-of-sync > >shadow pagetables. At that point the check against current->domain in > >get_page_from_l1e() triggers the typecount exemption for foreign mappings > >and a writeable typecount gets lost. > > > >Make the foreign-domain check explicit by having get_page_from_l1e_for(), > >which understands both the dom whose right are being used and the dom > >whose pagetables are being updated. Most callers of get_page_from_l1e() > >have both the same (instead of one hard-coded to current->domain as before). > > > >Analysis and fix from David Lively. > > I have to admit that the change to mod_l1_entry() look suspicious to me - > as I understand it, the third parameter of get_page_from_l1e_for() represents > the target domain, and that''s what FOREIGNDOM is to be used for.Possibly. IIUC get_page_from_l1e_for()''s first domain argument is the domain whose rights we are testing; so e.g. dom0 mapping domU memory uses FOREIGNDOM there to say "this should be domU''s page". The second argument (whose pagetables are these) has always implicitly been "mine", i.e. current->domain. Again correct when dom0 maps domU''s page. In the case we''re trying to fix, although current->domain is dom0 (who is making a shadow control hypercall) the pagetables belong to domU. In the mod_l1_entry() call, get_page_from_l1e_for(nl1e, FOREIGNDOM, d) just gets us the old behaviour (since d == vcpu->domain).> Perhaps the whole thing gets more convoluted because of c/s 19383, which > added a vcpu parameter for no apparent reason (current is used for that > everywhere afaict).That parameter is used so that dom0 can modify a PV domU''s pagetables using the MMU ops (so dom0 tools can offline a suspect page without domU''s help). In that case, I think the current patch actually fixes a latent bug in refcounting foreign mappings from the target domU to a third (HVM) domain, that was missed in cs 19383. This is all pretty confusing, though, and Keir may correct me on any or all of it. :) Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2009-May-14 08:50 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctlyonforeign pagetables.
>>> Tim Deegan <Tim.Deegan@citrix.com> 14.05.09 10:22 >>> >At 08:42 +0100 on 14 May (1242290576), Jan Beulich wrote: >> Perhaps the whole thing gets more convoluted because of c/s 19383, which >> added a vcpu parameter for no apparent reason (current is used for that >> everywhere afaict). > >That parameter is used so that dom0 can modify a PV domU''s pagetables >using the MMU ops (so dom0 tools can offline a suspect page without >domU''s help).Hmm, now that you say that I do recall. Nevertheless, I can''t spot any case where mod_lX_table() would be called with other than current as the vcpu argument. Is that perhaps only a future plan (and then c/s 19383 was maybe committed prematurely)? Or what am I overlooking? Thanks, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-May-14 09:17 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly on foreign pagetables.
On 13/05/2009 17:07, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:> Hypercalls from dom0 can end up doing resyncs on HVM guests'' out-of-sync > shadow pagetables. At that point the check against current->domain in > get_page_from_l1e() triggers the typecount exemption for foreign mappings > and a writeable typecount gets lost. > > Make the foreign-domain check explicit by having get_page_from_l1e_for(), > which understands both the dom whose right are being used and the dom > whose pagetables are being updated. Most callers of get_page_from_l1e() > have both the same (instead of one hard-coded to current->domain as before).The patch doesn''t apply to unstable/3.4. Further, the usage of domain pointers in get_page_from_l1e_for() is now pretty insane -- we have ''d'', ''target'', ''owner'', and still a couple of references to ''current->domain''. Can we rename some of these, and/or remove redundant variables, and/or add a function comment explaining what is going on therein, and be really sure this won''t bite us if we check it in so close to 3.4.0? I''m certainly nervous about it. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-May-14 09:23 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly onforeign pagetables.
On 14/05/2009 09:22, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:>> Perhaps the whole thing gets more convoluted because of c/s 19383, which >> added a vcpu parameter for no apparent reason (current is used for that >> everywhere afaict). > > That parameter is used so that dom0 can modify a PV domU''s pagetables > using the MMU ops (so dom0 tools can offline a suspect page without > domU''s help). In that case, I think the current patch actually fixes a > latent bug in refcounting foreign mappings from the target domU to a > third (HVM) domain, that was missed in cs 19383. > > This is all pretty confusing, though, and Keir may correct me on any or > all of it. :)I wish I could. I need a damp towel wrapped round my head to understand the details of that code now. We may need to step back and refactor and rethink after 3.4. Hence my reluctance to mess with it now, as I fear introducing bugs more significant than latent stubdom issues. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2009-May-14 09:39 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly onforeign pagetables.
At 10:23 +0100 on 14 May (1242296611), Keir Fraser wrote:> > This is all pretty confusing, though, and Keir may correct me on any or > > all of it. :) > > I wish I could. I need a damp towel wrapped round my head to understand the > details of that code now.You''re not alone. Every time I look at that I need to start from scratch. It doesn''t help that the put_page side also takes a domain pointer but has different semantics.> We may need to step back and refactor and rethink > after 3.4. Hence my reluctance to mess with it now, as I fear introducing > bugs more significant than latent stubdom issues.I think it''d be OK to leave this patch out if you''re worried about it, especially if you''re going to refactor this after 3.4. The crash it fixes hasn''t been seen by anyone else, David has a heavily modified system, and the bug''s been there since out-of-sync pagetables came in. Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Citrix Systems (R&D) Ltd. [Company #02300071, SL9 0DZ, UK.] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-May-14 09:57 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly onforeign pagetables.
On 14/05/2009 09:22, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:>> I have to admit that the change to mod_l1_entry() look suspicious to me - >> as I understand it, the third parameter of get_page_from_l1e_for() represents >> the target domain, and that''s what FOREIGNDOM is to be used for. > > Possibly. IIUC get_page_from_l1e_for()''s first domain argument is the > domain whose rights we are testing; so e.g. dom0 mapping domU memory > uses FOREIGNDOM there to say "this should be domU''s page". The second > argument (whose pagetables are these) has always implicitly been "mine", > i.e. current->domain. Again correct when dom0 maps domU''s page. > > In the case we''re trying to fix, although current->domain is dom0 (who > is making a shadow control hypercall) the pagetables belong to domU.Yes, there can be three domains involved: the one making the hypercall, the one who owns the PTE, and the one who owns the page being mapped into the PTE. I think some of the confusion around get_page_from_l1e() is that the domain argument is the page-owner not the PTE-owner. It would make sense for it to be the latter, and then as far as possible do the is-the-page-owner-valid checks hidden inside get_page_from_l1e(). The only fly in the ointment there is that FOREIGNDOM should only be permitted from mod_l1_entry(). Possibly that should be the only caller that directly accesses a more complex interface to get_page_from_l1e() (where the extra argument would be the page-owner, not the PTE-owner!). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-May-15 05:00 UTC
RE: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly onforeign pagetables.
Keir Fraser wrote:> On 14/05/2009 09:22, "Tim Deegan" <Tim.Deegan@citrix.com> wrote: > >>> Perhaps the whole thing gets more convoluted because of c/s 19383, >>> which added a vcpu parameter for no apparent reason (current is >>> used for that everywhere afaict). >> >> That parameter is used so that dom0 can modify a PV domU''s pagetables >> using the MMU ops (so dom0 tools can offline a suspect page without >> domU''s help). In that case, I think the current patch actually >> fixes a latent bug in refcounting foreign mappings from the target >> domU to a third (HVM) domain, that was missed in cs 19383. >> >> This is all pretty confusing, though, and Keir may correct me on any >> or all of it. :) > > I wish I could. I need a damp towel wrapped round my head to > understand the details of that code now. We may need to step back and > refactor and rethink > after 3.4. Hence my reluctance to mess with it now, as I fear > introducing bugs more significant than latent stubdom issues.Sorry for slow response. Yes, that patch is for page ofline as stated by Tim. However, because it is too later to 3.4 release, the whole patchset is not in the tree in the end. I will re-send the patch once the unstable tree is open again, and we can discuss it again at that time. -- jyh> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Jun-03 13:42 UTC
Re: [Xen-devel] [PATCH] Make get_page_from_l1e refcount correctly onforeign pagetables.
On 14/05/2009 10:57, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:> Yes, there can be three domains involved: the one making the hypercall, the > one who owns the PTE, and the one who owns the page being mapped into the > PTE. I think some of the confusion around get_page_from_l1e() is that the > domain argument is the page-owner not the PTE-owner. It would make sense for > it to be the latter, and then as far as possible do the > is-the-page-owner-valid checks hidden inside get_page_from_l1e(). The only > fly in the ointment there is that FOREIGNDOM should only be permitted from > mod_l1_entry(). Possibly that should be the only caller that directly > accesses a more complex interface to get_page_from_l1e() (where the extra > argument would be the page-owner, not the PTE-owner!).Changeset 19708 should fix this satisfactorily, I hope. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel