Andres Lagar-Cavilla
2011-Nov-29 20:32 UTC
[PATCH 0 of 2] Fix correctness race in xc_mem_paging_prep
P2m_mem_paging_prep ensures that an mfn is backing the paged-out gfn, and
transitions to the next state in the paging state machine for this page.
Foreign mappings of the gfn will now succeed. This is the key idea, as it
allows the pager to now map the gfn and fill in its contents.

Unfortunately, it also allows any other foreign mapper to map the gfn and
read its contents. This is particularly dangerous when the populate is
launched by a foreign mapper in the first place, which will be actively
retrying the map operation and might race with the pager. Qemu-dm being a
prime example.

Fix the race by allowing a buffer to be optionally passed in the prep
operation, and having the hypervisor memcpy from that buffer into the newly
prepped page before promoting the gfn type.

Second patch is a tools patch; cc'ed maintainers.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

 xen/arch/x86/mm/mem_event.c  |   2 +-
 xen/arch/x86/mm/mem_paging.c |   2 +-
 xen/arch/x86/mm/p2m.c        |  52 +++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/p2m.h    |   2 +-
 xen/include/public/domctl.h  |   8 +++++-
 tools/libxc/xc_mem_event.c   |   4 +-
 tools/libxc/xc_mem_paging.c  |  23 +++++++++++++++++++
 tools/libxc/xenctrl.h        |   2 +
 8 files changed, 85 insertions(+), 10 deletions(-)
Andres Lagar-Cavilla
2011-Nov-29 21:52 UTC
[PATCH 0 of 2] Fix correctness race in xc_mem_paging_prep
P2m_mem_paging_prep ensures that an mfn is backing the paged-out gfn, and
transitions to the next state in the paging state machine for this page.
Foreign mappings of the gfn will now succeed. This is the key idea, as it
allows the pager to now map the gfn and fill in its contents.

Unfortunately, it also allows any other foreign mapper to map the gfn and
read its contents. This is particularly dangerous when the populate is
launched by a foreign mapper in the first place, which will be actively
retrying the map operation and might race with the pager. Qemu-dm being a
prime example.

Fix the race by allowing a buffer to be optionally passed in the prep
operation, and having the hypervisor memcpy from that buffer into the newly
prepped page before promoting the gfn type.

Second patch is a tools patch; cc'ed maintainers.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

 xen/arch/x86/mm/mem_event.c  |   2 +-
 xen/arch/x86/mm/mem_paging.c |   2 +-
 xen/arch/x86/mm/p2m.c        |  52 +++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/p2m.h    |   2 +-
 xen/include/public/domctl.h  |   8 +++++-
 tools/libxc/xc_mem_event.c   |   4 +-
 tools/libxc/xc_mem_paging.c  |  23 +++++++++++++++++++
 tools/libxc/xenctrl.h        |   2 +
 8 files changed, 85 insertions(+), 10 deletions(-)
Andres Lagar-Cavilla
2011-Nov-29 21:52 UTC
[PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents
 xen/arch/x86/mm/mem_event.c  |   2 +-
 xen/arch/x86/mm/mem_paging.c |   2 +-
 xen/arch/x86/mm/p2m.c        |  52 +++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/p2m.h    |   2 +-
 xen/include/public/domctl.h  |   8 +++++-
 5 files changed, 58 insertions(+), 8 deletions(-)

p2m_mem_paging_prep ensures that an mfn is backing the paged-out gfn, and
transitions to the next state in the paging state machine for that page.
Foreign mappings of the gfn will now succeed. This is the key idea, as it
allows the pager to now map the gfn and fill in its contents.

Unfortunately, it also allows any other foreign mapper to map the gfn and
read its contents. This is particularly dangerous when the populate is
launched by a foreign mapper in the first place, which will be actively
retrying the map operation and might race with the pager. Qemu-dm being a
prime example.

Fix the race by allowing a buffer to be optionally passed in the prep
operation, and having the hypervisor memcpy from that buffer into the newly
prepped page before promoting the gfn type.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 4ee6d40edc2c -r 0ce71e5bfaac xen/arch/x86/mm/mem_event.c
--- a/xen/arch/x86/mm/mem_event.c
+++ b/xen/arch/x86/mm/mem_event.c
@@ -45,7 +45,7 @@ static int mem_event_enable(struct domai
     struct domain *dom_mem_event = current->domain;
     struct vcpu *v = current;
     unsigned long ring_addr = mec->ring_addr;
-    unsigned long shared_addr = mec->shared_addr;
+    unsigned long shared_addr = mec->u.shared_addr;
     l1_pgentry_t l1e;
     unsigned long shared_gfn = 0, ring_gfn = 0; /* gcc ... */
     p2m_type_t p2mt;

diff -r 4ee6d40edc2c -r 0ce71e5bfaac xen/arch/x86/mm/mem_paging.c
--- a/xen/arch/x86/mm/mem_paging.c
+++ b/xen/arch/x86/mm/mem_paging.c
@@ -47,7 +47,7 @@ int mem_paging_domctl(struct domain *d,
     case XEN_DOMCTL_MEM_EVENT_OP_PAGING_PREP:
     {
         unsigned long gfn = mec->gfn;
-        return p2m_mem_paging_prep(d, gfn);
+        return p2m_mem_paging_prep(d, gfn, mec->u.buffer);
     }
     break;

diff -r 4ee6d40edc2c -r 0ce71e5bfaac xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -974,14 +974,43 @@ void p2m_mem_paging_populate(struct doma
  * mfn if populate was called for gfn which was nominated but not evicted. In
  * this case only the p2mt needs to be forwarded.
  */
-int p2m_mem_paging_prep(struct domain *d, unsigned long gfn)
+int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
 {
     struct page_info *page;
     p2m_type_t p2mt;
     p2m_access_t a;
-    mfn_t mfn;
+    mfn_t mfn, buf_mfn = _mfn(INVALID_MFN);
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    int ret;
+    int ret, page_extant = 1;
+    void *buf_map = NULL;
+
+    /* Map buffer page, if any, and get a reference */
+    if ( buffer )
+    {
+        l1_pgentry_t l1e;
+        unsigned long buf_gfn;
+        p2m_type_t buf_p2mt;
+
+        if ( (buffer & (PAGE_SIZE - 1)) ||
+             (!access_ok(buffer, PAGE_SIZE)) )
+            return -EINVAL;
+
+        guest_get_eff_l1e(current, buffer, &l1e);
+        buf_gfn = l1e_get_pfn(l1e);
+        buf_mfn = get_gfn(current->domain, buf_gfn,
+                          &buf_p2mt);
+
+        if ( likely( mfn_valid(buf_mfn) &&
+                     p2m_is_ram(buf_p2mt) ) )
+        {
+            get_page(mfn_to_page(buf_mfn), current->domain);
+            buf_map = map_domain_page(mfn_x(buf_mfn));
+            put_gfn(current->domain, buf_gfn);
+        } else {
+            put_gfn(current->domain, buf_gfn);
+            return -EINVAL;
+        }
+    }

     p2m_lock(p2m);

@@ -1001,6 +1030,18 @@ int p2m_mem_paging_prep(struct domain *d
         if ( unlikely(page == NULL) )
             goto out;
         mfn = page_to_mfn(page);
+        page_extant = 0;
+    }
+
+    /* If we were given a buffer, now is the time to use it */
+    if ( !page_extant && buffer )
+    {
+        void *guest_map;
+
+        ASSERT( mfn_valid(mfn) );
+        guest_map = map_domain_page(mfn_x(mfn));
+        memcpy(guest_map, buf_map, PAGE_SIZE);
+        unmap_domain_page(guest_map);
     }

     /* Fix p2m mapping */
@@ -1012,6 +1053,11 @@ int p2m_mem_paging_prep(struct domain *d
 out:
     p2m_unlock(p2m);
+    if ( buffer )
+    {
+        unmap_domain_page(buf_map);
+        put_page(mfn_to_page(buf_mfn));
+    }
     return ret;
 }

diff -r 4ee6d40edc2c -r 0ce71e5bfaac xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -479,7 +479,7 @@ void p2m_mem_paging_drop_page(struct dom
 /* Start populating a paged out frame */
 void p2m_mem_paging_populate(struct domain *d, unsigned long gfn);
 /* Prepare the p2m for paging a frame in */
-int p2m_mem_paging_prep(struct domain *d, unsigned long gfn);
+int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer);
 /* Resume normal operation (in case a domain was paused) */
 void p2m_mem_paging_resume(struct domain *d);
 #else

diff -r 4ee6d40edc2c -r 0ce71e5bfaac xen/include/public/domctl.h
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -742,8 +742,12 @@ struct xen_domctl_mem_event_op {
     uint32_t op;   /* XEN_DOMCTL_MEM_EVENT_OP_*_* */
     uint32_t mode; /* XEN_DOMCTL_MEM_EVENT_OP_* */

-    /* OP_ENABLE */
-    uint64_aligned_t shared_addr;  /* IN: Virtual address of shared page */
+    union {
+        /* OP_ENABLE IN: Virtual address of shared page */
+        uint64_aligned_t shared_addr;
+        /* PAGING_PREP IN: buffer to immediately fill page in */
+        uint64_aligned_t buffer;
+    } u;
     uint64_aligned_t ring_addr;    /* IN: Virtual address of ring page */

     /* Other OPs */
Andres Lagar-Cavilla
2011-Nov-29 21:52 UTC
[PATCH 2 of 2] Tools: Libxc wrappers to automatically fill in paged out page contents on prepare
 tools/libxc/xc_mem_event.c  |   4 ++--
 tools/libxc/xc_mem_paging.c |  23 +++++++++++++++++++++++
 tools/libxc/xenctrl.h       |   2 ++
 3 files changed, 27 insertions(+), 2 deletions(-)

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 0ce71e5bfaac -r 15da109b0c7d tools/libxc/xc_mem_event.c
--- a/tools/libxc/xc_mem_event.c
+++ b/tools/libxc/xc_mem_event.c
@@ -24,7 +24,7 @@
 #include "xc_private.h"

 int xc_mem_event_control(xc_interface *xch, domid_t domain_id, unsigned int op,
-                         unsigned int mode, void *shared_page,
+                         unsigned int mode, void *page,
                          void *ring_page, unsigned long gfn)
 {
     DECLARE_DOMCTL;
@@ -34,7 +34,7 @@ int xc_mem_event_control(xc_interface *x
     domctl.u.mem_event_op.op = op;
     domctl.u.mem_event_op.mode = mode;

-    domctl.u.mem_event_op.shared_addr = (unsigned long)shared_page;
+    domctl.u.mem_event_op.u.shared_addr = (unsigned long)page;
     domctl.u.mem_event_op.ring_addr = (unsigned long)ring_page;
     domctl.u.mem_event_op.gfn = gfn;

diff -r 0ce71e5bfaac -r 15da109b0c7d tools/libxc/xc_mem_paging.c
--- a/tools/libxc/xc_mem_paging.c
+++ b/tools/libxc/xc_mem_paging.c
@@ -65,6 +65,29 @@ int xc_mem_paging_prep(xc_interface *xch
                                 NULL, NULL, gfn);
 }

+int xc_mem_paging_load(xc_interface *xch, domid_t domain_id,
+                       unsigned long gfn, void *buffer)
+{
+    int rc;
+
+    if ( !buffer )
+        return -EINVAL;
+
+    if ( ((unsigned long) buffer) & (XC_PAGE_SIZE - 1) )
+        return -EINVAL;
+
+    if ( mlock(buffer, XC_PAGE_SIZE) )
+        return -errno;
+
+    rc = xc_mem_event_control(xch, domain_id,
+                              XEN_DOMCTL_MEM_EVENT_OP_PAGING_PREP,
+                              XEN_DOMCTL_MEM_EVENT_OP_PAGING,
+                              buffer, NULL, gfn);
+
+    (void)munlock(buffer, XC_PAGE_SIZE);
+    return rc;
+}
+
 int xc_mem_paging_resume(xc_interface *xch, domid_t domain_id, unsigned long gfn)
 {
     return xc_mem_event_control(xch, domain_id,

diff -r 0ce71e5bfaac -r 15da109b0c7d tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1866,6 +1866,8 @@ int xc_mem_paging_nominate(xc_interface
                            unsigned long gfn);
 int xc_mem_paging_evict(xc_interface *xch, domid_t domain_id, unsigned long gfn);
 int xc_mem_paging_prep(xc_interface *xch, domid_t domain_id, unsigned long gfn);
+int xc_mem_paging_load(xc_interface *xch, domid_t domain_id,
+                       unsigned long gfn, void *buffer);
 int xc_mem_paging_resume(xc_interface *xch, domid_t domain_id,
                          unsigned long gfn);
Olaf Hering
2011-Nov-30 13:21 UTC
Re: [PATCH 0 of 2] Fix correctness race in xc_mem_paging_prep
On Tue, Nov 29, Andres Lagar-Cavilla wrote:

> P2m_mem_paging_prep ensures that an mfn is backing the paged-out gfn, and
> transitions to the next state in the paging state machine for this page.
> Foreign mappings of the gfn will now succeed. This is the key idea, as it
> allows the pager to now map the gfn and fill in its contents.
>
> Unfortunately, it also allows any other foreign mapper to map the gfn and read
> its contents. This is particularly dangerous when the populate is launched
> by a foreign mapper in the first place, which will be actively retrying the
> map operation and might race with the pager. Qemu-dm being a prime example.

Yes, I think that's a real issue. The concept looks ok to me.

Olaf
Ian Jackson
2011-Nov-30 14:46 UTC
Re: [PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents
Andres Lagar-Cavilla writes ("[PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents"):

> -    /* OP_ENABLE */
> -    uint64_aligned_t shared_addr;  /* IN: Virtual address of shared page */
> +    union {
> +        /* OP_ENABLE IN: Virtual address of shared page */
> +        uint64_aligned_t shared_addr;
> +        /* PAGING_PREP IN: buffer to immediately fill page in */
> +        uint64_aligned_t buffer;
> +    } u;

Do we care that this interface is very binary-incompatible? Is there
a flag or version somewhere where we can at least arrange for this to
be detected? Perhaps we should allocate a new domctl number for this
version, so old code gets "no idea what you're talking about" rather
than wrong behaviour?

Ian.
Andres Lagar-Cavilla
2011-Nov-30 15:13 UTC
Re: [PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents
> Andres Lagar-Cavilla writes ("[PATCH 1 of 2] After preparing a page for
> page-in, allow immediate fill-in of the page contents"):
>> -    /* OP_ENABLE */
>> -    uint64_aligned_t shared_addr;  /* IN: Virtual address of shared page */
>> +    union {
>> +        /* OP_ENABLE IN: Virtual address of shared page */
>> +        uint64_aligned_t shared_addr;
>> +        /* PAGING_PREP IN: buffer to immediately fill page in */
>> +        uint64_aligned_t buffer;
>> +    } u;
>
> Do we care that this interface is very binary-incompatible? Is there
> a flag or version somewhere where we can at least arrange for this to
> be detected? Perhaps we should allocate a new domctl number for this
> version, so old code gets "no idea what you're talking about" rather
> than wrong behaviour?

I turned the field into a union of the same size, so it should be binary
compatible. Should...

There is no reason to use a union other than clarity: "this field is used
for different purposes in different domctls". I think this is fine, but
your call.

Andres
Olaf Hering
2011-Dec-01 12:43 UTC
Re: [PATCH 0 of 2] Fix correctness race in xc_mem_paging_prep
On Wed, Nov 30, Olaf Hering wrote:

> On Tue, Nov 29, Andres Lagar-Cavilla wrote:
>
>> P2m_mem_paging_prep ensures that an mfn is backing the paged-out gfn, and
>> transitions to the next state in the paging state machine for this page.
>> Foreign mappings of the gfn will now succeed. This is the key idea, as it
>> allows the pager to now map the gfn and fill in its contents.
>>
>> Unfortunately, it also allows any other foreign mapper to map the gfn and read
>> its contents. This is particularly dangerous when the populate is launched
>> by a foreign mapper in the first place, which will be actively retrying the
>> map operation and might race with the pager. Qemu-dm being a prime example.
>
> Yes, I think that's a real issue. The concept looks ok to me.

After some more thought I think we can kill two birds with one stone:

- Merge p2m_mem_paging_prep() into p2m_mem_paging_resume().
  p2m_mem_paging_populate() maintains a list of buffer pages and passes one
  of them to the pager. The pager fills the buffer and passes it back to
  p2m_mem_paging_resume(), which copies that buffer into a newly allocated
  page. Once the p2mt state is restored, the buffer is released for further
  use in p2m_mem_paging_populate(). It's just the question: how to handle
  allocation failures?

- If both functions are merged, the communication between
  p2m_mem_paging_drop()/p2m_mem_paging_populate() and
  p2m_mem_paging_resume() could be done entirely with the event channel.
  The two domctls can disappear, and also a p2mt could be removed.

So both page-out and page-in will be a two-step process.

What do you think about that idea?

Olaf
Tim Deegan
2011-Dec-01 15:53 UTC
Re: [PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents
Hi,

This looks good to me. I think it needs two more things to make it
correct (as well as the tools patch 2/2):

- an update to the xenpaging tool to use the new interface; and
- a possible update to the paging state machine --- after all, if the
  prep call allocates the page and fills its contents, do we need
  any more stages on the page-in path?

One more comment below:

At 16:52 -0500 on 29 Nov (1322585560), Andres Lagar-Cavilla wrote:
> diff -r 4ee6d40edc2c -r 0ce71e5bfaac xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -974,14 +974,43 @@ void p2m_mem_paging_populate(struct doma
>   * mfn if populate was called for gfn which was nominated but not evicted. In
>   * this case only the p2mt needs to be forwarded.
>   */
> -int p2m_mem_paging_prep(struct domain *d, unsigned long gfn)
> +int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
>  {
>      struct page_info *page;
>      p2m_type_t p2mt;
>      p2m_access_t a;
> -    mfn_t mfn;
> +    mfn_t mfn, buf_mfn = _mfn(INVALID_MFN);
>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> -    int ret;
> +    int ret, page_extant = 1;
> +    void *buf_map = NULL;
> +
> +    /* Map buffer page, if any, and get a reference */
> +    if ( buffer )
> +    {
> +        l1_pgentry_t l1e;
> +        unsigned long buf_gfn;
> +        p2m_type_t buf_p2mt;
> +
> +        if ( (buffer & (PAGE_SIZE - 1)) ||
> +             (!access_ok(buffer, PAGE_SIZE)) )
> +            return -EINVAL;
> +
> +        guest_get_eff_l1e(current, buffer, &l1e);
> +        buf_gfn = l1e_get_pfn(l1e);
> +        buf_mfn = get_gfn(current->domain, buf_gfn,
> +                          &buf_p2mt);
> +
> +        if ( likely( mfn_valid(buf_mfn) &&
> +                     p2m_is_ram(buf_p2mt) ) )
> +        {
> +            get_page(mfn_to_page(buf_mfn), current->domain);
> +            buf_map = map_domain_page(mfn_x(buf_mfn));
> +            put_gfn(current->domain, buf_gfn);
> +        } else {
> +            put_gfn(current->domain, buf_gfn);
> +            return -EINVAL;
> +        }
> +    }

We could maybe avoid all this mechanism by doing a copy_from_user() of
the buffer contents directly into the new page, instead of an explicit
map-and-memcpy().

Cheers,

Tim.
Andres Lagar-Cavilla
2011-Dec-01 15:58 UTC
Re: [PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents
> Hi,
>
> This looks good to me. I think it needs two more things to make it
> correct (as well as the tools patch 2/2):
> - an update to the xenpaging tool to use the new interface; and

Sure, have it ready, will definitely cc Olaf for his ack on that one.

> - a possible update to the paging state machine --- after all, if the
>   prep call allocates the page and fills its contents, do we need
>   any more stages on the page-in path?

I am kind of torn about this. Maybe the pager wants to do a set of loads,
and then fire off many vcpu unpauses in a batched fashion (which is
possible with patches I submitted later). This isn't a necessity for
correctness, though.

And we still need the resume kick for cases like a guest accessing a page
that has not been paged out yet (p2m_ram_paging_out).

Andres

> One more comment below:
>
> At 16:52 -0500 on 29 Nov (1322585560), Andres Lagar-Cavilla wrote:
>> diff -r 4ee6d40edc2c -r 0ce71e5bfaac xen/arch/x86/mm/p2m.c
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -974,14 +974,43 @@ void p2m_mem_paging_populate(struct doma
>>   * mfn if populate was called for gfn which was nominated but not
>>   evicted. In
>>   * this case only the p2mt needs to be forwarded.
>>   */
>> -int p2m_mem_paging_prep(struct domain *d, unsigned long gfn)
>> +int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t
>> buffer)
>>  {
>>      struct page_info *page;
>>      p2m_type_t p2mt;
>>      p2m_access_t a;
>> -    mfn_t mfn;
>> +    mfn_t mfn, buf_mfn = _mfn(INVALID_MFN);
>>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> -    int ret;
>> +    int ret, page_extant = 1;
>> +    void *buf_map = NULL;
>> +
>> +    /* Map buffer page, if any, and get a reference */
>> +    if ( buffer )
>> +    {
>> +        l1_pgentry_t l1e;
>> +        unsigned long buf_gfn;
>> +        p2m_type_t buf_p2mt;
>> +
>> +        if ( (buffer & (PAGE_SIZE - 1)) ||
>> +             (!access_ok(buffer, PAGE_SIZE)) )
>> +            return -EINVAL;
>> +
>> +        guest_get_eff_l1e(current, buffer, &l1e);
>> +        buf_gfn = l1e_get_pfn(l1e);
>> +        buf_mfn = get_gfn(current->domain, buf_gfn,
>> +                          &buf_p2mt);
>> +
>> +        if ( likely( mfn_valid(buf_mfn) &&
>> +                     p2m_is_ram(buf_p2mt) ) )
>> +        {
>> +            get_page(mfn_to_page(buf_mfn), current->domain);
>> +            buf_map = map_domain_page(mfn_x(buf_mfn));
>> +            put_gfn(current->domain, buf_gfn);
>> +        } else {
>> +            put_gfn(current->domain, buf_gfn);
>> +            return -EINVAL;
>> +        }
>> +    }
>
> We could maybe avoid all this mechanism by doing a copy_from_user() of
> the buffer contents directly into the new page, instead of an explicit
> map-and-memcpy().
>
> Cheers,
>
> Tim.
Ian Jackson
2011-Dec-01 16:11 UTC
Re: [PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents
Andres Lagar-Cavilla writes ("Re: [Xen-devel] [PATCH 1 of 2] After preparing a page for page-in, allow immediate fill-in of the page contents"):

> I turned the field into a union of the same size, so it should be binary
> compatible. Should...

So you did. Yesterday I thought it was a struct. Sorry for being confused.

Ian.