Jui-Hao Chiang
2010-Dec-03 02:56 UTC
[Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
Hi, all:

 xen/arch/x86/hvm/hvm.c        |    2 ++
 xen/arch/x86/mm.c             |   20 ++++++++++----------
 xen/arch/x86/mm/mem_sharing.c |    4 ++--
 3 files changed, 14 insertions(+), 12 deletions(-)

This small patch fixes 2 problems of memory sharing for xen-4.0-testing.hg. (I haven't submitted a patch here before; if it violates any conventions, I'm glad to have advice.)

(1) When nominating a shared page, page_make_sharable() does not recover the type_info count if it fails to nominate the page.
(2) When building Xen with debug=n, the code inside ASSERT() won't get executed. Change to BUG_ON.

Besides, I don't understand why page_make_sharable() checks count_info in the following way:

    /* Check if the ref count is 2. The first from PGT_allocated, and the second
     * from get_page at the top of this function */
    if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))

This seems to imply that the following kind of page can never be nominated for sharing, because ci (count_info) is greater than 2 after get_page. Here, domain 3 is a 64-bit HVM with hap=1, pae=1 on 64-bit Xen.

    (XEN) Debug for domain=3, gfn=10, Debug page: MFN=c210ad is ci=8000000000000002, ti=0, owner_id=3

Can someone give a hint:
(1) in what kind of scenario do we get ci = 2 and ti = 0?
(2) or why not allow ci >= 2 to be nominated?

Bests,
Jui-Hao Chiang at CCMA, ITRI, R.O.C

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
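Point (2) of the message above can be illustrated outside Xen. The following is a minimal sketch, not Xen's actual macro definitions: MY_ASSERT, MY_BUG_ON, RELEASE_BUILD and drop_ref() are hypothetical stand-ins, with RELEASE_BUILD playing the role of debug=n. It shows how a call with a side effect disappears when it sits inside an assertion that is compiled out.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; these are NOT Xen's
 * definitions. RELEASE_BUILD plays the role of building with debug=n. */
#define RELEASE_BUILD 1

#if RELEASE_BUILD
#define MY_ASSERT(expr) ((void)0)   /* the expression is dropped entirely */
#else
#define MY_ASSERT(expr) do { if (!(expr)) abort(); } while (0)
#endif

/* Always evaluates its argument, in every build. */
#define MY_BUG_ON(expr) do { if (expr) abort(); } while (0)

static int refs_dropped;

/* A call with a side effect; returns 0 on success. */
static int drop_ref(void)
{
    refs_dropped++;
    return 0;
}

/* Side effect hidden inside an assertion: lost when assertions compile out. */
static int release_with_assert(void)
{
    refs_dropped = 0;
    MY_ASSERT(drop_ref() == 0);
    return refs_dropped;
}

/* Same call under the always-evaluated check: the side effect survives. */
static int release_with_bug_on(void)
{
    refs_dropped = 0;
    MY_BUG_ON(drop_ref() != 0);
    return refs_dropped;
}
```

Each helper returns how many times drop_ref() actually ran, so comparing the two return values shows which form keeps the side effect in a release-style build.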
Tim Deegan
2010-Dec-08 10:51 UTC
Re: [Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
Hi,

At 02:56 +0000 on 03 Dec (1291344990), Jui-Hao Chiang wrote:
> This small patch fixes 2 problems of memory sharing for xen-4.0-testing.hg
> (I haven't submitted patch here, if it violates any conventional
> rules, I'm glad to have advices)

Thanks for your patch!

Patches should be based on the tip of xen-unstable; we apply them there and backport to the stable branches.

Also, you need to add a "Signed-off-by" line to the patch description to declare that the code is appropriately owned/licensed. See: http://elinux.org/Developer_Certificate_Of_Origin for what that means.

> (1) When nominating a shared page, the page_make_sharable() does not
> recover the type_info count if it fails to nominate the page.

It looks to me as if it works already -- the cmpxchg loop in that function always changes from (type = none, count = 0) to (type = shared, count = 1), so the put_page_and_type() in the failure case does the right thing, putting the count back to 0.

I don't understand why this function requires type == none; CC'ing the author for an explanation.

> (2) When building xen with debug=n, the code in ASSERT() won't get
> executed. Change to BUG_ON.

This part is clearly correct; I've made the equivalent change in xen-unstable as changeset 22467:89116f28083f

> Besides, I don't understand why the page_make_sharable() force checking the count_info with the following way?
> /* Check if the ref count is 2. The first from PGT_allocated, and the second
>  * from get_page at the top of this function */
> if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))
>
> This seems to imply that the following kind of page can never be nominated for shared pages because ci (count_info) is greater than 2 after get_page. Here, domain 3 is a 64-bit HVM with hap=1, pae=1 on 64bit Xen.
> (XEN) Debug for domain=3, gfn=10, Debug page: MFN=c210ad is ci=8000000000000002, ti=0, owner_id=3
>
> Can someone gives a hint that
> (1) in what kind of scenario that ci = 2 and ti=0?
> (2) or why not allow ci >=2 to be nominated?

count = 2 and type = 0 happens in exactly the situation that the comment describes: the page has no mappings from anywhere, just the one refcount from being allocated and one taken at the start of the current function.

It's not possible to share a page with typecount > 0 because we need to change its type. I'm not sure why the refcount can't be greater than two though, but I think it's to do with how shared pages have their refcounts tracked differently to other pages. Again, maybe Grzegorz can clarify.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
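The refcount test discussed in this message can be sketched as a standalone function. This is a simplified model, not Xen's code: the position of the allocated flag is an assumption inferred from the debug output ci=8000000000000002 quoted in the thread, and MODEL_PGC_ALLOCATED, count_allows_nomination() and model_get_page() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, inferred from "ci=8000000000000002" in the thread:
 * top bit = allocated flag, low bits = reference count. Illustrative only. */
#define MODEL_PGC_ALLOCATED (UINT64_C(1) << 63)

/* Models get_page(): one more general reference on the page. */
static uint64_t model_get_page(uint64_t count_info)
{
    return count_info + 1;
}

/* Mirrors the quoted test: after the caller's own get_page(), the page must
 * hold exactly the allocation reference, that extra reference, and any
 * references the caller said to expect. */
static bool count_allows_nomination(uint64_t count_info_after_get,
                                    uint64_t expected_refcnt)
{
    return count_info_after_get ==
           (MODEL_PGC_ALLOCATED | (2 + expected_refcnt));
}
```

In this model a page that starts with a count of 1 passes, while the page from the debug line (count already 2 before get_page) fails unless the caller passes an expected_refcnt of 1.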
Jui-Hao Chiang
2010-Dec-09 08:14 UTC
Re: [Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
Hi, Tim:

Thanks for your information. First I want to explain that our current project is trying to implement memory deduplication for unmodified guests in Xen (basically HVM). Since the memory sharing code provides a good foundation for a COW mechanism, we would like to test and utilize it.

Please see my inline comments.

On Wed, Dec 8, 2010 at 6:51 PM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> Hi,
>
> At 02:56 +0000 on 03 Dec (1291344990), Jui-Hao Chiang wrote:
> > This small patch fixes 2 problems of memory sharing for xen-4.0-testing.hg
> > (I haven't submitted patch here, if it violates any conventional
> > rules, I'm glad to have advices)
>
> Thanks for your patch!
>
> Patches should be based on the tip of xen-unstable; we apply them there
> and backport to the stable branches.

You are right, I should switch to xen-unstable when submitting the patch. But in the latest xen-unstable, the mem_sharing_share_pages() function crashes the entire Xen. Currently I don't have a serial port for capturing the oops message. Could someone give me a hint on how to debug this kind of crash?

> Also, you need to add a "Signed-off-by" line to the patch description to
> declare that the code is appropriately owned/licensed.
> See: http://elinux.org/Developer_Certificate_Of_Origin for what that
> means.

Got it.

> > (1) When nominating a shared page, the page_make_sharable() does not
> > recover the type_info count if it fails to nominate the page.
>
> It looks to me as if it works already -- the cmpxchg loop in that
> function always changes from (type = none, count = 0) to (type = shared,
> count = 1), so the put_page_and_type() in the failure case does the
> right thing, putting the count back to 0.

It seems the candidate page for nomination usually has (type=none, count=1), and it's OK for page_make_sharable() to make it (type=none, count=2) afterwards.
However, when we have a page with (type=none, count=2), page_make_sharable() goes wrong in the following steps:
(step1) get_page() increases count (type=none, count=3)
(step2) the cmpxchg loop changes type (type=8400000000000001, count=3, actually the real value of count_info is 0x8000000000000002)
(step3) Checking count is greater than 2? Oops!.... abort without recovering type back to none
So that's why I interchange (step2) and (step3) and replace put_page_and_type() with put_page().

> I don't understand why this function requires type == none; CC'ing the
> author for an explanation.
>
> > (2) When building xen with debug=n, the code in ASSERT() won't get
> > executed. Change to BUG_ON.
>
> This part is clearly correct; I've made the equivalent change in
> xen-unstable as changeset 22467:89116f28083f
>
> > Besides, I don't understand why the page_make_sharable() force checking
> > the count_info with the following way?
> > /* Check if the ref count is 2. The first from PGT_allocated, and the second
> >  * from get_page at the top of this function */
> > if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))
> >
> > This seems to imply that the following kind of page can never be
> > nominated for shared pages because ci (count_info) is greater than 2 after
> > get_page. Here, domain 3 is a 64-bit HVM with hap=1, pae=1 on 64bit Xen.
> > (XEN) Debug for domain=3, gfn=10, Debug page: MFN=c210ad is
> > ci=8000000000000002, ti=0, owner_id=3
> >
> > Can someone gives a hint that
> > (1) in what kind of scenario that ci = 2 and ti=0?
> > (2) or why not allow ci >=2 to be nominated?
>
> count = 2 and type = 0 happens in exactly the situation that the comment
> describes: the page has no mappings from anywhere, just the one refcount
> from being allocated and one taken at the start of the current function.

(Correct me if I am wrong please !!)
From my observation, a normal page mapped by a single gfn of a domain will have count=1 (PGT_allocated) and type=0.
page_make_sharable() calls get_page() before checking ci >= 2.
So it's OK for a PGT_allocated page, but not OK if ci = 2 before get_page().

After some tracing, I found a scenario for the page (ci=2, ti=0).
It seems the stub domain for an HVM guest will try to map some of the HVM's memory into its own address space using do_mmu_update(), which increases the ci from 1 to 2 without changing the type or marking it as shared. For a 1GB 64-bit HVM CentOS 5.5 guest (pae=1, hap=1), around 250MB will have ci=2 after booting to the user-space prompt.

I wonder about the following two things:
(1) Does the stub domain do this to perform I/O for the HVM guest? Can someone point out where this code is?
(2) Is there a way or any place to unmap the memory and bring the page count back to 1?

> It's not possible to share a page with typecount > 0 because we need to
> change its type. I'm not sure why the refcount can't be greater than
> two though, but I think it's to do with how shared pages have their
> refcounts tracked differently to other pages. Again, maybe Grzegorz can
> clarify.

Assume my previous guess about the stub domain is right. Then if a page from the previous scenario is made sharable and its mapped mfn is freed (when sharing two pages, the later one's mfn will be discarded), will the stub domain refer to the old discarded mfn if no unmapping is performed?

> Cheers,
>
> Tim.
>
> --
> Tim Deegan <Tim.Deegan@citrix.com>
> Principal Software Engineer, Xen Platform Team
> Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)

Appreciate any comments,
Jui-Hao
Tim Deegan
2010-Dec-09 10:01 UTC
Re: [Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
At 08:14 +0000 on 09 Dec (1291882483), Jui-Hao Chiang wrote:
> It seems the candidate page for nomination usually has (type=none, count=1), and it's ok for page_make_sharable() to make it (type=none, count=2) afterwards.
> However, when we have a page (type=none, count=2), the page_make_sharable() will make it wrong as the following steps:
> (step1) get_page() increases count (type=none, count=3)
> (step2) cmpxchg loops changes type (type=8400000000000001, count=3, actually the real value of count_info is 0x8000000000000002)
> (step3) Checking count is greater than 2? Oops!.... abort without recovering type back to none

OK, I think that's the problem. Pages with type-count 0 basically have no type so page_make_sharable() is wrong to insist on its having type=none; it should only check that the typecount is zero.

Can you try changing the test in the cmpxchg loop only to check for ((x & PGT_count_mask) != 0)?

> (Correct me if I am wrong please !!)
> From my observation, the normal page mapped by a single gfn of a domain will have count=1 (PGT_allocated) and type=0.
> The page_make_sharable() will use get_page() before checking ci >=2.
> So it's ok for PGT_allocated page, but not ok if ci =2 before get_page().

Yes, that's what it's enforcing.

> After some tracing, I found a scenario for the page (ci=2, ti=0).
> It seems the stub domain for a HVM guest will try to map some of HVM's memory into its own address space using do_mmu_update(), which increases the ci from 1 to 2 without changing the type or marking it as shared. For a 1GB 64-bit HVM Centos 5.5 guest (pae=1, hap=1), around 250MB will become ci=2 after booting into user space prompt.
>
> I wonder the following two things
> (1) stub domain does this to perform I/O for HVM guest? can someone point out where this code is?

Yes; the code is in the qemu sources.

> (2) is there a way or any place to unmap the memory and make the page count back to 1?

IIRC it's possible to ask qemu to unmap the guest's memory but not to synchronously wait for that to happen (because that would be a priority inversion at best and a deadlock at worst). So that doesn't really help.

If you're trying to do page deduplication you might try putting some of the logic into qemu, where it can make sure it unmaps the page before asking to share it.

> Assume my previous guess for stub domain is right. Then if a page
> from the previous scenario is made sharable and its mapped mfn is
> freed (when sharing two pages, the later one's mfn will be discarded),
> will the stub domain refer to the old discarded mfn if no unmapping is
> performed?

Yes, and that's definitely not good. AFAICS, qemu has to drop all its mappings for the sharing to be safe, and the page needs to be unshared again if qemu tries to map it again. Otherwise I/O from one VM could pollute another VM (or in your case, I/O to one page could overwrite another page).

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
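The three-step failure sequence quoted at the top of this message can be traced with a small single-threaded model. This is only a sketch under stated assumptions: struct page_model, its fields, and model_make_sharable() are hypothetical, the allocated flag is omitted so count is a plain number, and the model keeps the extra reference on success (the thread does not say what the real function does there). It is not the body of Xen's page_make_sharable().

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TYPE_NONE   0u
#define MODEL_TYPE_SHARED 1u

/* Simplified page state; illustrative fields, not Xen's struct page_info. */
struct page_model {
    uint64_t count;      /* general references (allocated flag omitted) */
    uint64_t type;       /* MODEL_TYPE_NONE or MODEL_TYPE_SHARED */
    uint64_t type_count; /* typed references */
};

/* Take a reference, change the type, then check the count. When
 * restore_on_failure is false, a failed count check leaves the page typed
 * as shared, which is the problem reported in the thread. */
static bool model_make_sharable(struct page_model *pg,
                                uint64_t expected_refcnt,
                                bool restore_on_failure)
{
    pg->count++;                              /* step 1: get_page() */

    if (pg->type_count != 0) {                /* step 2: type must be unused */
        pg->count--;
        return false;
    }
    pg->type = MODEL_TYPE_SHARED;
    pg->type_count = 1;

    if (pg->count != 2 + expected_refcnt) {   /* step 3: count check */
        if (restore_on_failure) {
            pg->type = MODEL_TYPE_NONE;       /* undo step 2 */
            pg->type_count = 0;
        }
        pg->count--;                          /* undo step 1 */
        return false;
    }
    return true;  /* model assumption: caller keeps the extra reference */
}
```

Running the model on a page that starts with count=2 shows the difference: without the restore the call fails but the page stays typed as shared, while with the restore it returns to (type=none, count=2).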
Jui-Hao Chiang
2010-Dec-10 08:30 UTC
Re: [Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
Hi, Tim:

On Thu, Dec 9, 2010 at 6:01 PM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 08:14 +0000 on 09 Dec (1291882483), Jui-Hao Chiang wrote:
> > It seems the candidate page for nomination usually has (type=none,
> > count=1), and it's ok for page_make_sharable() to make it (type=none,
> > count=2) afterwards.
> > However, when we have a page (type=none, count=2), the
> > page_make_sharable() will make it wrong as the following steps:
> > (step1) get_page() increases count (type=none, count=3)
> > (step2) cmpxchg loops changes type (type=8400000000000001, count=3,
> > actually the real value of count_info is 0x8000000000000002)
> > (step3) Checking count is greater than 2? Oops!.... abort without
> > recovering type back to none
>
> OK, I think that's the problem. Pages with type-count 0 basically have
> no type so page_make_sharable() is wrong to insist on its having
> type=none; it should only check that the typecount is zero.
>
> Can you try changing the test in the cmpxchg loop only to check
> for ((x & PGT_count_mask) != 0)?

I just checked the value of PGT_none, and it is actually equal to 0, so there seems to be nothing wrong with the original check.
The reason that I swap step3 and step2 is that step3 doesn't modify any value while step2 does.
If checking count_info fails, then just abort. Otherwise, if count_info is correct, go on to check and modify type_info.

> > (2) is there a way or any place to unmap the memory and make the page
> > count back to 1?
>
> IIRC it's possible to ask qemu to unmap the guest's memory but not to
> synchronously wait for that to happen (because that would be a priority
> inversion at best and a deadlock at worst). So that doesn't really
> help.
>
> If you're trying to do page deduplication you might try putting some of
> the logic into qemu, where it can make sure it unmaps the page before
> asking to share it.

I found one xenpaging patch, quoted below, but I'm not sure it addresses exactly the same question.
Quoting from their post, http://thread.gmane.org/gmane.comp.emulators.xen.devel/91768/focus=91770
"qemu will just keep mapping pages and not release them, which causes problems for the memory pager (since the page is mapped, it won't get paged out)"

AFAIK, qemu cannot always map the entire address space of an HVM guest, so the Map Cache feature maps the HVM's memory on demand/partially. The flush-cache command will trigger qemu_invalidate_map_cache(), which in turn calls munmap() on all mapped virtual addresses. Am I on the right track?

> > Assume my previous guess for stub domain is right. Then if a page
> > from the previous scenario is made sharable and its mapped mfn is
> > freed (when sharing two pages, the later one's mfn will be discarded),
> > will the stub domain refer to the old discarded mfn if no unmapping is
> > performed?
>
> Yes, and that's definitely not good. AFAICS, qemu has to drop all its
> mappings for the sharing to be safe, and the page needs to be unshared
> again if qemu tries to map it again. Otherwise I/O from one VM could
> pollute another VM (or in your case, I/O to one page could overwrite
> another page).

Thanks for the reminder that "the page needs to be unshared again if qemu tries to map it again".
There exist several hypercalls that perform memory mapping, e.g. mmu_update and update_va_mapping, and I will check on them.
Of course, only RW mappings need to be taken care of, right?

Bests,
Jui-Hao
Tim Deegan
2010-Dec-10 09:57 UTC
Re: [Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
Hi,

Can you possibly configure your mailer to indent quoted text? Your emails are a bit tricky to read without it.

At 08:30 +0000 on 10 Dec (1291969810), Jui-Hao Chiang wrote:
> I just checked the value of PGT_none, and it is actually equal to 0,
> so there seems nothing wrong with the original checking.

What I meant was: the current check looks at the type and the count. AFAICS it ought to be only checking the count and ignoring the type. That is:

    if((x & (PGT_type_mask | PGT_count_mask)) != PGT_none)

should be:

    if ( (x & PGT_count_mask) != 0 )

> The reason that I swap step3 and step2 is because step3 doesn't modify any value while step2 does.
> If checking count_info fails, then just abort. Else if count_info is correct, then go checking and modifying type_info.

I think the check needs to be after the type change. Otherwise you could race with another refcount.

> I found one patch of xenpaging as the following, but not sure it's exactly the same as my question.
> Cuts from their article http://thread.gmane.org/gmane.comp.emulators.xen.devel/91768/focus=91770
> "qemu will just keep mapping pages and not release them, which causes problems for the memory pager (since the page is mapped, it won't get paged out)"
>
> AFAIK, the qemu can not always map the entire address space of HVM guest, so Map Cache feature tends to map HVM's memory on-demand/partially. The flush-cache command will trigger qemu_invalidate_map_cache(), which in turn calls munmap() on all mapped virtual addresses. Am I on the right track?

Possibly. Where are you putting the code that scans for duplicate pages?

> > > Assume my previous guess for stub domain is right. Then if a page
> > > from the previous scenario is made sharable and its mapped mfn is
> > > freed (when sharing two pages, the later one's mfn will be discarded),
> > > will the stub domain refer to the old discarded mfn if no unmapping is
> > > performed?
> >
> > Yes, and that's definitely not good. AFAICS, qemu has to drop all its
> > mappings for the sharing to be safe, and the page needs to be unshared
> > again if qemu tries to map it again. Otherwise I/O from one VM could
> > pollute another VM (or in your case, I/O to one page could overwrite
> > another page).
>
> Thanks for your remind "the page needs to be unshared again if qemu tries to map it again".
> There exists several hypercalls to perform the memory mapping, e.g. mmu_update, update_va_mapping, and I will check on them.
> Of course, only the RW mapping should be taken care, right?

No, any foreign mapping needs to be handled, including grant tables, because there's no guaranteed way to fix them up when the sharing is undone later.

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
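The difference between the two tests in this message can be tried in isolation with C11 atomics standing in for Xen's cmpxchg. This is a sketch, not Xen code: the MODEL_PGT_* bit layout and model_set_shared() are illustrative assumptions, and the require_type_none flag exists only to compare the stricter original check with the relaxed one suggested above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; not Xen's real PGT_* values. */
#define MODEL_PGT_COUNT_MASK UINT64_C(0x00000000ffffffff)
#define MODEL_PGT_TYPE_MASK  UINT64_C(0xff00000000000000)
#define MODEL_PGT_SHARED     UINT64_C(0x0100000000000000)

/* Try to move type_info to (shared, type count 1) with a compare-and-swap
 * loop. With require_type_none the type bits must also be zero (the stricter
 * check); without it only the type count is tested, so stale type bits on an
 * unused page are ignored and overwritten. */
static bool model_set_shared(_Atomic uint64_t *type_info,
                             bool require_type_none)
{
    const uint64_t busy_mask = require_type_none
        ? (MODEL_PGT_TYPE_MASK | MODEL_PGT_COUNT_MASK)
        : MODEL_PGT_COUNT_MASK;
    const uint64_t new_val = MODEL_PGT_SHARED | 1;
    uint64_t old = atomic_load(type_info);

    do {
        if ((old & busy_mask) != 0)
            return false;   /* rejected by the selected check */
        /* On failure the call reloads 'old' with the current value. */
    } while (!atomic_compare_exchange_weak(type_info, &old, new_val));

    return true;
}
```

A word carrying a leftover type with a zero count is rejected by the strict variant and accepted by the relaxed one; a word with a nonzero count is rejected by both.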
Jui-Hao Chiang
2010-Dec-15 09:55 UTC
Re: [Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
Hi, Tim:

Sorry for the late reply; I have been thinking about some design issues.

On Fri, Dec 10, 2010 at 5:57 PM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> Hi,
>
> Can you possibly configure your mailer to indent quoted text? Your
> emails are a bit tricky to read without it.

I am using Gmail, but don't see any setting for this. It should be indented and prefixed by the symbol '|', shouldn't it?

> At 08:30 +0000 on 10 Dec (1291969810), Jui-Hao Chiang wrote:
> > I just checked the value of PGT_none, and it is actually equal to 0,
> > so there seems nothing wrong with the original checking.
>
> What I meant was: the current check looks at the type and the count.
> AFAICS it ought to be only checking the count and ignoring the type.
> That is:
>
>     if((x & (PGT_type_mask | PGT_count_mask)) != PGT_none)
>
> should be:
>
>     if ( (x & PGT_count_mask) != 0 )
>
> > The reason that I swap step3 and step2 is because step3 doesn't modify
> > any value while step2 does.
> > If checking count_info fails, then just abort. Else if count_info is
> > correct, then go checking and modifying type_info.
>
> I think the check needs to be after the type change. Otherwise you
> could race with another refcount

I see. So you are saying that type_count provides some kind of protection for the refcount?
If that's the case, I will just add a small piece of code to recover type_info when the check of count_info fails.

> > I found one patch of xenpaging as the following, but not sure it's
> > exactly the same as my question.
> > Cuts from their article
> > http://thread.gmane.org/gmane.comp.emulators.xen.devel/91768/focus=91770
> > "qemu will just keep mapping pages and not release them, which causes
> > problems for the memory pager (since the page is mapped, it won't get paged
> > out)"
> >
> > AFAIK, the qemu can not always map the entire address space of HVM guest,
> > so Map Cache feature tends to map HVM's memory on-demand/partially. The
> > flush-cache command will trigger qemu_invalidate_map_cache(), which in turn
> > calls munmap() on all mapped virtual addresses. Am I on the right track?
>
> Possibly. Where are you putting the code that scans for duplicate pages?

First, the patch from the above URL seems to work on the unstable version; it calls qemu_invalidate_map_cache() to drop the count down to 1 for qemu-mapped pages. But the patch is not currently in the unstable version.

As for where to put the code that scans for duplicate pages, we are thinking of doing it in a workqueue or some other thread-like mechanism, since this is a periodically running procedure and definitely takes a long time. Originally we wanted to put it in dom0 user level as a daemon, but there is not enough page information there. So it seems better to put it inside the hypervisor. Do you have any other suggestion?

One more thing is about the design.
(1) The current memory sharing code uses a unique 64-bit handle number to identify a page/mfn, and uses this handle to index into the hash list.
(2) Page nomination marks the page type as p2m_ram_shared (read-only), and returns the handle to the user, e.g. blktap2.
(3) Later on, the user calls mem_sharing_share_pages(handle1, handle2), passing two handles. Note that the code doesn't do any comparison of page content but leaves this task to users, and the only user we know of is blktap2 with the qcow disk format configured.
But why not let the memory sharing code do the content comparison? Is there any user who wants to share two pages with different content?

In order to perform the page comparison, we need to compute a checksum value for the page content (or even md5, sha1..), and use this checksum value to index into a hash list. It seems feasible to replace the handle number with the checksum value of the page, and combine the nominate() and share() operations in one function, with the following steps:
(1) Mark the page as p2m_ram_shared
(2) Compute the checksum of the page
(3) Search the hash list; if some page with the same checksum value is found, a real byte-by-byte memory comparison is performed.
(4) If the content matches completely, perform the share() operation by removing one duplicated page/mfn.
(5) Record this checksum value in page_info->shr_handle (used to store the handle)

Afterwards, when COW happens, the unshare() operation uses domain_id and gfn to find the page_info, then uses the stored checksum value to search the hash list.
Everything seems fine, but the tradeoff is having to change the blktap2 code for this new interface.

> > > > Assume my previous guess for stub domain is right. Then if a page
> > > > from the previous scenario is made sharable and its mapped mfn is
> > > > freed (when sharing two pages, the later one's mfn will be discarded),
> > > > will the stub domain refer to the old discarded mfn if no unmapping is
> > > > performed?
> > >
> > > Yes, and that's definitely not good. AFAICS, qemu has to drop all its
> > > mappings for the sharing to be safe, and the page needs to be unshared
> > > again if qemu tries to map it again. Otherwise I/O from one VM could
> > > pollute another VM (or in your case, I/O to one page could overwrite
> > > another page).
> >
> > Thanks for your remind "the page needs to be unshared again if qemu tries
> > to map it again".
> > There exists several hypercalls to perform the memory mapping, e.g.
> > mmu_update, update_va_mapping, and I will check on them.
> > Of course, only the RW mapping should be taken care, right?
>
> No, any foreign mapping needs to be handled, including grant tables,
> because there's no guaranteed way to fix them up when the sharing is
> undone later.

Thanks for the reminder! I will make sure they are unmapped properly.

Bests,
Jui-Hao
Tim Deegan
2010-Dec-15 12:56 UTC
Re: [Xen-devel] [PATCH][Xen 4.0-testing.hg] fix small bugs of memory sharing
At 09:55 +0000 on 15 Dec (1292406915), Jui-Hao Chiang wrote:
> As for where to put the code for duplicate pages, we wonder to do that
> in workqueue or some other thread-like things. Since this is a
> periodic running procedure, and definitely takes a long
> time. Originally we want to put it in dom0 user-level as a daemon, but
> there are not enough page information there. So it seems better to put
> inside hypervisor. Do you have any other suggestion?

I would avoid putting it in the hypervisor on general principle. Anything that can live in the tools should live in the tools.

> One more thing is about the design part.
> (1) The current memory sharing code uses a unique 64-bit handle number to identify a page/mfn, and use this handle to index into the hash list.
> (2) The page nomination makes the page type as p2m_ram_shared (read-only), and return the handle to user, e.g. blktap2.
> (3) Later on, the user calls mem_sharing_share_pages(handle1, handle2) by giving two handles. Note that, the code doesn't do any comparison of page content but leave this task to users whereas the only user we know is blktap2 with qcow disk format configured.
> But why not let the memory sharing code to do the content comparison? Is there any user who wants to share two pages with different content?

No, but since the userspace tool already knows what it's doing there's no need for Xen to compare the pages. It would just make it slower.

> In order to perform the page comparison, we need to compute checksum value for the page content (or even md5, sha1..), and use this checksum value to index into a hash list. It seem feasible to replace the handle number with the checksum value of page, and combine the nominate() and share() operations together in one function as the following steps:
> (1) Mark the page as p2m_ram_shared
> (2) Compute the checksum of a page
> (3) Search in the hash list, if some page with the same checksum value is found, a real byte-by-byte memory comparison is performed.
> (4) If content is totally matched, perform the share() operation by removing one duplicated page/mfn.
> (5) Record this checksum value in page_info->shr_handle (used to store the handle)

That's potentially a very expensive operation to do in a single hypercall. Also, how would you choose which pages to nominate? Just comparing every page with every other page is O(n^2), and if you have a better scheme (like Grzegorz's one of checking pages as they're loaded) then you should use that instead of searching for a match in all of memory.

Of course, you could do all the checksums and comparisons from userspace using the existing interfaces, without adding more mechanism in Xen.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
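The checksum-then-compare idea discussed in this message can be sketched as plain userspace C that works on an in-memory buffer of pages. Everything here is an assumption made for illustration: it calls no Xen or blktap2 interface, FNV-1a is swapped in as the checksum only because it is short (the thread names no specific function beyond "checksum, md5, sha1"), and page_checksum(), find_duplicates() and struct page_sum are hypothetical names. Sorting by checksum avoids comparing every page with every other page.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

struct page_sum {
    uint64_t sum;    /* checksum of the page content */
    size_t   index;  /* position of the page in the buffer */
};

/* FNV-1a over one page; an arbitrary stand-in checksum for this sketch. */
static uint64_t page_checksum(const unsigned char *page)
{
    uint64_t h = UINT64_C(14695981039346656037);
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        h ^= page[i];
        h *= UINT64_C(1099511628211);
    }
    return h;
}

/* Order by checksum, then by page index, so equal checksums are adjacent. */
static int cmp_page_sum(const void *a, const void *b)
{
    const struct page_sum *x = a, *y = b;
    if (x->sum != y->sum)
        return x->sum < y->sum ? -1 : 1;
    return (x->index > y->index) - (x->index < y->index);
}

/* pages: n pages stored back to back. scratch and dup_of: caller-provided
 * arrays of n entries. On return dup_of[i] is the lowest-index page with
 * identical content, or i itself if there is none. Returns the number of
 * duplicates. Checksums only select candidates; memcmp() confirms, because
 * different pages can share a checksum. */
static size_t find_duplicates(const unsigned char *pages, size_t n,
                              struct page_sum *scratch, size_t *dup_of)
{
    size_t found = 0, run_start = 0;

    for (size_t i = 0; i < n; i++) {
        scratch[i].sum = page_checksum(pages + i * PAGE_BYTES);
        scratch[i].index = i;
        dup_of[i] = i;
    }
    qsort(scratch, n, sizeof *scratch, cmp_page_sum);

    for (size_t i = 1; i < n; i++) {
        if (scratch[i].sum != scratch[run_start].sum) {
            run_start = i;          /* a new run of equal checksums begins */
            continue;
        }
        for (size_t j = run_start; j < i; j++) {
            size_t a = scratch[j].index, b = scratch[i].index;
            if (dup_of[a] == a &&
                memcmp(pages + a * PAGE_BYTES, pages + b * PAGE_BYTES,
                       PAGE_BYTES) == 0) {
                dup_of[b] = a;      /* b duplicates the earlier page a */
                found++;
                break;
            }
        }
    }
    return found;
}
```

A tool built this way would pass only the confirmed pairs on to whatever sharing interface it uses; how the pages are read and how the pairs are submitted is outside this sketch.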