Jiang, Yunhong
2005-Sep-19 05:54 UTC
RE: [Xen-devel] [PATCH][VT] Patch to allow VMX domainsto be destroyedor shut down cleanly
Jiang, Yunhong wrote:> I think the Xin Xiaohui''s patch resolved most problem for the page > left issue. > > Also another small issue found is the on vmx_set_cr0. On Windows, the > guest will enter/leave protected mode many times. Following code > cause problem. > if ((value & X86_CR0_PE) && (value & X86_CR0_PG) && > !paging_enabled) { > /* > * The guest CR3 must be pointing to the guest physical. > */ > if ( !VALID_MFN(mfn = get_mfn_from_pfn( > d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) || > !get_page(pfn_to_page(mfn), d->domain) ) > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > { > printk("Invalid CR3 value = %lx", > d->arch.arch_vmx.cpu_cr3); domain_crash_synchronous(); /* > need to take a clean path */ } > > We should place the get_page to when guest set cr3 when not paging, > i.e >> > case 3: > { > unsigned long old_base_mfn, mfn; > > /* > * If paging is not enabled yet, simply copy the value to CR3. > */ > if (!vmx_paging_enabled(d)) { > ..... get page here.... >More exactly should be: if (!vmx_paging_enabled(d)) { get_page for new mfn put_page for old mfn } Thanks Yunhong Jiang> After above two change, windows destroied succesfully after > create/login/open IE/destroy. > > Thanks > Yunhong Jiang > > Khoa Huynh wrote: >> Keir Fraser wrote: >>> I mean forcibly decrement them to zero and free them right there and >>> then. Of course, as you point out, the problem is that some of the >>> pages are mapped in domain0. I''m not sure how we can distinguish >>> tainted refcnts from genuine external references. Perhaps there''s a >>> proper way we should be destructing the full shadow pagetables such >>> that the refcnts end up at zero. >> >> Thanks for your comment. I have done extensive tracing through >> the domain destruction code in the hypervisor in the last few days. >> >> The bottom line: after domain destruction code in the hypervisor >> is done, all shadow pages were indeed freed up - even though >> the shadow_tainted_refcnts flag was set. I now believe the >> remaining pages are genuinely externally referenced (possibly >> by the qemu device model still running in domain0). >> >> Here are more details on what I have found: >> >> Ideally, when we destroy or shut down a VMX domain, the general >> page reference counts ended up at 0 in shadow mode, so that the >> pages can be released properly from the domain. >> >> I have traced quite a bit of code for different scenarios >> involving Windows XP running in a VMX domain. I only >> did simple operations in Windows XP, but I tried to destroy >> the VMX domain at different times (e.g. during Windows XP boot, >> during simple operations, after Windows XP has been shutdown, etc.) >> >> For non-VMX (Linux) domains, after we relinquish memory in >> domain_relinquish_resources(), all pages in the domain''s page >> list indeed had reference count of 0 and were properly freed from >> the xen heap - just like we expected. >> >> For VMX (e.g., Windows XP) domains, after we relinquish memory in >> domain_relinquish_resources(), depending on how many activities >> were done in Windows XP, there were anywhere from 2 to 100 pages >> remaining just before the domain''s structures were freed up >> by the hypervisor. Most of these pages still have page >> reference counts of 1, and therefore, could not be freed >> from the heap by the hypervisor. This prevents the rest >> of the domain''s resources from being released, and therefore, >> ''xm list'' still shows the VMX domains after they were destroyed. >> >> In shadow mode, the following things could be reflected >> in the page (general) reference counts: >> >> (a) General stuff: >> - page is allocated (PGC_allocated) >> - page is pinned >> - page is pointed by CR3''s >> (b) Shadow page tables (l1, l2, hl2, etc.) >> (c) Out-of-sync entries >> (d) Grant table mappings >> (e) External references (not through grant table) >> (f) Monitor page table references (external shadow mode) (g) >> Writable PTE predictions (h) GDTs/LDTs >> >> So I put in a lot of instrumentation and tracing code, >> and made sure that the above things were taken into >> account and removed from the page reference counts >> during the domain destruction code sequence in the >> hypervisor. During this code sequence, we disable >> shadow mode (shadow_mode_disable()) and the >> shadow_tainted_refcnts flag was set. However, >> much to my surprise, the page reference counts >> were properly taken care of in shadow mode, and >> all shadow pages (including those in l1, l2, hl2 >> tables and snapshots) were all freed up. >> >> In particular, here''s where each of the things >> in the above list was taken into account during >> the domain destruction code sequence in the >> hypervisor: >> >> (a) General stuff: >> - None of remaining pages have PGC_allocated >> flag set >> - None of remaining pages are still pinned >> - The monitor shadow ref was 0, and all >> pages pointed to by CR3''s were taken care >> of in free_shadow_pages() >> (b) All shadow pages (including those pages in >> l1, l2, hl2, snapshots) were freed properly. >> I implemented counters to track all shadow >> page promotions/allocations and demotions/ >> deallocations throughout the hypervisor code, >> and at the end after we relinquished all domain >> memory pages, these counters did indeed >> return to 0 - as we expected. >> (c) out-of-sync entries -> in free_out_of_sync_state() >> called by free_shadow_pages(). >> (d) grant table mappings -> the count of active >> grant table mappings is 0 after the domain >> destruction sequence in the hypervisor is >> executed. >> (e) external references not mapped via grant table >> -> I believe that these include the qemu-dm >> pages which still remain after we relinquish >> all domain memory pages - as the qemu-dm may >> still be active after a VMX domain has been >> destroyed. >> (f) external monitor page references -> all references >> from monitor page table are dropped in >> vmx_relinquish_resources(), and monitor table >> itself is freed in domain_destruct(). In fact, >> in my code traces, the monitor shadow reference >> count was 0 after the domain destruction code in the >> hypervisor. (g) writable PTE predictions -> I didn''t see any pages in >> this category in my code traces, but if there >> are, they would be freed up in free_shadow_pages(). >> (h) GDTs/LDTs -> these were destroyed in destroy_gdt() and >> invalidate_shadow_ldt() called from domain_relinquish_ >> resources(). >> >> Based on the code instrumentation and tracing above, I am >> pretty confident that the shadow page reference counts >> were handled properly during the domain destruction code >> sequence in the hypervisor. There is a problem in keeping >> track of shadow page counts (domain->arch.shadow_page_count), >> and I will submit a patch to fix this shortly. However, this >> does not really impact how shadow pages are handled. >> >> Consequently, the pages that still remain after the domain >> destruction code sequence in the hypervisor are externally >> referenced and may belong to the qemu device model running >> in domain0. The fact that qemu-dm is still active for some >> time after a VMX domain has been torn down in the hypervisor >> is evident by examining the tools code (python). In fact, >> if I forcibly free these remaining pages from the xen heap, >> the system/dom0 crashed. >> >> Am I missing anything ? Your comments, suggestions, etc., >> are welcome! Thanks for reading this rather long email :-) >> >> Khoa H. >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Khoa Huynh
2005-Sep-19 07:57 UTC
RE: [Xen-devel] [PATCH][VT] Patch to allow VMX domainsto be destroyedor shut down cleanly
Jiang, Yunhong wrote:>> I think the Xin Xiaohui''s patch resolved most problem for the page >> left issue. >> >> Also another small issue found is the on vmx_set_cr0. On Windows, the >> guest will enter/leave protected mode many times. Following code >> cause problem. >> if ((value & X86_CR0_PE) && (value & X86_CR0_PG) && >> !paging_enabled) { >> /* >> * The guest CR3 must be pointing to the guest physical. >> */ >> if ( !VALID_MFN(mfn = get_mfn_from_pfn( >> d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) || >> !get_page(pfn_to_page(mfn), d->domain) ) >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >> { >> printk("Invalid CR3 value = %lx", >> d->arch.arch_vmx.cpu_cr3); domain_crash_synchronous(); /* >> need to take a clean path */ } >> >> We should place the get_page to when guest set cr3 when not paging, >> i.e >> >> case 3: >> { >> unsigned long old_base_mfn, mfn; >> >> /* >> * If paging is not enabled yet, simply copy the value to CR3. >> */ >> if (!vmx_paging_enabled(d)) { >> ..... get page here.... >> > >More exactly should be: > if (!vmx_paging_enabled(d)) { > get_page for new mfn > put_page for old mfn > }Xin''s patch works for some of my Windows XP scenarios, but there are still cases where there were still pages remaining. I tried to implement a second patch based on your suggestion above, but that did not work. Do you happen to have a complete patch to fix this ? Thanks. Khoa H. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2005-Sep-19 09:17 UTC
Re: [Xen-devel] [PATCH][VT] Patch to allow VMX domainsto be destroyedor shut down cleanly
On 19 Sep 2005, at 06:54, Jiang, Yunhong wrote:> More exactly should be: > if (!vmx_paging_enabled(d)) { > get_page for new mfn > put_page for old mfn > }What would old_mfn be in this case? If we do any cr3 ref-counting on cr0 writes I would expect it to be: * put_page() on clearing of CR0_PG * get_page() on setting of CR0_PG Given that writing cr3 currently does a get_page/put_page regardless of the setting of CR0_PG, is there any need to be doing get_page() within vmx_set_cr0()? I think it''s more correct to do get/put in vmx_set_cr0 (at that point we can know what format the pagetables should be, for example (32/32p/64). But I think it''s a bigger cleanup patch. Whatever you go for: please send a complete tested patch. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel