Jiang, Yunhong
2005-Sep-19 05:39 UTC
RE: [Xen-devel] [PATCH][VT] Patch to allow VMX domainsto be destroyed or shut down cleanly
I think the Xin Xiaohui''s patch resolved most problem for the page left issue. Also another small issue found is the on vmx_set_cr0. On Windows, the guest will enter/leave protected mode many times. Following code cause problem. if ((value & X86_CR0_PE) && (value & X86_CR0_PG) && !paging_enabled) { /* * The guest CR3 must be pointing to the guest physical. */ if ( !VALID_MFN(mfn = get_mfn_from_pfn( d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) || !get_page(pfn_to_page(mfn), d->domain) ) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ { printk("Invalid CR3 value = %lx", d->arch.arch_vmx.cpu_cr3); domain_crash_synchronous(); /* need to take a clean path */ } We should place the get_page to when guest set cr3 when not paging, i.e : case 3: { unsigned long old_base_mfn, mfn; /* * If paging is not enabled yet, simply copy the value to CR3. */ if (!vmx_paging_enabled(d)) { ..... get page here.... After above two change, windows destroied succesfully after create/login/open IE/destroy. Thanks Yunhong Jiang Khoa Huynh wrote:> Keir Fraser wrote: >> I mean forcibly decrement them to zero and free them right there and >> then. Of course, as you point out, the problem is that some of the >> pages are mapped in domain0. I''m not sure how we can distinguish >> tainted refcnts from genuine external references. Perhaps there''s a >> proper way we should be destructing the full shadow pagetables such >> that the refcnts end up at zero. > > Thanks for your comment. I have done extensive tracing through > the domain destruction code in the hypervisor in the last few > days. > > The bottom line: after domain destruction code in the hypervisor > is done, all shadow pages were indeed freed up - even though > the shadow_tainted_refcnts flag was set. I now believe the > remaining pages are genuinely externally referenced (possibly > by the qemu device model still running in domain0). > > Here are more details on what I have found: > > Ideally, when we destroy or shut down a VMX domain, the general > page reference counts ended up at 0 in shadow mode, so that the > pages can be released properly from the domain. > > I have traced quite a bit of code for different scenarios > involving Windows XP running in a VMX domain. I only > did simple operations in Windows XP, but I tried to destroy > the VMX domain at different times (e.g. during Windows XP boot, > during simple operations, after Windows XP has been shutdown, etc.) > > For non-VMX (Linux) domains, after we relinquish memory in > domain_relinquish_resources(), all pages in the domain''s page > list indeed had reference count of 0 and were properly freed from > the xen heap - just like we expected. > > For VMX (e.g., Windows XP) domains, after we relinquish memory in > domain_relinquish_resources(), depending on how many activities > were done in Windows XP, there were anywhere from 2 to 100 pages > remaining just before the domain''s structures were freed up > by the hypervisor. Most of these pages still have page > reference counts of 1, and therefore, could not be freed > from the heap by the hypervisor. This prevents the rest > of the domain''s resources from being released, and therefore, > ''xm list'' still shows the VMX domains after they were destroyed. > > In shadow mode, the following things could be reflected > in the page (general) reference counts: > > (a) General stuff: > - page is allocated (PGC_allocated) > - page is pinned > - page is pointed by CR3''s > (b) Shadow page tables (l1, l2, hl2, etc.) > (c) Out-of-sync entries > (d) Grant table mappings > (e) External references (not through grant table) > (f) Monitor page table references (external shadow mode) > (g) Writable PTE predictions > (h) GDTs/LDTs > > So I put in a lot of instrumentation and tracing code, > and made sure that the above things were taken into > account and removed from the page reference counts > during the domain destruction code sequence in the > hypervisor. During this code sequence, we disable > shadow mode (shadow_mode_disable()) and the > shadow_tainted_refcnts flag was set. However, > much to my surprise, the page reference counts > were properly taken care of in shadow mode, and > all shadow pages (including those in l1, l2, hl2 > tables and snapshots) were all freed up. > > In particular, here''s where each of the things > in the above list was taken into account during > the domain destruction code sequence in the > hypervisor: > > (a) General stuff: > - None of remaining pages have PGC_allocated > flag set > - None of remaining pages are still pinned > - The monitor shadow ref was 0, and all > pages pointed to by CR3''s were taken care > of in free_shadow_pages() > (b) All shadow pages (including those pages in > l1, l2, hl2, snapshots) were freed properly. > I implemented counters to track all shadow > page promotions/allocations and demotions/ > deallocations throughout the hypervisor code, > and at the end after we relinquished all domain > memory pages, these counters did indeed > return to 0 - as we expected. > (c) out-of-sync entries -> in free_out_of_sync_state() > called by free_shadow_pages(). > (d) grant table mappings -> the count of active > grant table mappings is 0 after the domain > destruction sequence in the hypervisor is > executed. > (e) external references not mapped via grant table > -> I believe that these include the qemu-dm > pages which still remain after we relinquish > all domain memory pages - as the qemu-dm may > still be active after a VMX domain has been > destroyed. > (f) external monitor page references -> all references > from monitor page table are dropped in > vmx_relinquish_resources(), and monitor table > itself is freed in domain_destruct(). In fact, > in my code traces, the monitor shadow reference > count was 0 after the domain destruction code in > the hypervisor. > (g) writable PTE predictions -> I didn''t see any pages in > this category in my code traces, but if there > are, they would be freed up in free_shadow_pages(). > (h) GDTs/LDTs -> these were destroyed in destroy_gdt() and > invalidate_shadow_ldt() called from domain_relinquish_ > resources(). > > Based on the code instrumentation and tracing above, I am > pretty confident that the shadow page reference counts > were handled properly during the domain destruction code > sequence in the hypervisor. There is a problem in keeping > track of shadow page counts (domain->arch.shadow_page_count), > and I will submit a patch to fix this shortly. However, this > does not really impact how shadow pages are handled. > > Consequently, the pages that still remain after the domain > destruction code sequence in the hypervisor are externally > referenced and may belong to the qemu device model running > in domain0. The fact that qemu-dm is still active for some > time after a VMX domain has been torn down in the hypervisor > is evident by examining the tools code (python). In fact, > if I forcibly free these remaining pages from the xen heap, > the system/dom0 crashed. > > Am I missing anything ? Your comments, suggestions, etc., > are welcome! Thanks for reading this rather long email :-) > > Khoa H. > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel