I''ve been fighting random crashes in the paravirt tree for a while. After a fair amount of head-banging, it looks to me like the shadow2 code is trashing the guest stack (and maybe register state) at random points. If I boot a kernel with CONFIG_DEBUG_PAGEALLOC enabled (which dramatically increases the rate of pagetable modifications), it rarely makes it through early boot without some random crash. The crashes are often at the same place, but they move around; however they tend to be near places where the pagetable is touched. It may also interact with timer events; certainly masking events seems to help a bit. I tend to see this a lot more when running under qemu, but I''ve also seen strange things happen on real hardware. If I roll Xen back to pre-shadow2 (change fda70200da01), all these mysterious crashes disappear. Looking into it a bit more deeply, the kind of crash I''m seeing are along the lines of: mov (%ebx), %eax # works; %ebx is a valid pointer call xen_enable_irq mov %eax, (%ebx) # crashes; %ebx will equal 0, 1, or something bad where xen_enable_irq will have pushed %ebx, set the flag state, polled for pending events and popped %ebx. My suspicion is that something about re-enabling interrupts is causing the on-stack version of %ebx to get trashed, rather than the actual %ebx register state (in general the corrupted register is the one near or at the top of the stack). Sometimes the corruption shows up as %eip off in the weeds (either at NULL-ish addresses, or executing the stack). I''m speculating that the sequence is: 1. change pagetable; this creates a deferred pagetable update 2. enable events 3. handle pending timer interrupt, which also does a deferred pagetable update 4. resume running with corrupted stack But I don''t really know enough about how shadow2 works to know if that''s really plausible. Maybe a vcpu/guest context switch is a part of the sequence. I wonder if the stack corruption is caused by a mismatch of exception frame formats between exception->iret? All a bit handwavy, but I haven''t really managed to make much headway. I spent some time assuming it was a bug on my side, but the fact that all these symptoms go away with pre-shadow2 Xen makes me point the finger over the wall. Or perhaps it really is just a qemu bug, but I can''t imagine that shadow2 exercises qemu''s emulated CPU exception stuff in a way which normal Xen doesn''t... I think its more likely that there''s some race which is much more easily triggered by qemu''s slow speed. Any thoughts or ideas? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 14/10/06 12:27 am, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:> All a bit handwavy, but I haven''t really managed to make much headway. > I spent some time assuming it was a bug on my side, but the fact that > all these symptoms go away with pre-shadow2 Xen makes me point the > finger over the wall. > > Or perhaps it really is just a qemu bug, but I can''t imagine that > shadow2 exercises qemu''s emulated CPU exception stuff in a way which > normal Xen doesn''t... I think its more likely that there''s some race > which is much more easily triggered by qemu''s slow speed. > > Any thoughts or ideas?The shadow-refcount code in shadow2 is very new. It''s likely to have bugs that need shaking out. It won''t be a deferred pagetable update because they don''t exist in shadow2 (updates are always synchronously emulated). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2006-Oct-16 22:09 UTC
[Xen-devel] Re: shadow2 corrupting PV guest state
Keir Fraser wrote:> The shadow-refcount code in shadow2 is very new. It''s likely to have bugs > that need shaking out.Yep, that''s pretty much what I''d expect.> It won''t be a deferred pagetable update because they > don''t exist in shadow2 (updates are always synchronously emulated). >OK, so every write to a pagetable page will take the fault&emulate path? What''s your confidence level in the emulator? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 16/10/06 11:09 pm, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:>> It won''t be a deferred pagetable update because they >> don''t exist in shadow2 (updates are always synchronously emulated). >> > > OK, so every write to a pagetable page will take the fault&emulate > path? What''s your confidence level in the emulator?Pretty high. It''s used *a lot*. I''m >95% confident the bug is elsewhere. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2006-Oct-17 18:54 UTC
[Xen-devel] Re: shadow2 corrupting PV guest state
Keir Fraser wrote:> Pretty high. It''s used *a lot*. I''m >95% confident the bug is elsewhere. >I guess all that code is also used in the hvm case. What paths are unique to non-hvm shadow users? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Doi.Tsunehisa@jp.fujitsu.com
2006-Oct-20 13:42 UTC
Re: [Xen-devel] shadow2 corrupting PV guest state
Hi, You (jeremy) said:> I''ve been fighting random crashes in the paravirt tree for a while. > After a fair amount of head-banging, it looks to me like the shadow2 > code is trashing the guest stack (and maybe register state) at random > points.I have a question about shadow2 in another point of view. I''ve been porting PV-on-HVM driver for ia64 platform. In my jobs, I had a doubt that shadow2 might occur a problem of memory corruption. At first, I had found the problem as a hypervisor crash during destruction of HVM domain with active VNIF on ia64 platform. The reason of crash was that hypervisor detected P2M table used by gnttab_copy in the HVM domain destruction. Thus I looked for a way to avoid hypervisor crash in x86 code. So, I found that: * Before shadow2 age, x86 and ia64 use same logic for domain destruction. - at first, release gnttab references - destruct page table for VCPU - destruct P2M table for domain - relinquish memory for domain * After shadow2 age, x86 introduces delayed P2M table destruction. - release gnttab references - destruct page table for VCPU - relinquish memory for domain - destruct P2M table for domain in domain_destroy() *** I don''t have confidence in my investigation. *** Am I right ? I try to show the code that... [common/domain.c] 203 void domain_kill(struct domain *d) 204 { 205 domain_pause(d); 206 207 if ( test_and_set_bit(_DOMF_dying, &d->domain_flags) ) 208 return; 209 210 gnttab_release_mappings(d); 211 domain_relinquish_resources(d); 212 put_domain(d); 213 214 send_guest_global_virq(dom0, VIRQ_DOM_EXC); 215 } [arch/x86/domain.c] 930 void domain_relinquish_resources(struct domain *d) 931 { 932 struct vcpu *v; 933 unsigned long pfn; .... 937 /* Drop the in-use references to page-table bases. */ 938 for_each_vcpu ( d, v ) .... 979 /* Relinquish every page of memory. */ 980 relinquish_memory(d, &d->xenpage_list); 981 relinquish_memory(d, &d->page_list); .... This is the code for domain_kill phase. I think that hypervisor relinquishes memory for domain in this code. In the other hand... [common/domain.c] 322 /* Release resources belonging to task @p. */ 323 void domain_destroy(struct domain *d) 324 { 325 struct domain **pd; 326 atomic_t old, new; .... 354 arch_domain_destroy(d); 355 356 free_domain(d); 357 358 send_guest_global_virq(dom0, VIRQ_DOM_EXC); 359 } [arch/x86/domain.c] 237 void arch_domain_destroy(struct domain *d) 238 { 239 shadow_final_teardown(d); .... [arch/x86/mm/shadow/common.c] 2580 void shadow_final_teardown(struct domain *d) 2581 /* Called by arch_domain_destroy(), when it''s safe to pull down the p2m map. */ 2582 { .... 2597 /* It is now safe to pull down the p2m map. */ 2598 if ( d->arch.shadow.p2m_pages != 0 ) 2599 shadow_p2m_teardown(d); In this code, P2M table are released. If my speculation is correct, shadow2 may occur a problem of memory corruption. What do you think about this point ? Thanks, - Tsunehisa Doi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, At 22:42 +0900 on 20 Oct (1161384159), Doi.Tsunehisa@jp.fujitsu.com wrote:> So, I found that: > > * Before shadow2 age, x86 and ia64 use same logic for domain > destruction. > - at first, release gnttab references > - destruct page table for VCPU > - destruct P2M table for domain > - relinquish memory for domain > > * After shadow2 age, x86 introduces delayed P2M table destruction. > - release gnttab references > - destruct page table for VCPU > - relinquish memory for domain > - destruct P2M table for domain in domain_destroy() > *** I don''t have confidence in my investigation. > *** Am I right ?Yep. The P2M table can''t be destroyed in domain_relinquish_resources, as it is needed when pulling down grant references, and foreign domains may have outstanding grant references to the dying domain''s memory even after domain_relinquish_resources.> If my speculation is correct, shadow2 may occur a problem of memory > corruption.I don''t follow quite why this would lead to memory corruption. Can you explain? Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Doi.Tsunehisa@jp.fujitsu.com
2006-Oct-23 05:45 UTC
Re: [Xen-devel] shadow2 corrupting PV guest state
Hi, You (Tim.Deegan) said:>> * Before shadow2 age, x86 and ia64 use same logic for domain >> destruction. >> - at first, release gnttab references >> - destruct page table for VCPU >> - destruct P2M table for domain >> - relinquish memory for domain >> >> * After shadow2 age, x86 introduces delayed P2M table destruction. >> - release gnttab references >> - destruct page table for VCPU >> - relinquish memory for domain >> - destruct P2M table for domain in domain_destroy() >> *** I don''t have confidence in my investigation. >> *** Am I right ? > > Yep. The P2M table can''t be destroyed in domain_relinquish_resources, > as it is needed when pulling down grant references, and foreign domains > may have outstanding grant references to the dying domain''s memory even > after domain_relinquish_resources.Thanks. I''ve supposed that the introduce of P2M table delayed destruction was for gnttab_copy feature indeed. It seems that other grant table references are released with gnttab_release_mapping() in domain_kill().>> If my speculation is correct, shadow2 may occur a problem of memory >> corruption. > > I don''t follow quite why this would lead to memory corruption. Can you > explain?I''ll try to explain it. Basically, the referencee should not be released during to exist the referencer, I think. In domain_kill phase, domain_relinquish_resource releases a memory of destroying domain. So, the memory may use other domain. But, P2M table of the domain exists, then the memory might be corrupted by gnttab_copy. In __gnttab_copy code, it will avoid to corrupt a memory that was used in destroying domain with __acquire_grant_for_copy and get_page. But, I think that it has atomicity issue of owner. In my opinion, P2M table should be destroyed before releasing memory. So, if gnttab_copy will use it, it shall be failed as invalid access, because the gnttab_copy doesn''t have to succeed during the domain destruction. What do you think about ? Thanks, - Tsunehisa Doi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
At 14:45 +0900 on 23 Oct (1161614732), Doi.Tsunehisa@jp.fujitsu.com wrote:> Basically, the referencee should not be released during to exist the > referencer, I think. > > In domain_kill phase, domain_relinquish_resource releases a memory > of destroying domain. So, the memory may use other domain. But, P2M > table of the domain exists, then the memory might be corrupted by > gnttab_copy. > > In __gnttab_copy code, it will avoid to corrupt a memory that was > used in destroying domain with __acquire_grant_for_copy and get_page. > But, I think that it has atomicity issue of owner.Are you worried about a race where the foreign domain is destroyed and another domain created, with the same struct domain pointer, and which owns the same frame, between the __acquire_grant_for_copy() and the get_page()? Earlier in __gnttab_copy, we call find_domain_by_id() on the foreign domain, which calls get_domain(), so we''re safe from that. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Doi.Tsunehisa@jp.fujitsu.com
2006-Oct-23 11:21 UTC
Re: [Xen-devel] shadow2 corrupting PV guest state
Hi, You (Tim.Deegan) said:>> Basically, the referencee should not be released during to exist the >> referencer, I think. >> >> In domain_kill phase, domain_relinquish_resource releases a memory >> of destroying domain. So, the memory may use other domain. But, P2M >> table of the domain exists, then the memory might be corrupted by >> gnttab_copy. >> >> In __gnttab_copy code, it will avoid to corrupt a memory that was >> used in destroying domain with __acquire_grant_for_copy and get_page. >> But, I think that it has atomicity issue of owner. > > Are you worried about a race where the foreign domain is destroyed and > another domain created, with the same struct domain pointer, and which > owns the same frame, between the __acquire_grant_for_copy() and the > get_page()?No, I''m worried that two domains use with same page frame. The released pages can be used by new domain, but old domain sturct exists between domain_kill and domain_destroy.> Earlier in __gnttab_copy, we call find_domain_by_id() on the foreign > domain, which calls get_domain(), so we''re safe from that.I suppose that find_domain_by_id doesn''t ensure not to be used by both domains. Thanks, - Tsunehisa Doi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
At 20:21 +0900 on 23 Oct (1161634872), Doi.Tsunehisa@jp.fujitsu.com wrote:> > Are you worried about a race where the foreign domain is destroyed and > > another domain created, with the same struct domain pointer, and which > > owns the same frame, between the __acquire_grant_for_copy() and the > > get_page()? > > No, I''m worried that two domains use with same page frame. > > The released pages can be used by new domain, but old domain sturct > exists between domain_kill and domain_destroy.If the released frames are used by a new domain, get_page() will fail: the old domain still exists (we have a reference to it), so the new owner''s domain pointer must be different from the one we pass to get_page. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Doi.Tsunehisa@jp.fujitsu.com
2006-Oct-24 07:18 UTC
Re: [Xen-devel] shadow2 corrupting PV guest state
Hi Tim, You (Tim.Deegan) said:>>> Are you worried about a race where the foreign domain is destroyed and >>> another domain created, with the same struct domain pointer, and which >>> owns the same frame, between the __acquire_grant_for_copy() and the >>> get_page()? >> >> No, I''m worried that two domains use with same page frame. >> >> The released pages can be used by new domain, but old domain sturct >> exists between domain_kill and domain_destroy. > > If the released frames are used by a new domain, get_page() will fail: > the old domain still exists (we have a reference to it), so the new > owner''s domain pointer must be different from the one we pass to > get_page.In my investigation, get_page() assumes that the page frame is in use. But, the page_info structure of released page frame should not be treated as inuse. Thus the nd value might be invalid value in this situation, I think. [xen/include/asm-x86/mm.h] static inline int get_page(struct page_info *page, struct domain *domain) { u32 x, nx, y = page->count_info; u32 d, nd = page->u.inuse._domain; /* <<=== THIS LINE */ u32 _domain = pickle_domptr(domain); do { x = y; nx = x + 1; d = nd; if ( unlikely((x & PGC_count_mask) == 0) || /* Not allocated? */ unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */ unlikely(d != _domain) ) /* Wrong owner? */ { if ( !_shadow_mode_refcounts(domain) ) DPRINTK("Error pfn %lx: rd=%p, od=%p, caf=%08x, taf=%" PRtype_info "\n", page_to_mfn(page), domain, unpickle_domptr(d), x, page->u.inuse.type_info); return 0; } __asm__ __volatile__( LOCK_PREFIX "cmpxchg8b %3" : "=d" (nd), "=a" (y), "=c" (d), "=m" (*(volatile u64 *)(&page->count_info)) : "0" (d), "1" (x), "c" (d), "b" (nx) ); /* <<=== THIS LINE */ } while ( unlikely(nd != d) || unlikely(y != x) ); return 1; } Thanks, - Tsunehisa Doi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, At 16:18 +0900 on 24 Oct (1161706722), Doi.Tsunehisa@jp.fujitsu.com wrote:> In my investigation, get_page() assumes that the page frame is in > use. But, the page_info structure of released page frame should not > be treated as inuse. Thus the nd value might be invalid value in this > situation, I think.True, but we don''t look at nd unless the page is allocated...> if ( unlikely((x & PGC_count_mask) == 0) || /* Not allocated? */ > unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */ > unlikely(d != _domain) ) /* Wrong owner? */Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Doi.Tsunehisa@jp.fujitsu.com
2006-Oct-24 09:39 UTC
Re: [Xen-devel] shadow2 corrupting PV guest state
Hi, You (Tim.Deegan) said:>> In my investigation, get_page() assumes that the page frame is in >> use. But, the page_info structure of released page frame should not >> be treated as inuse. Thus the nd value might be invalid value in this >> situation, I think. > > True, but we don''t look at nd unless the page is allocated...Between domain_relinquish_resources() in domain_kill() and shadow_final_teadown() in domain_destroy(), nd might be looked with gnttab_copy(), I think. Is it true ? Thanks, - Tsunehisa Doi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 24/10/06 10:39, "Doi.Tsunehisa@jp.fujitsu.com" <Doi.Tsunehisa@jp.fujitsu.com> wrote:>> True, but we don''t look at nd unless the page is allocated... > > Between domain_relinquish_resources() in domain_kill() and > shadow_final_teadown() in domain_destroy(), nd might be looked > with gnttab_copy(), I think.Domain_destroy() is only called when the domain refcnt reaches zero. This can only happen when all its page refcnts have reached zero. When a page''s refcnt reaches zero, get_page() no longer succeeds on it. So there is no race between gnttab_copy() and domain_destroy(). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Doi.Tsunehisa@jp.fujitsu.com
2006-Oct-24 10:05 UTC
Re: [Xen-devel] shadow2 corrupting PV guest state
You (Keir.Fraser) said:>>> True, but we don''t look at nd unless the page is allocated... >> >> Between domain_relinquish_resources() in domain_kill() and >> shadow_final_teadown() in domain_destroy(), nd might be looked >> with gnttab_copy(), I think. > > Domain_destroy() is only called when the domain refcnt reaches zero. This > can only happen when all its page refcnts have reached zero. When a page''s > refcnt reaches zero, get_page() no longer succeeds on it. So there is no > race between gnttab_copy() and domain_destroy().I see. I want to confirm that... In free_domheap_pages(), if the page counts of each section are zero, then domain refcount is decreased. Finaly the domain refcount is zero, and domain_destroy() is called. In the other hand, get_page checks page count like below... ..... if ( unlikely((x & PGC_count_mask) == 0) || /* Not allocated? */ unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */ unlikely(d != _domain) ) /* Wrong owner? */ ..... Thus, get_page can''t succeeds on it. Is my understaning is right ? Thanks, - Tsunehisa Doi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 24/10/06 11:05, "Doi.Tsunehisa@jp.fujitsu.com" <Doi.Tsunehisa@jp.fujitsu.com> wrote:> Thus, get_page can''t succeeds on it. > > Is my understaning is right ?Indeed it is. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Doi.Tsunehisa@jp.fujitsu.com
2006-Oct-24 10:16 UTC
Re: [Xen-devel] shadow2 corrupting PV guest state
You (Keir.Fraser) said:>> Thus, get_page can''t succeeds on it. >> >> Is my understaning is right ? > > Indeed it is.Thank. - Tsunehisa Doi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel