Xin, Xiaohui
2006-Nov-04 13:48 UTC
[Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
Some background: the 32-bit HVM SMP Windows guest with the PV drivers currently hangs randomly. Sometimes the problem occurs during driver loading, and sometimes when the guest is destroyed; at last, Xen0 hangs as well. While debugging this issue, with the great help of Kevin Tian, we finally found two deadlock scenarios on HVM SMP guests. The description follows. Suppose we have two vcpus.

1) One vcpu is holding the BIGLOCK and wants to take the shadow_lock. At the same time, the other vcpu is holding the shadow_lock and wants to walk the P2M table. The faulting pfn is near the 4G boundary, for example 0xfee00, and of course the va for that P2M table entry has never been mapped. So when this vcpu tries to walk the P2M table, a page fault occurs in the Xen address area. The current do_page_fault() calls spurious_page_fault() to test whether it really is a spurious fault or not, but spurious_page_fault() first tries to take the BIGLOCK. Hence the deadlock.

2) When the guest is destroyed, Xen calls domain_shutdown_finalise(). That function first takes the BIGLOCK and then calls vcpu_sleep_sync(), which waits on the other vcpu's state. But the other vcpu is now in spurious_page_fault(), which is trying to take the BIGLOCK. So another deadlock.

Is there anything wrong with this description? If we're right, does spurious_page_fault() really need to hold the BIGLOCK?

We have an ugly workaround that decreases the frequency of spurious_page_fault(): we map the whole 4G P2M table area and fill it with INVALID_MFN at P2M table allocation time. With this workaround, the 32-bit HVM SMP Windows guest with PV drivers runs much more smoothly and can be destroyed successfully. But we have no elegant solution yet. :-(

Does anyone have some good suggestions? Any comments are welcome.
Thanks,
Xiaohui

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Nov-04 18:57 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On 4/11/06 1:48 pm, "Xin, Xiaohui" <xiaohui.xin@intel.com> wrote:

> Is there anything wrong with the description? If we're right, then does
> the spurious_page_fault() need to hold the BIGLOCK? We have an ugly
> workaround to decrease the frequency of spurious_page_fault(): we map
> all the 4G P2M table area and fill it with INVALID_MFN at P2M table
> allocation time. With the workaround, the 32-bit HVM SMP Windows guest
> with PV drivers now runs more smoothly, and can be destroyed
> successfully. But we have no elegant solution now. :-(
>
> Does anyone have some good suggestions? Any comments are welcome.

The deadlocks were real. I've fixed them in xen-unstable changesets 12240 and 12241. Thanks!

The PV drivers should not have been hitting the MMIO region with any regularity, however. Is it the LAPIC and IOAPIC that are getting hit? It certainly makes sense to cover hot regions of the P2M table with valid mappings; we should not expect the fault-and-fixup path to be fast. A single extra pagetable with INVALID_MFN just below 4GB would, I'm sure, speed things up quite a bit!

-- Keir
Pasi Kärkkäinen
2006-Nov-05 12:06 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On Sat, Nov 04, 2006 at 09:48:12PM +0800, Xin, Xiaohui wrote:

> Some background:
>
> Now the 32bit HVM SMP Windows guest with the PV drivers will hang
> randomly. Sometimes the problem occurs during drivers loading, and
> sometimes the problem occurs when the guest is destroyed. And at last,
> Xen0 will hang also. We are debugging this issue.

Are these PV drivers for Windows available somewhere? Are they open source?

-- Pasi
Tian, Kevin
2006-Nov-06 01:54 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
Hi Keir,

Thanks for the fixes, and you're right that the PV drivers themselves don't hit MMIO; they do, however, enlarge the possibility of triggering deadlock case 1. The PV drivers invoke grant ops frequently, which take the big lock at the start and may then request the shadow lock later. This should disappear now that you have removed the lock acquisition in the page fault handler.

Using a single entry to speed up LAPIC/IOAPIC access is a good suggestion; we will give it a try and benchmark it. :-)

Thanks,
Kevin

________________________________________
From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
Sent: 5 November 2006 2:58
To: Xin, Xiaohui; xen-devel@lists.xensource.com
Cc: Tian, Kevin; Li, Xin B; He, Qing; Mallick, Asit K; Li, Susie; Nakajima, Jun
Subject: Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows

[quoted reply trimmed]
Li, Xin B
2006-Nov-14 17:33 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
> The PV drivers should not have been hitting the MMIO region with any
> regularity however. Is it the LAPIC and IOAPIC that are getting hit? It
> certainly makes sense to cover hot regions of the P2M table with valid
> mappings - we should not expect the fault-and-fixup path to be fast. A
> single extra pagetable with INVALID_MFN just below 4GB would I'm sure
> speed things up quite a bit!

Keir,

On x86_64 Xen, we see get_mfn_from_gpfn get into the fault-and-fixup path frequently when running 64-bit Windows guests with 1G RAM, and quite a few of those faults are caused by gpfn > 0x100000, i.e. above 4G. So how about also adding a gpfn range check to get_mfn_from_gpfn? We could use hvm_set_param to set the maximum gpfn in xc_hvm_build.c.

-Xin
Keir Fraser
2006-Nov-14 17:37 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On 14/11/06 17:33, "Li, Xin B" <xin.b.li@intel.com> wrote:

> On x86_64 xen, we saw get_mfn_from_gpfn gets into the fault-and-fixup
> path frequently when running 64bit windows guests with 1G RAM, and quite
> a few of them are caused by gpfn > 0x100000, i.e. above 4G, so how about
> also adding a gpfn range check into get_mfn_from_gpfn? And we can use
> hvm_set_param to set the max gpfn # in xc_hvm_build.c.

Any idea what it's trying to access? Presumably nothing is mapped up there, so it just gets all-ones back from reads? I'm surprised it would be doing lots of accesses to totally unused memory space. That tends to be fairly slow even on native hardware.

-- Keir
Li, Xin B
2006-Nov-14 17:42 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
> On 14/11/06 17:33, "Li, Xin B" <xin.b.li@intel.com> wrote:
>
>> On x86_64 xen, we saw get_mfn_from_gpfn gets into the fault-and-fixup
>> path frequently when running 64bit windows guests with 1G RAM, and
>> quite a few of them are caused by gpfn > 0x100000, i.e. above 4G, so
>> how about also adding a gpfn range check into get_mfn_from_gpfn? And we
>> can use hvm_set_param to set the max gpfn # in xc_hvm_build.c.
>
> Any idea what it's trying to access? Presumably nothing is mapped up
> there so it just gets all-ones back from reads? I'm surprised it would
> be doing lots of accesses to totally unused memory space. That tends to
> be fairly slow even on native hardware.

Those accesses come from detecting whether a guest page table page is no longer a page table page, as in validate_gl4e.

-Xin
Keir Fraser
2006-Nov-14 17:56 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On 14/11/06 17:42, "Li, Xin B" <xin.b.li@intel.com> wrote:

>> Any idea what it's trying to access? Presumably nothing is mapped up
>> there so it just gets all-ones back from reads? I'm surprised it would
>> be doing lots of accesses to totally unused memory space. That tends to
>> be fairly slow even on native hardware.
>
> Those accesses come from detecting whether a guest page table page is no
> longer a page table page, as in validate_gl4e.

I'll leave it to Tim to decide what the best thing to do here is. But I'm sure we don't need a max_gpfn parameter. Xen could maintain its own highwater mark, updated by the alloc_p2m path, if it needs it.

-- Keir
Li, Xin B
2006-Nov-14 18:02 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
> I'll leave it to Tim to decide what the best thing to do here is. But
> I'm sure we don't need a max_gpfn parameter. Xen could maintain its own
> highwater mark, updated by the alloc_p2m path, if it needs it.

Yeah, that certainly works for me too. I was thinking that the max gpfn may change during the guest's lifecycle, and we would need to maintain it.

-Xin