Xin, Xiaohui
2006-Nov-04 13:48 UTC
[Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
Some background: the 32-bit HVM SMP Windows guest with the PV drivers currently hangs randomly. Sometimes the problem occurs during driver loading, and sometimes when the guest is destroyed; at last, Xen0 hangs as well. While debugging this issue, with the great help of Kevin Tian, we finally found two deadlock scenarios on HVM SMP guests. The description follows. Suppose we have two vcpus.

1) One vcpu is holding the BIGLOCK and wants to take the shadow_lock. At the same time, the other vcpu is holding the shadow_lock and wants to walk the P2M table. The faulting pfn is near the 4G boundary, for example 0xfee00, and of course the va for that P2M table entry has never been mapped. So when this vcpu tries to walk the P2M table, a page fault occurs in the Xen address area. The current do_page_fault() calls spurious_page_fault() to test whether it really is a spurious fault or not, but spurious_page_fault() first tries to take the BIGLOCK. Hence the deadlock.

2) When the guest is destroyed, Xen calls domain_shutdown_finalise(). That function first takes the BIGLOCK and then calls vcpu_sleep_sync(), which waits on the other vcpu's state. But the other vcpu is now in spurious_page_fault(), which is trying to take the BIGLOCK. So another deadlock.

Is there anything wrong with this description? If we're right, does spurious_page_fault() really need to hold the BIGLOCK?

We have an ugly workaround that decreases the frequency of spurious_page_fault(): we map the whole 4G P2M table area and fill it with INVALID_MFN at P2M table allocation time. With this workaround, the 32-bit HVM SMP Windows guest with PV drivers runs much more smoothly and can be destroyed successfully. But we have no elegant solution yet. :-(

Does anyone have some good suggestions? Any comments are welcome.
Thanks,
Xiaohui

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Nov-04 18:57 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On 4/11/06 1:48 pm, "Xin, Xiaohui" <xiaohui.xin@intel.com> wrote:

> Is there anything wrong with the description? If we're right, then does
> the spurious_page_fault() need to hold the BIGLOCK? We have an ugly
> workaround to decrease the frequency of spurious_page_fault(): we map
> all the 4G P2M table area and fill it with INVALID_MFN at P2M table
> allocation time. With the workaround, the 32-bit HVM SMP Windows guest
> with PV drivers now runs more smoothly, and can be destroyed
> successfully. But we have no elegant solution now. :-(
>
> Does anyone have some good suggestions? Any comments are welcome.

The deadlocks were real. I've fixed them in xen-unstable changesets 12240 and 12241. Thanks!

The PV drivers should not have been hitting the MMIO region with any regularity, however. Is it the LAPIC and IOAPIC that are getting hit? It certainly makes sense to cover hot regions of the P2M table with valid mappings; we should not expect the fault-and-fixup path to be fast. A single extra pagetable with INVALID_MFN just below 4GB would, I'm sure, speed things up quite a bit!

-- Keir
Pasi Kärkkäinen
2006-Nov-05 12:06 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On Sat, Nov 04, 2006 at 09:48:12PM +0800, Xin, Xiaohui wrote:

> Some background:
>
> Now the 32bit HVM SMP Windows guest with the PV drivers will hang
> randomly. Sometimes the problem occurs during drivers loading, and
> sometimes the problem occurs when the guest is destroyed. And at last,
> Xen0 will hang also. We are debugging this issue.

Are these PV drivers for Windows available somewhere? Are they open source?

-- Pasi
Tian, Kevin
2006-Nov-06 01:54 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
Hi Keir,

Thanks for the fixes, and you're right that the PV drivers themselves don't hit MMIO; they do, however, enlarge the possibility of triggering deadlock case 1. The PV drivers invoke grant ops frequently, which take the big lock at the start and may then request the shadow lock later. This should disappear now that you have removed the lock acquisition in the page fault handler.

Using a single entry to speed up LAPIC/IOAPIC access is a good suggestion; we will give it a try and benchmark it. :-)

Thanks,
Kevin

________________________________________
From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
Sent: 5 November 2006 2:58
To: Xin, Xiaohui; xen-devel@lists.xensource.com
Cc: Tian, Kevin; Li, Xin B; He, Qing; Mallick, Asit K; Li, Susie; Nakajima, Jun
Subject: Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows

[quoted reply trimmed]
Li, Xin B
2006-Nov-14 17:33 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
> The PV drivers should not have been hitting the MMIO region with any
> regularity however. Is it the LAPIC and IOAPIC that are getting hit? It
> certainly makes sense to cover hot regions of the P2M table with valid
> mappings - we should not expect the fault-and-fixup path to be fast. A
> single extra pagetable with INVALID_MFN just below 4GB would I'm sure
> speed things up quite a bit!

Keir,

On x86_64 Xen, we see get_mfn_from_gpfn get into the fault-and-fixup path frequently when running 64-bit Windows guests with 1G RAM, and quite a few of those faults are caused by gpfn > 0x100000, i.e. above 4G. So how about also adding a gpfn range check to get_mfn_from_gpfn? We could use hvm_set_param to set the maximum gpfn in xc_hvm_build.c.

-Xin
Keir Fraser
2006-Nov-14 17:37 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On 14/11/06 17:33, "Li, Xin B" <xin.b.li@intel.com> wrote:

> On x86_64 xen, we saw get_mfn_from_gpfn gets into the fault-and-fixup
> path frequently when running 64bit windows guests with 1G RAM, and quite
> a few of them are caused by gpfn > 0x100000, i.e. above 4G, so how about
> also adding a gpfn range check into get_mfn_from_gpfn? And we can use
> hvm_set_param to set the max gpfn # in xc_hvm_build.c.

Any idea what it's trying to access? Presumably nothing is mapped up there, so it just gets all-ones back from reads? I'm surprised it would be doing lots of accesses to totally unused memory space. That tends to be fairly slow even on native hardware.

-- Keir
Li, Xin B
2006-Nov-14 17:42 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
> On 14/11/06 17:33, "Li, Xin B" <xin.b.li@intel.com> wrote:
>
>> On x86_64 xen, we saw get_mfn_from_gpfn gets into the fault-and-fixup
>> path frequently when running 64bit windows guests with 1G RAM, and
>> quite a few of them are caused by gpfn > 0x100000, i.e. above 4G, so
>> how about also adding a gpfn range check into get_mfn_from_gpfn? And we
>> can use hvm_set_param to set the max gpfn # in xc_hvm_build.c.
>
> Any idea what it's trying to access? Presumably nothing is mapped up
> there so it just gets all-ones back from reads? I'm surprised it would
> be doing lots of accesses to totally unused memory space. That tends to
> be fairly slow even on native hardware.

Those accesses come from detecting whether a guest page table page is no longer a page table page, as in validate_gl4e.

-Xin
Keir Fraser
2006-Nov-14 17:56 UTC
Re: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
On 14/11/06 17:42, "Li, Xin B" <xin.b.li@intel.com> wrote:

>> Any idea what it's trying to access? Presumably nothing is mapped up
>> there so it just gets all-ones back from reads? I'm surprised it would
>> be doing lots of accesses to totally unused memory space. That tends to
>> be fairly slow even on native hardware.
>
> Those accesses come from detecting whether a guest page table page is no
> longer a page table page, as in validate_gl4e.

I'll leave it to Tim to decide what the best thing to do here is. But I'm sure we don't need a max_gpfn parameter. Xen could maintain its own highwater mark, updated by the alloc_p2m path, if it needs it.

-- Keir
Li, Xin B
2006-Nov-14 18:02 UTC
RE: [Xen-devel] Two deadlock situations occur on 32-bit HVM SMP Windows
> I'll leave it to Tim to decide what the best thing to do here is. But
> I'm sure we don't need a max_gpfn parameter. Xen could maintain its own
> highwater mark, updated by the alloc_p2m path, if it needs it.

Yeah, that certainly works for me too. I was thinking that the max gpfn may change during the guest's lifecycle, and we would need to maintain it.

-Xin