Liuyongan
2012-Jan-04 04:37 UTC
[xen-devel] create irq failed due to move_cleanup_count always being set
Hi, all

I'm using xen-4.0 to do a test. When I create a domain, it fails because of a create_irq() failure. Only 33 domains had been successfully created and destroyed before I got the continuous failures, and the domain just before the failure was properly destroyed (at least destroy_irq() was properly called, which clears move_in_progress, according to the printk messages). So I can conclude for certain that __assign_irq_vector failed because move_cleanup_count was always set.

//this is the normal case when creating and destroying the domain whose id is 31:
(XEN) irq.c:1232:d0 bind pirq 79, irq 77, share flag:0
(XEN) irq.c:1377: dom31: pirq 79, irq 77 force unbind
(XEN) irq.c:1593: dom31: forcing unbind of pirq 79
(XEN) irq.c:223, destroy irq 77

//domain id 32 is also created and destroyed correctly:
(XEN) irq.c:1232:d0 bind pirq 79, irq 77, share flag:0
(XEN) irq.c:1377: dom32: pirq 79, irq 77 force unbind
(XEN) irq.c:1593: dom32: forcing unbind of pirq 79
(XEN) irq.c:223, destroy irq 77

//all subsequent domain creations failed; below lists only 3 of them:
(XEN) physdev.c:88: dom33: can't create irq for msi!
(XEN) physdev.c:88: dom34: can't create irq for msi!
(XEN) physdev.c:88: dom35: can't create irq for msi!

I think this might be a bug that has already been fixed, so I compared my code with 4.1.2 and searched the mailing list for potential patches. The thread at (http://xen.markmail.org/search/?q=move_cleanup_count#query:move_cleanup_count+page:6+mid:fpkrafqbeyiauvhs+state:results) submits a patch which adds locks in __assign_irq_vector. Can anybody explain why this lock is needed? Or is there a patch that might fix my bug? Thx.

Additional information: my board is x86, no domains were left running when creating new ones started to fail, the create_irq failures lasted a whole day until I rebooted the board, and the irq number being allocated is certainly used for an MSI device.

Yong an Liu
2012.1.4
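The symptom described above comes down to a guard at the top of the vector allocator: while an irq's move_cleanup_count (or move_in_progress) stays non-zero, no new vector can be assigned, so create_irq() keeps failing for that irq. The stand-alone C program below is only an illustrative model of that behavior, with names borrowed from Xen; it is not the hypervisor source.

    #include <errno.h>
    #include <stdio.h>

    struct irq_cfg {
        int vector;
        unsigned char move_cleanup_count;  /* small unsigned counter, can wrap */
        int move_in_progress;
    };

    /* Model of the guard at the top of __assign_irq_vector(). */
    static int assign_irq_vector(struct irq_cfg *cfg)
    {
        if (cfg->move_in_progress || cfg->move_cleanup_count)
            return -EAGAIN;            /* the branch that never clears here   */
        cfg->vector = 0x30;            /* pretend a free vector was found     */
        return 0;
    }

    int main(void)
    {
        /* A cleanup count that is never brought back to zero ...             */
        struct irq_cfg cfg = { .vector = -1, .move_cleanup_count = 1 };

        /* ... makes every subsequent create_irq()-style attempt fail.        */
        for (int attempt = 0; attempt < 3; attempt++)
            printf("attempt %d: %d\n", attempt, assign_irq_vector(&cfg));
        return 0;
    }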
Andrew Cooper
2012-Jan-04 11:38 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
On 04/01/12 04:37, Liuyongan wrote:
> I'm using xen-4.0 to do a test. When I create a domain, it fails because of a create_irq() failure. Only 33 domains had been successfully created and destroyed before I got the continuous failures, and the domain just before the failure was properly destroyed (at least destroy_irq() was properly called, which clears move_in_progress, according to the printk messages). So I can conclude for certain that __assign_irq_vector failed because move_cleanup_count was always set.

Is it always 33 domains it takes to cause the problem, or does it vary? If it varies, then I think you want this patch http://xenbits.xensource.com/hg/xen-unstable.hg/rev/68b903bb1b01 which corrects the logic that works out which moved vectors should be cleaned up. Without it, stale irq numbers build up in the per-cpu irq_vector tables, leading to __assign_irq_vector failing with -ENOSPC as it cannot find a vector to allocate.

> I think this might be a bug that has already been fixed, so I compared my code with 4.1.2 and searched the mailing list for potential patches. The thread at (http://xen.markmail.org/search/?q=move_cleanup_count#query:move_cleanup_count+page:6+mid:fpkrafqbeyiauvhs+state:results) submits a patch which adds locks in __assign_irq_vector. Can anybody explain why this lock is needed? Or is there a patch that might fix my bug? Thx.

This patch fixes a problem where IOAPIC line level interrupts cease for a while. It has nothing to do with MSI interrupts. (Also, there are no locks altered, and xen-4.0-testing seems to have gained an additional hunk in hvm/vmx code unrelated to the original patch.)

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Andrew Cooper
2012-Jan-04 11:42 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
On 04/01/12 11:38, Andrew Cooper wrote:
> If it varies, then I think you want this patch http://xenbits.xensource.com/hg/xen-unstable.hg/rev/68b903bb1b01 which corrects the logic that works out which moved vectors should be cleaned up. Without it, stale irq numbers build up in the per-cpu irq_vector tables, leading to __assign_irq_vector failing with -ENOSPC as it cannot find a vector to allocate.

P.S. Sorry - I mean the per-cpu vector_irq tables. The irq_vector table is something different.

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
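The build-up Andrew describes exhausts the per-cpu vector_irq table: every slot that still points at a stale irq looks busy to the allocator, so once enough stale entries accumulate the vector search has nothing left to hand out and fails with -ENOSPC. The following stand-alone model of that search uses Xen-like names but is not the actual hypervisor code.

    #include <errno.h>
    #include <stdio.h>

    #define NR_VECTORS 16              /* a tiny table, just for illustration */

    /* Per-cpu vector_irq table: vector -> irq, or -1 if the vector is free.  */
    static int vector_irq[NR_VECTORS];

    /* Model of the search in __assign_irq_vector(): any vector whose table
     * entry is still claimed by some irq is skipped.                         */
    static int find_free_vector(void)
    {
        for (int vector = 0; vector < NR_VECTORS; vector++)
            if (vector_irq[vector] == -1)
                return vector;
        return -ENOSPC;                /* every slot looks "in use"           */
    }

    int main(void)
    {
        /* Simulate stale entries left behind by a broken move-cleanup pass.  */
        for (int vector = 0; vector < NR_VECTORS; vector++)
            vector_irq[vector] = 100 + vector;    /* never reset to -1        */

        printf("find_free_vector() = %d\n", find_free_vector());
        return 0;
    }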
Liuyongan
2012-Jan-05 06:13 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
> On 04/01/12 11:38, Andrew Cooper wrote:
> Is it always 33 domains it takes to cause the problem, or does it vary? If it varies, then I think you want this patch http://xenbits.xensource.com/hg/xen-unstable.hg/rev/68b903bb1b01 which corrects the logic that works out which moved vectors should be cleaned up. Without it, stale irq numbers build up in the per-cpu irq_vector tables, leading to __assign_irq_vector failing with -ENOSPC as it cannot find a vector to allocate.

Yes, I've noticed this patch. As only 33 domains were created before the failures, the vectors of a given cpu should not have been used up. Besides, I got this problem another time after 143 domains had been created. But I could not reproduce the problem manually: 4000+ domains were created successfully without hitting it.
Liuyongan
2012-Jan-06 06:04 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
As only 33 domains were successfully created (and destroyed) before the problem occurred, there should have been enough free IRQ numbers and vectors to allocate (even supposing that some irqs and vectors failed to be deallocated). And destroy_irq() will clear move_in_progress, so it must be move_cleanup_count that stays set? Is this the case?

Yong an
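On the premise of that question: the thread so far describes destroy_irq() as ending up in __clear_irq_vector(), which drops the vector and clears move_in_progress, while move_cleanup_count is only ever decremented by the cleanup IPI handler (smp_irq_move_cleanup_interrupt). The toy program below captures just that division of labor; it is a paraphrase of the behavior described in this thread, not the 4.0 source.

    #include <stdbool.h>
    #include <stdio.h>

    struct irq_cfg {
        int vector;                        /* -1 == unassigned                 */
        bool move_in_progress;
        unsigned char move_cleanup_count;  /* only the cleanup IPI drops this  */
    };

    /* Rough model of __clear_irq_vector(), as reached from destroy_irq():
     * it clears the vector and move_in_progress, but deliberately leaves
     * move_cleanup_count for the cleanup IPI handler to decrement.            */
    static void clear_irq_vector(struct irq_cfg *cfg)
    {
        cfg->vector = -1;
        cfg->move_in_progress = false;
        /* cfg->move_cleanup_count is NOT touched here */
    }

    int main(void)
    {
        struct irq_cfg cfg = { .vector = 0x30, .move_in_progress = true,
                               .move_cleanup_count = 1 };
        clear_irq_vector(&cfg);
        printf("vector=%d move_in_progress=%d move_cleanup_count=%d\n",
               cfg.vector, cfg.move_in_progress, cfg.move_cleanup_count);
        return 0;
    }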
Andrew Cooper
2012-Jan-06 11:00 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
Could you please avoid top posting.

On 06/01/12 06:04, Liuyongan wrote:
> As only 33 domains were successfully created (and destroyed) before the problem occurred, there should have been enough free IRQ numbers and vectors to allocate (even supposing that some irqs and vectors failed to be deallocated). And destroy_irq() will clear move_in_progress, so it must be move_cleanup_count that stays set? Is this the case?

Is it repeatably 33 domains, or was that a one-off experiment? Can you confirm exactly which version of Xen you are using, including the changeset if you know it? Without knowing your hardware, it is hard to say whether there are actually enough free IRQs, although I do agree that what you are currently seeing is buggy behavior.

The per-cpu IDT functionality introduced in Xen-4.0 is fragile at the best of times, and has had several bugfixes and tweaks which I am not certain have actually found their way back to Xen-4.0. Could you try Xen-4.1 and see if the problem persists?

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Liuyongan
2012-Jan-06 11:50 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, January 06, 2012 7:01 PM
> To: Liuyongan
> Cc: xen-devel@lists.xensource.com; Keir (Xen.org); Qianhuibin
> Subject: Re: [xen-devel] create irq failed due to move_cleanup_count always being set
>
> Is it repeatably 33 domains, or was that a one-off experiment?

No, it's not repeatable; this has occurred 2 times, and the other one was after 152 domains.

> Can you confirm exactly which version of Xen you are using, including the changeset if you know it? Without knowing your hardware, it is hard to say whether there are actually enough free IRQs, although I do agree that what you are currently seeing is buggy behavior.
>
> The per-cpu IDT functionality introduced in Xen-4.0 is fragile at the best of times, and has had several bugfixes and tweaks which I am not certain have actually found their way back to Xen-4.0. Could you try Xen-4.1 and see if the problem persists?
>
> ~Andrew

As I could not make it re-occur on xen-4.0, trying xen-4.1 seems useless. I noticed a scenario:

1) move_in_progress occurs;
2) the IPI IRQ_MOVE_CLEANUP_VECTOR interrupt is sent;
3) the irq is destroyed, so cfg->vector is cleared, etc.;
4) the IRQ_MOVE_CLEANUP_VECTOR irq is handled.

In xen-4.1, in step 3) the vector_irq entries of old_cpu_mask/old_domain are also reset, so in step 4) move_cleanup_count fails to be decremented by one, finally leading to the create_irq failure (right?).

In xen-4.0, in step 3), and in my code, vector_irq is not reset (this is a bug, as you've mentioned), and I still could not figure out why create_irq should fail.
Andrew Cooper
2012-Jan-06 12:18 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
On 06/01/12 11:50, Liuyongan wrote:
>> Is it repeatably 33 domains, or was that a one-off experiment?
> No, it's not repeatable; this has occurred 2 times, and the other one was after 152 domains.

Can you list all the failures you have seen, with the number of domains? So far it seems that it has been 33 twice but many more some of the time, which doesn't lend itself to saying "33 domains is a systematic failure" for certain at the moment.

>> The per-cpu IDT functionality introduced in Xen-4.0 is fragile at the best of times, and has had several bugfixes and tweaks which I am not certain have actually found their way back to Xen-4.0. Could you try Xen-4.1 and see if the problem persists?
> As I could not make it re-occur on xen-4.0, trying xen-4.1 seems useless. I noticed a scenario:

I am confused. Above, you say that the problem is repeatable, but here you say it is not.

> 1) move_in_progress occurs;
> 2) the IPI IRQ_MOVE_CLEANUP_VECTOR interrupt is sent;
> 3) the irq is destroyed, so cfg->vector is cleared, etc.;
> 4) the IRQ_MOVE_CLEANUP_VECTOR irq is handled.
>
> In xen-4.1, in step 3) the vector_irq entries of old_cpu_mask/old_domain are also reset, so in step 4) move_cleanup_count fails to be decremented by one, finally leading to the create_irq failure (right?).
>
> In xen-4.0, in step 3), and in my code, vector_irq is not reset (this is a bug, as you've mentioned), and I still could not figure out why create_irq should fail.

The first point of debugging should be to see how create_irq is failing. Is it failing because of find_unassigned_irq() or because of __assign_irq_vector()?

Another piece of useful information would be what your guests are and what they are trying to do with interrupts. Are you using PCI passthrough?

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
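For orientation, the two candidate failure points named here sit one after the other inside create_irq(): first the search for a free irq number, then the vector assignment for it. The stand-alone model below is a paraphrase with Xen-like names, not the hypervisor source; in it, either error corresponds to the "can't create irq for msi!" line from physdev.c in the logs above, and the vector-assignment step is stubbed to the stuck -EAGAIN case this thread is about.

    #include <errno.h>
    #include <stdio.h>

    #define NR_IRQS 8

    static int irq_used[NR_IRQS];       /* stand-in for the real irq bookkeeping */

    static int find_unassigned_irq(void)
    {
        for (int irq = 0; irq < NR_IRQS; irq++)
            if (!irq_used[irq])
                return irq;
        return -ENOSPC;                 /* failure point 1: no free irq number   */
    }

    static int assign_irq_vector(int irq)
    {
        (void)irq;
        return -EAGAIN;                 /* failure point 2: vector allocation,
                                           e.g. move_cleanup_count still set     */
    }

    /* Model of create_irq(): it can only fail in one of the two places above. */
    static int create_irq(void)
    {
        int irq = find_unassigned_irq();
        if (irq < 0)
            return irq;

        int ret = assign_irq_vector(irq);
        if (ret < 0)
            return ret;

        irq_used[irq] = 1;
        return irq;
    }

    int main(void)
    {
        printf("create_irq() = %d\n", create_irq());
        return 0;
    }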
Liuyongan
2012-Jan-07 10:33 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, January 06, 2012 8:18 PM
> To: Liuyongan
> Cc: xen-devel@lists.xensource.com; Keir (Xen.org); Qianhuibin
> Subject: Re: [xen-devel] create irq failed due to move_cleanup_count always being set
>
> Can you list all the failures you have seen, with the number of domains? So far it seems that it has been 33 twice but many more some of the time, which doesn't lend itself to saying "33 domains is a systematic failure" for certain at the moment.

Sorry, to make it clear: this problem has occurred 2 times, once after 33 domains and once after 152 domains. I'm not quite expressive in English.

> The first point of debugging should be to see how create_irq is failing. Is it failing because of find_unassigned_irq() or because of __assign_irq_vector()?
>
> Another piece of useful information would be what your guests are and what they are trying to do with interrupts. Are you using PCI passthrough?
>
> ~Andrew

Thx for your suggestion. I think I've got the reason. Digging into the details:

1) move_in_progress occurs;
2) a new interrupt occurs on the new cpus, so the IPI IRQ_MOVE_CLEANUP_VECTOR interrupt is sent;
3) the irq is destroyed, so __clear_irq_vector() is called;
4) the IRQ_MOVE_CLEANUP_VECTOR irq is handled by smp_irq_move_cleanup_interrupt().

In step 3), the code with the patch ("cpus_and(tmp_mask, cfg->old_domain, cpu_online_map);") clears vector_irq for old_cpu_mask/old_domain, so in step 4):

    irq = __get_cpu_var(vector_irq)[vector];

    if (irq == -1)
        continue;

will miss the irq (cfg) it should clean up.

In step 3), the code without the patch (which is my case) does not clear vector_irq for old_cpu_mask, so in step 4) the irq (cfg) is found correctly, but at:

    if (vector == cfg->vector && cpu_isset(me, cfg->domain))
        goto unlock;

there is a chance that vector should equal cfg->vector and me should be in cfg->domain, but because the irq has been destroyed it does not "goto unlock", so

    cfg->move_cleanup_count--;

executes unexpectedly, finally leaving cfg->move_cleanup_count = 255.

So I think the loop in smp_irq_move_cleanup_interrupt should be based on irqs, not vectors, to find the struct cfg. Is that right? Drowsy head on the weekend; if my analysis is right, I'll submit the patch on Monday :)
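One detail worth making explicit: 255 is exactly what an 8-bit counter wraps to when it is decremented past zero. The toy program below uses such a counter and models only the quoted guard and decrement (not the whole handler), showing how destroying the irq, which clears cfg->vector, disarms the guard; the types and helper here are invented for the illustration.

    #include <stdbool.h>
    #include <stdio.h>

    struct irq_cfg {
        int vector;                        /* cleared to -1 by destroy_irq()     */
        bool cpu_in_domain;                /* stand-in for cpu_isset(me, domain) */
        unsigned char move_cleanup_count;  /* 8-bit: 0 - 1 wraps to 255          */
    };

    /* The per-vector decision quoted above from smp_irq_move_cleanup_interrupt():
     * skip ("goto unlock") if this is still the irq's live vector on this cpu,
     * otherwise treat the entry as a leftover of an old move and decrement.     */
    static void cleanup_one_entry(struct irq_cfg *cfg, int vector)
    {
        if (vector == cfg->vector && cfg->cpu_in_domain)
            return;                        /* guard holds: nothing to clean up   */
        cfg->move_cleanup_count--;         /* guard disarmed: stray decrement    */
    }

    int main(void)
    {
        /* The irq's live vector on this cpu is 0x30; no cleanup is pending.     */
        struct irq_cfg cfg = { .vector = 0x30, .cpu_in_domain = true,
                               .move_cleanup_count = 0 };

        cleanup_one_entry(&cfg, 0x30);     /* irq still alive: guard protects it */
        printf("before destroy: move_cleanup_count = %d\n", cfg.move_cleanup_count);

        cfg.vector = -1;                   /* destroy_irq() ran in between       */
        cleanup_one_entry(&cfg, 0x30);     /* guard no longer matches ...        */
        printf("after destroy:  move_cleanup_count = %d\n", cfg.move_cleanup_count);
        return 0;
    }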
Liuyongan
2012-Jan-09 04:50 UTC
Re: [xen-devel] create irq failed due to move_cleanup_count always being set
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liuyongan
> Sent: Saturday, January 07, 2012 6:34 PM
> To: Andrew Cooper
> Cc: xen-devel@lists.xensource.com; Keir (Xen.org); Qianhuibin
> Subject: Re: [Xen-devel] [xen-devel] create irq failed due to move_cleanup_count always being set
>
> In step 3), the code with the patch ("cpus_and(tmp_mask, cfg->old_domain, cpu_online_map);") clears vector_irq for old_cpu_mask/old_domain, so in step 4):
>
>     irq = __get_cpu_var(vector_irq)[vector];
>
>     if (irq == -1)
>         continue;
>
> will miss the irq (cfg) it should clean up.

Because move_in_progress is cleared right after the IPI is sent, the chance of old_domain's vector_irq entries being cleared in that window is small, yet the chance does exist, so a loop based on irqs would solve this problem.

> In step 3), the code without the patch (which is my case) does not clear vector_irq for old_cpu_mask, so in step 4) the irq (cfg) is found correctly, but at:
>
>     if (vector == cfg->vector && cpu_isset(me, cfg->domain))
>         goto unlock;
>
> there is a chance that vector should equal cfg->vector and me should be in cfg->domain, but because the irq has been destroyed it does not "goto unlock", so
>
>     cfg->move_cleanup_count--;
>
> executes unexpectedly, finally leaving cfg->move_cleanup_count = 255.

This needs a scenario like the following, with two irqs moving concurrently from/to one cpu: irq 69 moves from cpu5 to cpu6, and irq 70 moves from cpu6 to cpu7. If cpu6 then receives the cleanup IPI for the completion of irq 70's move and, in the meantime, irq 69 is destroyed, then irq 69's cfg->move_cleanup_count may end up with the invalid value 255. The root cause of this problem is that the cpu which receives the cleanup IPI cannot tell which vector's move has completed when two moves are in flight concurrently.
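As an illustration of the proposal above, the sketch below walks a per-irq cfg array instead of the local vector table, so the handling cpu only touches irqs that genuinely owe it a cleanup. The bookkeeping here (old_cpu_mask as a plain boolean array, a fixed irq count) is invented for the example; this is a stand-alone sketch of the idea only, not a patch against Xen.

    #include <stdbool.h>
    #include <stdio.h>

    #define NR_CPUS 8
    #define NR_IRQS 4

    struct irq_cfg {
        unsigned char move_cleanup_count;  /* old-vector cleanups still pending */
        bool old_cpu_mask[NR_CPUS];        /* cpus still holding an old vector  */
    };

    static struct irq_cfg cfgs[NR_IRQS];

    /* irq-based cleanup loop: which irq needs cleaning is read from its cfg,
     * so a stale or already-cleared vector_irq entry cannot confuse it.        */
    static void irq_move_cleanup(int me)
    {
        for (int irq = 0; irq < NR_IRQS; irq++) {
            struct irq_cfg *cfg = &cfgs[irq];
            if (!cfg->move_cleanup_count || !cfg->old_cpu_mask[me])
                continue;                  /* this cpu owes nothing for this irq */
            cfg->old_cpu_mask[me] = false; /* release this cpu's old vector      */
            cfg->move_cleanup_count--;
        }
    }

    int main(void)
    {
        /* irq 0 moved away from cpu 5, irq 1 moved away from cpu 6.             */
        cfgs[0] = (struct irq_cfg){ .move_cleanup_count = 1 };
        cfgs[0].old_cpu_mask[5] = true;
        cfgs[1] = (struct irq_cfg){ .move_cleanup_count = 1 };
        cfgs[1].old_cpu_mask[6] = true;

        irq_move_cleanup(6);               /* cpu 6's IPI affects only irq 1     */
        printf("irq0=%d irq1=%d\n", cfgs[0].move_cleanup_count, cfgs[1].move_cleanup_count);

        irq_move_cleanup(5);               /* cpu 5 later cleans up irq 0        */
        printf("irq0=%d irq1=%d\n", cfgs[0].move_cleanup_count, cfgs[1].move_cleanup_count);
        return 0;
    }

Whether this is actually the right shape for a fix in Xen is exactly the question the message raises; the sketch only shows the irq-based loop the author has in mind.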