I have a machine on which the pvops kernel boots fine in bare metal, but when running under Xen, times out waiting for the root device. The console log (with some additional messages added to track back the root problem) is attached. Going backwards to the root of the problem: The root device times out even though the device driver successfully loads and sees the device -- it complains about missing interrupts. It misses interrupts because the PHYSDEVOP_setup_gsi returns -16, EBUSY, and the pvops kernel doesn''t retry; so the gsi is never set up. PHYSDEVOP_setup_gsi returns -EBUSY because the IRQ in question is marked move_in_progress. It was marked move_in_progress earlier when one of the serial drivers remapped the first 15 IRQs. (At least, I presume it''s the serial driver, since it happens immediately after the serial driver is loaded; I could be wrong.) I have a simple work-around, which is to set dom0_max_vcpus=0 on the Xen command line; the guest successfully boots after that. (So I should be able to test those p2m patches that have been queued up for a few weeks now.) However, it seems that moving IRQs is not handled properly. Either the pvops kernel should retry if it gets an -EBUSY, or the hypercall should not fail, but wait until it can return success. I discovered all this by adding debug statements to the IRQ path; the patch is attached, if anyone else wants to use it. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 11/08/2010 15:56, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:> However, it seems that moving IRQs is not handled properly. Either > the pvops kernel should retry if it gets an -EBUSY, or the hypercall > should not fail, but wait until it can return success.Move_in_progress is a private hypervisor implementation detail. We shouldn''t make it visible to the guest via EBUSY. Worst case we should be turning the retry into a hypercall continuation. Even better if we don''t need to make the IRQ bind request wait at all... Not sure why that needs to happen at all. -- Keir> I discovered all this by adding debug statements to the IRQ path; the > patch is attached, if anyone else wants to use it. > > -George_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Aug-11 15:38 UTC
Re: [Xen-devel] IRQs, move_in_progress, -EBUSY &c
On 08/11/2010 07:56 AM, George Dunlap wrote:> Going backwards to the root of the problem: The root device times out > even though the device driver successfully loads and sees the device > -- it complains about missing interrupts. It misses interrupts > because the PHYSDEVOP_setup_gsi returns -16, EBUSY, and the pvops > kernel doesn''t retry; so the gsi is never set up.Aside from any of the other issues, EBUSY doesn''t generally suggest that the caller should try again. That would be EAGAIN. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 11/08/2010 15:56, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:> However, it seems that moving IRQs is not handled properly. Either > the pvops kernel should retry if it gets an -EBUSY, or the hypercall > should not fail, but wait until it can return success.Can you try the attached patch? Thanks, Keir> I discovered all this by adding debug statements to the IRQ path; the > patch is attached, if anyone else wants to use it. > > -George_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Seems to work about 50/50. Attached is a log of a successful boot (exile.008.log), and a failed boot (exile.009.log). Suspicious things about the failed case: the usb code starts to initialize before the SATA code finishes initializing, and complains that "Controller is probably using the wrong IRQ". Keir: the machine in question (as you may have guessed) is exile; let me know if you want to grab it and use it directly. -George On Wed, Aug 11, 2010 at 4:59 PM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:> On 11/08/2010 15:56, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote: > >> However, it seems that moving IRQs is not handled properly. Either >> the pvops kernel should retry if it gets an -EBUSY, or the hypercall >> should not fail, but wait until it can return success. > > Can you try the attached patch? > > Thanks, > Keir > >> I discovered all this by adding debug statements to the IRQ path; the >> patch is attached, if anyone else wants to use it. >> >> -George > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 11/08/2010 18:49, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:> Seems to work about 50/50. > > Attached is a log of a successful boot (exile.008.log), and a failed > boot (exile.009.log). Suspicious things about the failed case: the > usb code starts to initialize before the SATA code finishes > initializing, and complains that "Controller is probably using the > wrong IRQ".Cc''ing Xiantao Zhang, who submitted the per-CPU IDT patches. Perhaps he has some ideas how to fix this. The only other simple thing I can think to try is to modify my patch so that it loops in the hypervisor. Something like: do{ ret = mp_register_gsi(...}; } while (ret == -EBUSY); Since the condition being EBUSYed on is cleared in hardirq context, that should be safe. Apart from that, it is possible that greater surgery is neede don the per-CPU IDT and IRQ migration logic, and I think we need Xiantao''s help for that. -- Keir> Keir: the machine in question (as you may have guessed) is exile; let > me know if you want to grab it and use it directly. > > -George > > On Wed, Aug 11, 2010 at 4:59 PM, Keir Fraser <keir.fraser@eu.citrix.com> > wrote: >> On 11/08/2010 15:56, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote: >> >>> However, it seems that moving IRQs is not handled properly. Either >>> the pvops kernel should retry if it gets an -EBUSY, or the hypercall >>> should not fail, but wait until it can return success. >> >> Can you try the attached patch? >> >> Thanks, >> Keir >> >>> I discovered all this by adding debug statements to the IRQ path; the >>> patch is attached, if anyone else wants to use it. >>> >>> -George >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >> >>_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Looks like there are other callers of __assign_irq_vector() which also don''t handle the -EBUSY return value, namely xen/arch/x86/io_apic.c:set_desc_affinity(). Unfortunately, set_desc_affinity() cannot simply loop until it stops getting -EBUSY, as it is almost always called with irqs disabled -- so the very IPI which will call the function to make it not busy anymore is blocked. And it only returns one value (a cpu mask), and the function which calls it returns no value at all; so we can''s pass the "loop and retry" up one more level; we''d have to do a ton more code rewriting to be able to handle retries. Can we just call the cleanup function directly if we get -EBUSY? -George On Wed, Aug 11, 2010 at 7:18 PM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:> On 11/08/2010 18:49, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote: > >> Seems to work about 50/50. >> >> Attached is a log of a successful boot (exile.008.log), and a failed >> boot (exile.009.log). Suspicious things about the failed case: the >> usb code starts to initialize before the SATA code finishes >> initializing, and complains that "Controller is probably using the >> wrong IRQ". > > Cc''ing Xiantao Zhang, who submitted the per-CPU IDT patches. Perhaps he has > some ideas how to fix this. The only other simple thing I can think to try > is to modify my patch so that it loops in the hypervisor. Something like: > do{ ret = mp_register_gsi(...}; } while (ret == -EBUSY); > Since the condition being EBUSYed on is cleared in hardirq context, that > should be safe. > > Apart from that, it is possible that greater surgery is neede don the > per-CPU IDT and IRQ migration logic, and I think we need Xiantao''s help for > that. > > -- Keir > >> Keir: the machine in question (as you may have guessed) is exile; let >> me know if you want to grab it and use it directly. >> >> -George >> >> On Wed, Aug 11, 2010 at 4:59 PM, Keir Fraser <keir.fraser@eu.citrix.com> >> wrote: >>> On 11/08/2010 15:56, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote: >>> >>>> However, it seems that moving IRQs is not handled properly. Either >>>> the pvops kernel should retry if it gets an -EBUSY, or the hypercall >>>> should not fail, but wait until it can return success. >>> >>> Can you try the attached patch? >>> >>> Thanks, >>> Keir >>> >>>> I discovered all this by adding debug statements to the IRQ path; the >>>> patch is attached, if anyone else wants to use it. >>>> >>>> -George >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel >>> >>> > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 11/08/2010 20:06, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:> Looks like there are other callers of __assign_irq_vector() which also > don''t handle the -EBUSY return value, namely > xen/arch/x86/io_apic.c:set_desc_affinity().Looks like its callers do the same as on native Linux though -- namely silently bail on error. I think the set_affinity type callers are pretty benign if they fail-as-noop. In all of this, where possible I think we just have to stay close to what Linux does. -- Keir> Unfortunately, set_desc_affinity() cannot simply loop until it stops > getting -EBUSY, as it is almost always called with irqs disabled -- so > the very IPI which will call the function to make it not busy anymore > is blocked. And it only returns one value (a cpu mask), and the > function which calls it returns no value at all; so we can''s pass the > "loop and retry" up one more level; we''d have to do a ton more code > rewriting to be able to handle retries. > > Can we just call the cleanup function directly if we get -EBUSY?_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Wed, Aug 11, 2010 at 8:06 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:> Can we just call the cleanup function directly if we get -EBUSY?If anyone is curious about the answer to this one, it''s "No... at least, not simply." I tried it, and it hangs; probably waiting to grab a spinlock held by someone who disabled IRQs to make sure this didn''t happen. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Wed, Aug 11, 2010 at 8:12 PM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:> Looks like its callers do the same as on native Linux though -- namely > silently bail on error. I think the set_affinity type callers are pretty > benign if they fail-as-noop. In all of this, where possible I think we just > have to stay close to what Linux does.Hmm... maybe the failed affinity set is a red herring, then, as I''m seeing them also in successful boots. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel