I realise this is completely off-topic, but if someone on this list has some knowledge of this, see:

https://www.redhat.com/archives/libvir-list/2013-April/msg00189.html

The issue is that shutting down Xen domains segfaults libvirtd, which is annoying.
On Thu, 2013-04-04 at 21:49 +0200, AL13N wrote:
> I realise this is completely off-topic, but if someone on this list has some
> knowledge of this, see:
>
> https://www.redhat.com/archives/libvir-list/2013-April/msg00189.html
>
> The issue is that shutting down Xen domains segfaults libvirtd, which is
> annoying.

I don't have any clue on this... But, perhaps, Jim does (Cc-ing him)?

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On Fri, Apr 5, 2013 at 9:21 AM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> I don't have any clue on this... But, perhaps, Jim does (Cc-ing him)?

The e-mail that AL13N linked to was actually from Jim, saying he would be really busy for a while and unable to look at it.

The question isn't off-topic, as libxl and xend have to work closely with libvirt. Unfortunately, I don't think any of the active developers on this list has much familiarity with libvirt. It Would Be Good if someone could step up and learn, but with our feature freeze next week, we're also kind of heads-down getting stuff implemented...

-George
On 05.04.2013 13:13, George Dunlap wrote:
> The question isn't off-topic, as libxl and xend have to work closely
> with libvirt. Unfortunately, I don't think any of the active
> developers on this list has much familiarity with libvirt. It Would
> Be Good if someone could step up and learn, but with our feature
> freeze next week, we're also kind of heads-down getting stuff
> implemented...

I believe it is already fixed in unstable by this commit:
5f5ef65babc2ca15f43b775c4b47b0102fa2a632 "libxl: fix stale timeout event callback race"

Sadly, backporting it to 4.2 isn't trivial.

--
Best regards,
Marek Marczykowski
Invisible Things Lab
On Monday 8 April 2013 06:22:49, Marek Marczykowski wrote:
> I believe it is already fixed in unstable by this commit:
> 5f5ef65babc2ca15f43b775c4b47b0102fa2a632 "libxl: fix stale timeout event
> callback race"
>
> Sadly, backporting it to 4.2 isn't trivial.

Looking at the patch, I totally agree that it isn't trivial...

I'll hold off for now and will work more towards libvirt integration for Mageia 4 (our release freeze is now in effect for Mageia 3).

Thanks for all the help!
AL13N wrote:
> Looking at the patch, I totally agree that it isn't trivial...

It will certainly help, but I've heard reports there are still problems even with that patch. IIRC, Bamvor has seen a similar segfault using git master of libvirt and xen-unstable, although it is a bit harder to trigger.

I think we need to rework the code for handling shutdown events. The current code worked with libxl in Xen 4.1, but has proven to be racy with libxl in Xen 4.2. I plan to work on this, but unfortunately not for a few weeks. I'm busy with another project this week and will be traveling the week of April 15.

> I'll hold off for now and will work more towards libvirt integration for
> Mageia 4 (our release freeze is now in effect for Mageia 3).

FYI, although it is deprecated, the xm/xend toolstack works well with Xen 4.2, and the legacy libvirt xen driver is quite stable. It was the first hypervisor driver in libvirt :).

Regards,
Jim
AL13N wrote:
> Looking at the patch, I totally agree that it isn't trivial...

I forgot to mention: that commit plus bc7e8a2a have been backported to our openSUSE Xen 4.2 packages:

https://build.opensuse.org/package/show?package=xen&project=Virtualization

See 26468-libxl-race.patch and 26469-libxl-race.patch.

Regards,
Jim
AL13N writes ("Re: [Xen-devel] OT: xen libvirt issue"):
> Looking at the patch, I totally agree that it isn't trivial...
>
> I'll hold off for now and will work more towards libvirt integration for
> Mageia 4 (our release freeze is now in effect for Mageia 3).

We have already done the backport. The fix for this has been in Xen upstream 4.2-staging since the 22nd of February. You want these two commits from xen.git:

    commit a87ef897295ec17788e41e9a8f4c0ada7a5a45f8
    Author: Ian Jackson <ian.jackson@eu.citrix.com>
    Date:   Wed Jan 23 16:53:11 2013 +0000

        libxl: fix stale fd event callback race

    commit 6f0f339dd4378d062a211969f45cd23af12bf386
    Author: Ian Jackson <ian.jackson@eu.citrix.com>
    Date:   Wed Jan 23 16:53:11 2013 +0000

        libxl: fix stale timeout event callback race

I don't know if that's any help for your Mageia release, of course.

Regards,
Ian.
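For anyone checking a tree against the two commit IDs above, here is a minimal sketch in Python. It assumes the tree is a git checkout whose history descends from xen.git's 4.2 staging branch; a cherry-picked or tarball-plus-patches packaging tree carries different hashes and would be reported as missing. The function names and the `ref` argument are hypothetical; only `git merge-base --is-ancestor` is a real git command.

```python
import subprocess

# Commit IDs and subjects quoted in the message above (4.2-staging backports).
BACKPORTS = {
    "a87ef897295ec17788e41e9a8f4c0ada7a5a45f8": "libxl: fix stale fd event callback race",
    "6f0f339dd4378d062a211969f45cd23af12bf386": "libxl: fix stale timeout event callback race",
}


def git_is_ancestor(commit, ref, repo="."):
    """Return True if `commit` is reachable from `ref` in the git repo at `repo`."""
    # `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B;
    # any other exit status (including "unknown commit") is treated as "no".
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_backports(ref, is_ancestor=git_is_ancestor):
    """List the (commit, subject) pairs from BACKPORTS not reachable from `ref`."""
    return [(c, s) for c, s in BACKPORTS.items() if not is_ancestor(c, ref)]
```

Calling `missing_backports("HEAD")` inside a xen.git checkout would list whichever of the two fixes the current branch does not contain; the `is_ancestor` parameter exists only so the logic can be exercised without a repository.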
On Mon, 8 Apr 2013, Jim Fehlig wrote:
> It will certainly help, but I've heard reports there are still problems
> even with that patch. IIRC, Bamvor has seen a similar segfault using
> git master of libvirt and xen-unstable, although it is a bit harder to
> trigger.

Do you have a link to a bug report somewhere?

> I think we need to rework the code for handling shutdown events. The
> current code worked with libxl in Xen 4.1, but has proven to be racy
> with libxl in Xen 4.2. I plan to work on this, but unfortunately not
> for a few weeks. I'm busy with another project this week and will be
> traveling the week of April 15.

I realize that it actually takes time, but it would be great if you could write down the proposed fix in a bit more detail, in case somebody else volunteers to fix the issue in the meantime.

> FYI, although it is deprecated, the xm/xend toolstack works well with
> Xen 4.2, and the legacy libvirt xen driver is quite stable. It was the
> first hypervisor driver in libvirt :).

The problem is that xend doesn't support upstream QEMU as a disk backend, and the status of blktap in most distros is pretty poor.
On Monday 8 April 2013 17:09:52, Ian Jackson wrote:
> We have already done the backport. The fix for this has been in Xen
> upstream 4.2-staging since the 22nd of February. You want these two
> commits from xen.git:
>
> commit a87ef897295ec17788e41e9a8f4c0ada7a5a45f8
>     libxl: fix stale fd event callback race
>
> commit 6f0f339dd4378d062a211969f45cd23af12bf386
>     libxl: fix stale timeout event callback race
>
> I don't know if that's any help for your Mageia release, of course.

I was more worried about the comments in the patch, i.e. that it changes things such that tools using this (libvirt, maybe) would need to be recoded.

But since it's backported, I might as well try them.
On Monday 8 April 2013 09:36:15, Jim Fehlig wrote:
> FYI, although it is deprecated, the xm/xend toolstack works well with
> Xen 4.2, and the legacy libvirt xen driver is quite stable. It was the
> first hypervisor driver in libvirt :).

I'm aware of that, and kudos to you for this... It's just that xm/xend is deprecated, as you say... and xl looks quite nice... :-)
AL13N writes ("Re: [Xen-devel] OT: xen libvirt issue"):
> I was more worried about the comments in the patch, i.e. that it
> changes things such that tools using this (libvirt, maybe) would need
> to be recoded.

Yes, in order to fully fix these races there are a number of libvirt patches needed as well. I don't know exactly which libvirt trees these are in but the libvirt fixes are pure fixes which won't break anything that's not already broken.

It is also the case that in theory the libxl fixes won't break anything that's not already broken. However, it turns out that some versions of libvirt were already broken: at least some versions of libvirt's libxl bindings had a bug in their timeout calculation code which is triggered by timeout_modify(...{0,0}...), and the libxl patch "libxl: fix stale timeout event callback race" exposes that bug, which was previously latent.

Ian.
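To make the timeout remark concrete, here is a hedged Python sketch of the general class of bug. It is not libvirt's actual code. It assumes a binding that converts the absolute deadline handed to its timeout hook into a relative delay in milliseconds for its own event loop, and that a deadline of {0,0} is meant as "fire as soon as possible". Both function names are hypothetical.

```python
def delay_ms_naive(abs_sec, abs_usec, now_sec, now_usec):
    """Relative delay in ms; hugely negative when the deadline is {0,0}."""
    return (abs_sec - now_sec) * 1000 + (abs_usec - now_usec) // 1000


def delay_ms_clamped(abs_sec, abs_usec, now_sec, now_usec):
    """Same calculation, but a deadline in the past means 'fire immediately'."""
    return max(0, delay_ms_naive(abs_sec, abs_usec, now_sec, now_usec))


# Hypothetical wall-clock reading used for both calls below.
now = (1365400000, 500000)
print(delay_ms_naive(0, 0, *now))    # large negative number
print(delay_ms_clamped(0, 0, *now))  # 0
```

In an event loop where a negative interval is treated as "timer disabled", the naive result would mean the callback never fires, which is one way a previously unused code path could stay latent until a caller starts passing {0,0}.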
> It is also the case that in theory the libxl fixes won't break
> anything that's not already broken. However, it turns out that some
> versions of libvirt were already broken: at least some versions of
> libvirt's libxl bindings had a bug in their timeout calculation code
> which is triggered by timeout_modify(...{0,0}...), and the libxl patch
> "libxl: fix stale timeout event callback race" exposes that bug, which
> was previously latent.

FYI, applying these patches fixed the problem completely for me; I wasn't able to segfault libvirtd anymore.

I asked to have them pass our release freeze. We'll see what they decide.
Ian Jackson wrote:
> Yes, in order to fully fix these races there are a number of libvirt
> patches needed as well. I don't know exactly which libvirt trees
> these are in but the libvirt fixes are pure fixes which won't break
> anything that's not already broken.

libvirt >= 1.0.2 contains all of the related fixes.

Jim
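A small sketch of checking that threshold programmatically, in Python. libvirt reports its library version as a single integer encoded as major * 1,000,000 + minor * 1,000 + release; the helper names below are hypothetical, and the check says nothing about distro packages that carry extra patches on top of a given version.

```python
MIN_FIXED = (1, 0, 2)  # threshold from the message above: libvirt >= 1.0.2


def decode_libvirt_version(encoded):
    """Split libvirt's integer version (major*1000000 + minor*1000 + release)."""
    major, rest = divmod(encoded, 1_000_000)
    minor, release = divmod(rest, 1_000)
    return (major, minor, release)


def has_libxl_race_fixes(encoded):
    """True if the encoded libvirt version is at least MIN_FIXED."""
    return decode_libvirt_version(encoded) >= MIN_FIXED
```

With the libvirt Python bindings installed, the integer would typically come from the connection object's `getLibVersion()` call; that part is left out here so the sketch runs without libvirt.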
Jim Fehlig writes ("Re: [Xen-devel] OT: xen libvirt issue"):
> libvirt >= 1.0.2 contains all of the related fixes.

Great, thanks for that information.

Ian.
AL13N wrote:
> FYI, applying these patches fixed the problem completely for me; I wasn't
> able to segfault libvirtd anymore.

I don't think you tried hard enough :). But glad it is working for you! I still plan to improve shutdown event handling when I have some time to work on the libxl driver.

Jim
On Tuesday 9 April 2013 08:15:26, Jim Fehlig wrote:
> libvirt >= 1.0.2 contains all of the related fixes.

That's good news, because we have libvirt 1.0.2 (with some patches) :-)
On Tuesday 9 April 2013 08:41:56, Jim Fehlig wrote:
[...]
> I don't think you tried hard enough :). But glad it is working for
> you! I still plan to improve shutdown event handling when I have some
> time to work on the libxl driver.

I should say, "in that particular way", but I've restarted and shut down some domains more than 10 times in the ways I could do it before...
AL13N wrote:
> I should say, "in that particular way", but I've restarted and shut down
> some domains more than 10 times in the ways I could do it before...

Good news. Have you tried save/restore in a loop? Shutdown handling when save completes might be more susceptible to the race.

Regards,
Jim
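A save/restore loop like the one suggested could be scripted roughly as follows. This is a sketch that shells out to `virsh save` and `virsh restore`; the domain name, state-file path, and iteration count are placeholders, and the function names are hypothetical. If libvirtd dies mid-run, the next virsh call should fail and stop the loop.

```python
import subprocess


def run_virsh(args):
    """Run one virsh command; raises CalledProcessError if it fails."""
    subprocess.run(["virsh", *args], check=True)


def save_restore_loop(domain, state_file, iterations, run=run_virsh):
    """Save and restore `domain` repeatedly; return the number of completed cycles."""
    completed = 0
    for _ in range(iterations):
        run(["save", domain, state_file])   # domain stops once its state is written
        run(["restore", state_file])        # bring it back from the saved state
        completed += 1
    return completed
```

For example, `save_restore_loop("guest1", "/var/tmp/guest1.save", 50)` would attempt 50 cycles against whatever hypervisor URI virsh uses by default on that host. The `run` parameter is there only so the loop can be tried without a hypervisor.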
> Good news. Have you tried save/restore in a loop? Shutdown handling
> when save completes might be more susceptible to the race.

Hmm, no, I actually didn't.