Christoph Egger
2011-Feb-09 11:13 UTC
[Xen-devel] libxl: error handling before xenstored runs
Hi! When I start a guest *before* xenstored runs then I get this list of error messages: libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 IMO a simple message like "xenstored is not running." would be enough. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kamala Narasimhan
2011-Feb-09 14:29 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
Christoph Egger wrote:> Hi! > > When I start a guest *before* xenstored runs > then I get this list of error messages: > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 > > > IMO a simple message like "xenstored is not running." would be enough. >xl now has a check and newer versions of the toolstack should display similar message when you invoke an xl command. Is it possible you are using a slightly older version of the toolstack or directly invoking libxl library elsewhere? Kamala _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2011-Feb-09 14:42 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote:> Christoph Egger wrote: > > Hi! > > > > When I start a guest *before* xenstored runs > > then I get this list of error messages: > > > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 > > > > > > IMO a simple message like "xenstored is not running." would be enough. > > xl now has a check and newer versions of the toolstack should display > similar message when you invoke an xl command. Is it possible you are > using a slightly older version of the toolstack or directly invoking libxl > library elsewhere?I''m currently on c/s 22834. Which c/s added the check you are talking about? Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kamala Narasimhan
2011-Feb-09 14:46 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
Christoph Egger wrote:> On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote: >> Christoph Egger wrote: >>> Hi! >>> >>> When I start a guest *before* xenstored runs >>> then I get this list of error messages: >>> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 >>> >>> >>> IMO a simple message like "xenstored is not running." would be enough. >> xl now has a check and newer versions of the toolstack should display >> similar message when you invoke an xl command. Is it possible you are >> using a slightly older version of the toolstack or directly invoking libxl >> library elsewhere? > > I''m currently on c/s 22834. Which c/s added the check you are talking about? >http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 Kamala _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2011-Feb-09 15:32 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Wednesday 09 February 2011 15:46:56 Kamala Narasimhan wrote:> Christoph Egger wrote: > > On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote: > >> Christoph Egger wrote: > >>> Hi! > >>> > >>> When I start a guest *before* xenstored runs > >>> then I get this list of error messages: > >>> > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 > >>> > >>> > >>> IMO a simple message like "xenstored is not running." would be enough. > >> > >> xl now has a check and newer versions of the toolstack should display > >> similar message when you invoke an xl command. Is it possible you are > >> using a slightly older version of the toolstack or directly invoking > >> libxl library elsewhere? > > > > I''m currently on c/s 22834. Which c/s added the check you are talking > > about? > > http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5This is c/s 22806. So my tree is new enough. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kamala Narasimhan
2011-Feb-09 15:42 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
>>> I''m currently on c/s 22834. Which c/s added the check you are talking >>> about? >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > This is c/s 22806. So my tree is new enough. >Right, but did you happen to check how you got past the check done by that patch for the case in question? Kamala _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2011-Feb-09 15:52 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote:> >>> I''m currently on c/s 22834. Which c/s added the check you are talking > >>> about? > >> > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > This is c/s 22806. So my tree is new enough. > > Right, but did you happen to check how you got past the check done by that > patch for the case in question?The pid file simply doesn''t exist. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2011-Feb-09 15:54 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote:> On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > >>> I''m currently on c/s 22834. Which c/s added the check you are talking > > >>> about? > > >> > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > This is c/s 22806. So my tree is new enough. > > > > Right, but did you happen to check how you got past the check done by > > that patch for the case in question? > > The pid file simply doesn''t exist.Oh wait. Hit the ''send'' button too fast. The pid file does exist from previous boot. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gianni Tedesco
2011-Feb-09 17:39 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote:> On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote: > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > > >>> I''m currently on c/s 22834. Which c/s added the check you are talking > > > >>> about? > > > >> > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > > > This is c/s 22806. So my tree is new enough. > > > > > > Right, but did you happen to check how you got past the check done by > > > that patch for the case in question? > > > > The pid file simply doesn''t exist. > > Oh wait. Hit the ''send'' button too fast. > > The pid file does exist from previous boot.Bleh, precisely my problem with these heuristic checks. It''s worse on my box because if this happens I end up with unkillable xl processes due to libxenstore wanting to open /dev/xen/xenbus or whatever it is. Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-10 08:55 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Wed, 2011-02-09 at 17:39 +0000, Gianni Tedesco wrote:> On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote: > > On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote: > > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > > > >>> I''m currently on c/s 22834. Which c/s added the check you are talking > > > > >>> about? > > > > >> > > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > > > > > This is c/s 22806. So my tree is new enough. > > > > > > > > Right, but did you happen to check how you got past the check done by > > > > that patch for the case in question? > > > > > > The pid file simply doesn''t exist. > > > > Oh wait. Hit the ''send'' button too fast. > > > > The pid file does exist from previous boot. > > Bleh, precisely my problem with these heuristic checks. It''s worse on my > box because if this happens I end up with unkillable xl processes due to > libxenstore wanting to open /dev/xen/xenbus or whatever it is.That''s the underlying bug which the heuristic is trying to avoid... Fundamentally the xs ring protocol is missing any way to tell if someone is listening on the other end so you have no choice but to try communicating and see if anyone responds. It''s a pretty straightforward bug that the kernel does the waiting to see if anyone responds bit with an uninterruptible sleep. I took a quick look a little while ago but unfortunately it didn''t look straightforward to fix on the kernel side :-( I can''t remember why though. It might be simpler to support allowing the userspace client to explicitly specify a timeout. I''m not sure what the impact on the ring is of leaving unconsumed requests on the ring when the other end does show up. Presumably the kernel driver just needs to be prepared to swallow responses whose target has given up and gone home. Maybe we should add an explicit ping/pong ring message to the xs ring protocol? Ian.> > Gianni > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vincent Hanquez
2011-Feb-10 09:26 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On 10/02/11 08:55, Ian Campbell wrote:> That''s the underlying bug which the heuristic is trying to avoid... > > Fundamentally the xs ring protocol is missing any way to tell if someone > is listening on the other end so you have no choice but to try > communicating and see if anyone responds. > > It''s a pretty straightforward bug that the kernel does the waiting to > see if anyone responds bit with an uninterruptible sleep. I took a quick > look a little while ago but unfortunately it didn''t look straightforward > to fix on the kernel side :-( I can''t remember why though.For starter, the protocol requires the messages to sit on the ring for a underdetermined amount of time (boot watches).> It might be simpler to support allowing the userspace client to > explicitly specify a timeout. I''m not sure what the impact on the ring > is of leaving unconsumed requests on the ring when the other end does > show up. Presumably the kernel driver just needs to be prepared to > swallow responses whose target has given up and gone home.No, the simplest thing to do is to use the socket connection exclusively. Just how we''re doing it in XCP and XCI. The protocol is not design to do async either, so leaving unconsumed request, could be pretty disastrous if the other end show up. Providing the kernel doesn''t detect it (i don''t think it does [1]), it would imply spurious reply, for example the previous waiting read on "/abc/def" could reply to a next read on "/xyz/123".> Maybe we should add an explicit ping/pong ring message to the xs ring > protocol?And who''s going to reply to this if xenstored is missing ? you would require the kernel to introspect the messages and reply by itself. [1] the kernel would be happy to read the previous reply on the ring after xenstored has put the actual reply after it, and trigger the eventchn. (the kernel could actually check the requestid and see if they match, but it doesn''t.) -- Vincent _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-10 11:24 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:> On 10/02/11 08:55, Ian Campbell wrote: > > That''s the underlying bug which the heuristic is trying to avoid... > > > > Fundamentally the xs ring protocol is missing any way to tell if someone > > is listening on the other end so you have no choice but to try > > communicating and see if anyone responds. > > > > It''s a pretty straightforward bug that the kernel does the waiting to > > see if anyone responds bit with an uninterruptible sleep. I took a quick > > look a little while ago but unfortunately it didn''t look straightforward > > to fix on the kernel side :-( I can''t remember why though. > > For starter, the protocol requires the messages to sit on the ring for a > underdetermined amount of time (boot watches). > > > It might be simpler to support allowing the userspace client to > > explicitly specify a timeout. I''m not sure what the impact on the ring > > is of leaving unconsumed requests on the ring when the other end does > > show up. Presumably the kernel driver just needs to be prepared to > > swallow responses whose target has given up and gone home. > > No, the simplest thing to do is to use the socket connection > exclusively. Just how we''re doing it in XCP and XCI.Right but this approach doesn''t work with xenstored in a stubdomain. Part of the point of using the ring protocol even when this isn''t the case is to help ensure that it is possible and help avoid regressions etc.> The protocol is not design to do async either, so leaving unconsumed > request, could be pretty disastrous if the other end show up. Providing > the kernel doesn''t detect it (i don''t think it does [1]), it would imply > spurious reply, for example the previous waiting read on "/abc/def" > could reply to a next read on "/xyz/123".The wire protocol includes a req_id which is echoed in the response which sh/could facilitate multiplexing this sort of thing. The pvops kernel currently always sets it to zero but that''s just an implementation detail ;-) Currently the kernel does (roughly): take_lock write_request wait_for_reply release_lock instead it should/could be: take_lock(timeout) write_request (++req_id) while read_reply.req_id != req_id && not (timeout) wait some more release lock OK, so may be this is not in the "might be simpler" bucket any more, but it sounds like plausibly the right direction to take. Properly handling multiple userspace clients asynchronously a demuxes the responses etc would be even better but I don''t think necessary to solve this particular issue.> > Maybe we should add an explicit ping/pong ring message to the xs ring > > protocol? > > And who''s going to reply to this if xenstored is missing ? you would > require the kernel to introspect the messages and reply by itself.The reason I suggested new messages was that I would solve that by declaring that these new messages have whatever magic semantics I need to make this work ;-) Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2011-Feb-10 11:32 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:> On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: > > On 10/02/11 08:55, Ian Campbell wrote: > > > That''s the underlying bug which the heuristic is trying to avoid... > > > > > > Fundamentally the xs ring protocol is missing any way to tell if > > > someone is listening on the other end so you have no choice but to try > > > communicating and see if anyone responds. > > > > > > It''s a pretty straightforward bug that the kernel does the waiting to > > > see if anyone responds bit with an uninterruptible sleep. I took a > > > quick look a little while ago but unfortunately it didn''t look > > > straightforward to fix on the kernel side :-( I can''t remember why > > > though. > > > > For starter, the protocol requires the messages to sit on the ring for a > > underdetermined amount of time (boot watches). > > > > > It might be simpler to support allowing the userspace client to > > > explicitly specify a timeout. I''m not sure what the impact on the ring > > > is of leaving unconsumed requests on the ring when the other end does > > > show up. Presumably the kernel driver just needs to be prepared to > > > swallow responses whose target has given up and gone home. > > > > No, the simplest thing to do is to use the socket connection > > exclusively. Just how we''re doing it in XCP and XCI. > > Right but this approach doesn''t work with xenstored in a stubdomain. > Part of the point of using the ring protocol even when this isn''t the > case is to help ensure that it is possible and help avoid regressions > etc. > > > The protocol is not design to do async either, so leaving unconsumed > > request, could be pretty disastrous if the other end show up. Providing > > the kernel doesn''t detect it (i don''t think it does [1]), it would imply > > spurious reply, for example the previous waiting read on "/abc/def" > > could reply to a next read on "/xyz/123". > > The wire protocol includes a req_id which is echoed in the response > which sh/could facilitate multiplexing this sort of thing. The pvops > kernel currently always sets it to zero but that''s just an > implementation detail ;-) Currently the kernel does (roughly): > take_lock > write_request > wait_for_reply > release_lock > instead it should/could be: > take_lock(timeout) > write_request (++req_id) > while read_reply.req_id != req_id && not (timeout) > wait some more > release lockI prefer a userland solution. Fixing Linux Dom0 doesn''t help NetBSD Dom0. Christoph> OK, so may be this is not in the "might be simpler" bucket any more, but > it sounds like plausibly the right direction to take. > > Properly handling multiple userspace clients asynchronously a demuxes > the responses etc would be even better but I don''t think necessary to > solve this particular issue. > > > > Maybe we should add an explicit ping/pong ring message to the xs ring > > > protocol? > > > > And who''s going to reply to this if xenstored is missing ? you would > > require the kernel to introspect the messages and reply by itself. > > The reason I suggested new messages was that I would solve that by > declaring that these new messages have whatever magic semantics I need > to make this work ;-) > > Ian.-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-10 11:43 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote:> On Thursday 10 February 2011 12:24:41 Ian Campbell wrote: > > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: > > > On 10/02/11 08:55, Ian Campbell wrote: > > > > That''s the underlying bug which the heuristic is trying to avoid... > > > > > > > > Fundamentally the xs ring protocol is missing any way to tell if > > > > someone is listening on the other end so you have no choice but to try > > > > communicating and see if anyone responds. > > > > > > > > It''s a pretty straightforward bug that the kernel does the waiting to > > > > see if anyone responds bit with an uninterruptible sleep. I took a > > > > quick look a little while ago but unfortunately it didn''t look > > > > straightforward to fix on the kernel side :-( I can''t remember why > > > > though. > > > > > > For starter, the protocol requires the messages to sit on the ring for a > > > underdetermined amount of time (boot watches). > > > > > > > It might be simpler to support allowing the userspace client to > > > > explicitly specify a timeout. I''m not sure what the impact on the ring > > > > is of leaving unconsumed requests on the ring when the other end does > > > > show up. Presumably the kernel driver just needs to be prepared to > > > > swallow responses whose target has given up and gone home. > > > > > > No, the simplest thing to do is to use the socket connection > > > exclusively. Just how we''re doing it in XCP and XCI. > > > > Right but this approach doesn''t work with xenstored in a stubdomain. > > Part of the point of using the ring protocol even when this isn''t the > > case is to help ensure that it is possible and help avoid regressions > > etc. > > > > > The protocol is not design to do async either, so leaving unconsumed > > > request, could be pretty disastrous if the other end show up. Providing > > > the kernel doesn''t detect it (i don''t think it does [1]), it would imply > > > spurious reply, for example the previous waiting read on "/abc/def" > > > could reply to a next read on "/xyz/123". > > > > The wire protocol includes a req_id which is echoed in the response > > which sh/could facilitate multiplexing this sort of thing. The pvops > > kernel currently always sets it to zero but that''s just an > > implementation detail ;-) Currently the kernel does (roughly): > > take_lock > > write_request > > wait_for_reply > > release_lock > > instead it should/could be: > > take_lock(timeout) > > write_request (++req_id) > > while read_reply.req_id != req_id && not (timeout) > > wait some more > > release lock > > I prefer a userland solution. Fixing Linux Dom0 doesn''t help NetBSD Dom0.Fixing the NetBSD dom0 does though. Seriously, if kernels are lacking in functionality needed to make the system work smoothly and correctly we should fix them, not just default to adding hacks in userspace because it seems easier in the short term. (Obviously if the userspace solution is the right thing to do and/or more correct in its own right then fine lets do that). Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2011-Feb-10 12:23 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Thursday 10 February 2011 12:43:47 Ian Campbell wrote:> On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote: > > On Thursday 10 February 2011 12:24:41 Ian Campbell wrote: > > > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: > > > > On 10/02/11 08:55, Ian Campbell wrote: > > > > > That''s the underlying bug which the heuristic is trying to avoid... > > > > > > > > > > Fundamentally the xs ring protocol is missing any way to tell if > > > > > someone is listening on the other end so you have no choice but to > > > > > try communicating and see if anyone responds. > > > > > > > > > > It''s a pretty straightforward bug that the kernel does the waiting > > > > > to see if anyone responds bit with an uninterruptible sleep. I took > > > > > a quick look a little while ago but unfortunately it didn''t look > > > > > straightforward to fix on the kernel side :-( I can''t remember why > > > > > though. > > > > > > > > For starter, the protocol requires the messages to sit on the ring > > > > for a underdetermined amount of time (boot watches). > > > > > > > > > It might be simpler to support allowing the userspace client to > > > > > explicitly specify a timeout. I''m not sure what the impact on the > > > > > ring is of leaving unconsumed requests on the ring when the other > > > > > end does show up. Presumably the kernel driver just needs to be > > > > > prepared to swallow responses whose target has given up and gone > > > > > home. > > > > > > > > No, the simplest thing to do is to use the socket connection > > > > exclusively. Just how we''re doing it in XCP and XCI. > > > > > > Right but this approach doesn''t work with xenstored in a stubdomain. > > > Part of the point of using the ring protocol even when this isn''t the > > > case is to help ensure that it is possible and help avoid regressions > > > etc. > > > > > > > The protocol is not design to do async either, so leaving unconsumed > > > > request, could be pretty disastrous if the other end show up. > > > > Providing the kernel doesn''t detect it (i don''t think it does [1]), > > > > it would imply spurious reply, for example the previous waiting read > > > > on "/abc/def" could reply to a next read on "/xyz/123". > > > > > > The wire protocol includes a req_id which is echoed in the response > > > which sh/could facilitate multiplexing this sort of thing. The pvops > > > kernel currently always sets it to zero but that''s just an > > > implementation detail ;-) Currently the kernel does (roughly): > > > take_lock > > > write_request > > > wait_for_reply > > > release_lock > > > instead it should/could be: > > > take_lock(timeout) > > > write_request (++req_id) > > > while read_reply.req_id != req_id && not (timeout) > > > wait some more > > > release lock > > > > I prefer a userland solution. Fixing Linux Dom0 doesn''t help NetBSD Dom0. > > Fixing the NetBSD dom0 does though. > > Seriously, if kernels are lacking in functionality needed to make the > system work smoothly and correctly we should fix them, not just default > to adding hacks in userspace because it seems easier in the short term. > (Obviously if the userspace solution is the right thing to do and/or > more correct in its own right then fine lets do that).Does xl communicate with xenstored through a named socket ? If yes then ''connect()'' should check for ECONNREFUSED. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2011-Feb-10 12:42 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
Christoph Egger writes ("Re: [Xen-devel] libxl: error handling before xenstored runs"):> Does xl communicate with xenstored through a named socket ?Sometimes; or it can use the shared ring. Please see the earlier parts of the thread where Kamala explained why she wants it to use the ring in the usual case. I''m not sure that''s necessary, but _some_ arrangement for making it work with the ring _is_ necessary. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gianni Tedesco
2011-Feb-10 18:30 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Thu, 2011-02-10 at 08:55 +0000, Ian Campbell wrote:> On Wed, 2011-02-09 at 17:39 +0000, Gianni Tedesco wrote: > > On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote: > > > On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote: > > > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > > > > >>> I''m currently on c/s 22834. Which c/s added the check you are talking > > > > > >>> about? > > > > > >> > > > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > > > > > > > This is c/s 22806. So my tree is new enough. > > > > > > > > > > Right, but did you happen to check how you got past the check done by > > > > > that patch for the case in question? > > > > > > > > The pid file simply doesn''t exist. > > > > > > Oh wait. Hit the ''send'' button too fast. > > > > > > The pid file does exist from previous boot. > > > > Bleh, precisely my problem with these heuristic checks. It''s worse on my > > box because if this happens I end up with unkillable xl processes due to > > libxenstore wanting to open /dev/xen/xenbus or whatever it is. > > That''s the underlying bug which the heuristic is trying to avoid... > > Fundamentally the xs ring protocol is missing any way to tell if someone > is listening on the other end so you have no choice but to try > communicating and see if anyone responds. > > It''s a pretty straightforward bug that the kernel does the waiting to > see if anyone responds bit with an uninterruptible sleep. I took a quick > look a little while ago but unfortunately it didn''t look straightforward > to fix on the kernel side :-( I can''t remember why though.I suppose it''s because we don''t want to be killable after sending the message but before receiving the reply, since the ring is going to get jammed up due to nobody consuming the reply. The reply that in this case never comes, but the kernel can''t know that it won''t eventually come, right? Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2011-Feb-10 19:33 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
Gianni Tedesco writes ("Re: [Xen-devel] libxl: error handling before xenstored runs"):> I suppose it''s because we don''t want to be killable after sending the > message but before receiving the reply, since the ring is going to get > jammed up due to nobody consuming the reply. The reply that in this case > never comes, but the kernel can''t know that it won''t eventually come, > right?The bookkeeping for the fact that there is a command outstanding should take care of that problem - when matching up the replies with requests it will find that the caller for the request has gone away and discard the reply. (Obviously it doesn''t atm because the whole thing is synchronous.) Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vincent Hanquez
2011-Feb-10 21:55 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On 10/02/11 11:24, Ian Campbell wrote:> Right but this approach doesn''t work with xenstored in a stubdomain.yeah I know. xenstored in a stubdom is just an experiment, when it become a serious feature, this argument would hold. however it''s not going to be use in 4.1, and in any production settings.> Part of the point of using the ring protocol even when this isn''t the > case is to help ensure that it is possible and help avoid regressions > etc. > >> The protocol is not design to do async either, so leaving unconsumed >> request, could be pretty disastrous if the other end show up. Providing >> the kernel doesn''t detect it (i don''t think it does [1]), it would imply >> spurious reply, for example the previous waiting read on "/abc/def" >> could reply to a next read on "/xyz/123". > > The wire protocol includes a req_id which is echoed in the response > which sh/could facilitate multiplexing this sort of thing. The pvops > kernel currently always sets it to zero but that''s just an > implementation detail ;-) Currently the kernel does (roughly):The kernel is not the one exclusively setting the rid. this is a client initialized value. any xs implementation can use it any way they want (including the kernel implementation). Turns out that most of the implementations are actually putting rid to 0 anyway (the ocaml and C implementation are, the windows one isn''t). Even then, if you could initialize it to some value, what value is that going to be ? there''s just no way to know if someone else is not using this rid already globally (since the ring is a global OS thing). Which basically would means tracking pid (the kernel meaning) along with the rid ?>>> Maybe we should add an explicit ping/pong ring message to the xs ring >>> protocol? >> >> And who''s going to reply to this if xenstored is missing ? you would >> require the kernel to introspect the messages and reply by itself. > > The reason I suggested new messages was that I would solve that by > declaring that these new messages have whatever magic semantics I need > to make this work ;-)ah right, the famous DeusExMachina message type then :-) -- Vincent _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-11 08:03 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On Thu, 2011-02-10 at 21:55 +0000, Vincent Hanquez wrote:> On 10/02/11 11:24, Ian Campbell wrote: > > Right but this approach doesn''t work with xenstored in a stubdomain. > > yeah I know. xenstored in a stubdom is just an experiment, when it > become a serious feature, this argument would hold. however it''s not > going to be use in 4.1, and in any production settings.Accepted. As I understand it people are actively using stub-xenstored in disaggregation research today. Reinvigorating the stub domain approach for Xen is also (going to be) one of our GSoC proposals this year.> > Part of the point of using the ring protocol even when this isn''t the > > case is to help ensure that it is possible and help avoid regressions > > etc. > > > >> The protocol is not design to do async either, so leaving unconsumed > >> request, could be pretty disastrous if the other end show up. Providing > >> the kernel doesn''t detect it (i don''t think it does [1]), it would imply > >> spurious reply, for example the previous waiting read on "/abc/def" > >> could reply to a next read on "/xyz/123". > > > > The wire protocol includes a req_id which is echoed in the response > > which sh/could facilitate multiplexing this sort of thing. The pvops > > kernel currently always sets it to zero but that''s just an > > implementation detail ;-) Currently the kernel does (roughly): > > The kernel is not the one exclusively setting the rid. this is a client > initialized value. any xs implementation can use it any way they want > (including the kernel implementation). > > Turns out that most of the implementations are actually putting rid to 0 > anyway (the ocaml and C implementation are, the windows one isn''t). > > Even then, if you could initialize it to some value, what value is that > going to be ? there''s just no way to know if someone else is not using > this rid already globally (since the ring is a global OS thing). Which > basically would means tracking pid (the kernel meaning) along with the rid ?Since the kernel mediates all access to the actual ring it can handle the req_id with a single incrementing integer and fake out whatever is necessary to its users. It''s trivial to fixup the in-kernel xs users. Most likely it only involves changing a single function in the core xs kernel code which everyone else must use anyway. For userspace users it''s a little trickier but the kernel just needs to remember the userspace supplied req_id before inserting its own and to reverse the substitution in the reply. If you were to support multiple outstanding active requests then it would be natural to stash the id in whatever data structure you were using for that. If you only want to simplify by only supporting a single active request (by throwing away responses to aborted/timed out requests as I suggested earlier) you only need to remember the user provided req_id for the one request which is trivial. AFAIK we don''t have any kernel code which does clever things such as using a pointer to a datastructure as the req_id but if we did (and we were unwilling to simply change it) then the userspace solution would work there too.> >>> Maybe we should add an explicit ping/pong ring message to the xs ring > >>> protocol? > >> > >> And who''s going to reply to this if xenstored is missing ? you would > >> require the kernel to introspect the messages and reply by itself. > > > > The reason I suggested new messages was that I would solve that by > > declaring that these new messages have whatever magic semantics I need > > to make this work ;-) > > ah right, the famous DeusExMachina message type then :-):-) Ian _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2011-Feb-11 09:49 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
At 21:55 +0000 on 10 Feb (1297374910), Vincent Hanquez wrote:> On 10/02/11 11:24, Ian Campbell wrote: > > Right but this approach doesn''t work with xenstored in a stubdomain. > > yeah I know. xenstored in a stubdom is just an experiment, when it > become a serious feature, this argument would hold. however it''s not > going to be use in 4.1, and in any production settings.You seem to be arguing that we shouldn''t fix a bug in the kernel. I don''t understand that. How is it going to become a "serious feature" if we don''t fix the bugs that affect it? In any case, I can think of three projects off the top of my head that are using stub domains aggressively, one of which I know is using xenstore stub domains in particular. I''m sure there are others. Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vincent Hanquez
2011-Feb-11 11:16 UTC
Re: [Xen-devel] libxl: error handling before xenstored runs
On 11/02/11 09:49, Tim Deegan wrote:> At 21:55 +0000 on 10 Feb (1297374910), Vincent Hanquez wrote: >> On 10/02/11 11:24, Ian Campbell wrote: >>> Right but this approach doesn''t work with xenstored in a stubdomain. >> >> yeah I know. xenstored in a stubdom is just an experiment, when it >> become a serious feature, this argument would hold. however it''s not >> going to be use in 4.1, and in any production settings. > > You seem to be arguing that we shouldn''t fix a bug in the kernel. I > don''t understand that. How is it going to become a "serious feature" if > we don''t fix the bugs that affect it?If you want to fix the behaviour which is present since xen 3.0 (you can understand why i''m not holding my breath anymore), all the best. What I''m arguing is that behaviour should not exists in the next stable version of xen, specially for a feature that is not serious yet (at least upstream).> In any case, I can think of three projects off the top of my head that > are using stub domains aggressively, one of which I know is using > xenstore stub domains in particular. I''m sure there are others.Good for them. I''m sure they can carry a 2 liner patch to connect to the ring instead of the unix socket, until they actually submit upstream their stuff that make xenstored stubdomain great. -- Vincent _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel