Changeset 11831:f5321161c649 has broken non-Linux domUs with this
change:

        devid = blkif.blkdev_name_to_number(dev)
+       if not devid:
+           raise VmError('Unable to find number for device (%s)' % (dev))
+

The immediate problem is that Solaris domUs have "0" for dev for the
first disk. So it's presumably matched on the hex re in util/blkif.py,
returning 0 and failing this incorrect check. There are other problems:

1) util/blkif.py logs to xend-debug.log if the stat() fails. This is
needlessly chatty, and indicates there's some kind of error, when there
is not.

2) util/blkif.py has a load of Linux gook for getting the device
numbers. Luckily Solaris has a completely different naming scheme, but
wouldn't this go horribly wrong if a domU just happened to use the same
name, different device number?

It's not clear to us why Linux even needs to do this?

For now I think the change needs backing out so non-Linux domUs can
work again. I'm not sure of a better fix; suggestions welcome.

regards
john
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
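The failure mode described above can be illustrated with a deliberately simplified, hypothetical stand-in for the lookup in util/blkif.py. This is a Python 3 sketch, not the real xend code: the function name `name_to_number`, the `xend` logger name, the `/dev/` prefix handling and the exact hex pattern are all assumptions. It tries to stat() a dom0 device node first, records a miss at debug level through the logging module instead of printing to stderr, and then falls back to parsing the name as a hex number, which is presumably how a Solaris name such as "0" comes back as device number 0.

```python
import logging
import os
import re
import stat

# Hypothetical logger name; where records end up depends on logging config.
log = logging.getLogger("xend")

# Assumed fallback pattern: optional 0x prefix followed by hex digits.
HEX_NAME = re.compile(r"(0x)?[0-9a-fA-F]+")


def name_to_number(name):
    """Simplified, hypothetical sketch of a dom0 device-name lookup.

    Returns an integer device number, or None if the name is not recognised.
    """
    path = name if name.startswith("/dev/") else "/dev/" + name
    try:
        st = os.stat(path)
    except OSError:
        # A missing dom0 node is expected for non-Linux guest names,
        # so record it quietly at debug level rather than as an error.
        log.debug("no dom0 device node for %s", name)
    else:
        if stat.S_ISBLK(st.st_mode):
            return st.st_rdev
    # Fall back to interpreting the name itself as a hex device number.
    if HEX_NAME.fullmatch(name):
        return int(name, 16)
    return None


for dev in ["0", "0x301", "not-a-device"]:
    print(dev, "->", name_to_number(dev))
```

With this sketch, "0" yields 0 and "0x301" yields 769, while an unrecognised name yields None; a check written as `if not devid` cannot tell the first case from the last.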
On 1-Nov-06, at 5:42 PM, John Levon wrote:

>
> Changeset 11831:f5321161c649 has broken non-Linux domUs with this
> change:
>
>         devid = blkif.blkdev_name_to_number(dev)
> +       if not devid:
> +           raise VmError('Unable to find number for device (%s)' % (dev))
> +
>
> The immediate problem is that Solaris domUs have "0" for dev for the
> first disk. So it's presumably matched on the hex re in util/blkif.py,
> returning 0 and failing this incorrect check. There are other
> problems:

I don't know about the other stuff, but changing the check to

    if devid is None:

should solve your immediate problem, right?
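Brendan's suggestion hinges on Python truthiness: `not devid` is true both for None (the lookup failed) and for 0 (a perfectly valid device number), whereas `devid is None` is true only for the failure. A minimal Python 3 sketch contrasting the two checks; `VmError` is defined locally as a stand-in for xend's exception so the snippet runs on its own, and the helper names `check_original` and `check_fixed` are hypothetical:

```python
class VmError(Exception):
    """Local stand-in for xend's VmError, so this sketch is self-contained."""


def check_original(dev, devid):
    # The check from changeset 11831: 0 is falsy, so device number 0
    # is rejected exactly as if the lookup had failed.
    if not devid:
        raise VmError('Unable to find number for device (%s)' % (dev))
    return devid


def check_fixed(dev, devid):
    # Only a failed lookup (None) is treated as an error.
    if devid is None:
        raise VmError('Unable to find number for device (%s)' % (dev))
    return devid


for check in (check_original, check_fixed):
    try:
        print(check.__name__, "accepts devid", check("0", 0))
    except VmError as err:
        print(check.__name__, "rejects:", err)
```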
On Wed, Nov 01, 2006 at 06:03:03PM -0800, Brendan Cully wrote:

> >The immediate problem is that Solaris domUs have "0" for dev for the
> >first disk. So it's presumably matched on the hex re in util/blkif.py,
> >returning 0 and failing this incorrect check. There are other
> >problems:
>
> I don't know about the other stuff, but changing the check to
>
>     if devid is None:
>
> should solve your immediate problem, right?

Yes, but only by chance.

regards
john
On Thu, Nov 02, 2006 at 01:42:50AM +0000, John Levon wrote:

>
> Changeset 11831:f5321161c649 has broken non-Linux domUs with this
> change:
>
>         devid = blkif.blkdev_name_to_number(dev)
> +       if not devid:
> +           raise VmError('Unable to find number for device (%s)' % (dev))
> +
>
> The immediate problem is that Solaris domUs have "0" for dev for the
> first disk. So it's presumably matched on the hex re in util/blkif.py,
> returning 0 and failing this incorrect check. There are other problems:
>
> 1) util/blkif.py logs to xend-debug.log if the stat() fails. This is
> needlessly chatty, and indicates there's some kind of error, when there
> is not.
>
> 2) util/blkif.py has a load of Linux gook for getting the device
> numbers. Luckily Solaris has a completely different naming scheme, but
> wouldn't this go horribly wrong if a domU just happened to use the same
> name, different device number?
>
> It's not clear to us why Linux even needs to do this?
>
> For now I think the change needs backing out so non-Linux domUs can
> work again. I'm not sure of a better fix; suggestions welcome.

I think that the correct fix would be for the tools to pass the untranslated
device name into the guest, rather than translating it to a device number
first. I've no idea why this was done in the first place, as it's clearly
wrong. Like you say, there's no reason for a guest's device name -> number
mapping to be the same as dom0's.

Unfortunately, this is part of the guaranteed interface to guests now, so we
need to reproduce this behaviour for old guests, but there's nothing stopping
us fixing this for new guests. If we fixed the tools to write the device name
as well as the (Linux) device number, then new guests could use the name
rather than the number and do the lookup themselves.

In this scheme, the check above would go -- the failure to look up the device
would be handled simply by writing the name and not the number, and hoping
that it's not an old Linux guest. The change was intended to improve the
error message that you receive in this case, so at the least, the failure
ought to be logged (unless you can come up with some way to detect old Linux
guests, and only complain in that case).

Would you like to put together a patch along these lines?

Thanks,

Ewan.
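Ewan's proposed scheme could be sketched roughly as follows (Python 3). Everything here is hypothetical: the function name `vbd_store_entries`, the store keys `dev` and `virtual-device`, and the `xend` logger name are assumptions for illustration, not the actual xend or xenstore layout. The untranslated name is always written so new guests can do their own lookup; the legacy dom0-style number is written only when the lookup succeeded, and a failed lookup is logged through the logging infrastructure rather than raised as a fatal error.

```python
import logging

# Hypothetical logger; whether records reach xend.log depends on logging config.
log = logging.getLogger("xend")


def vbd_store_entries(dev, devid):
    """Build hypothetical store entries describing one virtual disk.

    dev   -- the device name exactly as given in the domain config
    devid -- dom0's device number for that name, or None if lookup failed
    """
    # New guests read the name and map it to a device number themselves.
    entries = {"dev": dev}
    if devid is not None:
        # Old Linux guests only understand the dom0-style number.
        entries["virtual-device"] = str(devid)
    else:
        # Not fatal: only an old Linux guest would be unable to use this disk.
        log.warning("no device number for %s; writing the name only", dev)
    return entries


print(vbd_store_entries("hda1", 0x301))  # name plus legacy number
print(vbd_store_entries("mydisk", None))  # name only, failure logged
```

Writing both keys when a number is available keeps old Linux guests working unchanged, which matches the requirement that the existing guest interface be preserved.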
On Thu, Nov 02, 2006 at 11:21:17PM +0000, Ewan Mellor wrote:

> > 1) util/blkif.py logs to xend-debug.log if the stat() fails. This is
> > needlessly chatty, and indicates there's some kind of error, when there
> > is not.
> >
> > 2) util/blkif.py has a load of Linux gook for getting the device
> > numbers. Luckily Solaris has a completely different naming scheme, but
> > wouldn't this go horribly wrong if a domU just happened to use the same
> > name, different device number?
> >
> > It's not clear to us why Linux even needs to do this?
>
> I think that the correct fix would be for the tools to pass the untranslated
> device name into the guest, rather than translating it to a device number

Sounds sensible to me.

> that it's not an old Linux guest. The change was intended to improve the
> error message that you receive in this case, so at the least, the failure
> ought to be logged (unless you can come up with some way to detect old Linux
> guests, and only complain in that case).

Is there some other way to indicate the failure later? We'd like
xend-debug.log to be essentially silent during normal operation for a
non-debug xend...

> Would you like to put together a patch along these lines?

I can do a patch for xend, but I'm not familiar enough to update the
Linux side of things.

I see that the 'is None' hack has been committed along with removing the
message in blkif.py, so that solves the immediate issue for us.

thanks,
john
On Thu, Nov 02, 2006 at 11:38:04PM +0000, John Levon wrote:

> On Thu, Nov 02, 2006 at 11:21:17PM +0000, Ewan Mellor wrote:
> >
> > > 1) util/blkif.py logs to xend-debug.log if the stat() fails. This is
> > > needlessly chatty, and indicates there's some kind of error, when there
> > > is not.
> > >
> > > 2) util/blkif.py has a load of Linux gook for getting the device
> > > numbers. Luckily Solaris has a completely different naming scheme, but
> > > wouldn't this go horribly wrong if a domU just happened to use the same
> > > name, different device number?
> > >
> > > It's not clear to us why Linux even needs to do this?
> >
> > I think that the correct fix would be for the tools to pass the untranslated
> > device name into the guest, rather than translating it to a device number
>
> Sounds sensible to me.
>
> > that it's not an old Linux guest. The change was intended to improve the
> > error message that you receive in this case, so at the least, the failure
> > ought to be logged (unless you can come up with some way to detect old Linux
> > guests, and only complain in that case).
>
> Is there some other way to indicate the failure later? We'd like
> xend-debug.log to be essentially silent during normal operation for a
> non-debug xend...

I meant log it to xend.log (the log infrastructure), as opposed to
xend-debug.log (Xend's stderr) if you were making that distinction.

Certainly we could indicate the failure later, though it's a little
complicated. Of course, the error can only be detected by the guest, so
you'll have to make blkfront or the equivalent in Solaris write an error code
to the store, and then pick that up again from Xend. This has been done to a
certain extent already, but the problem is that, by this point, xm has
returned success, so though the error has been flagged, no-one gets to see it,
other than diagnostic tools, and it's not long before the device teardown
occurs and the error message is deleted then anyway.

We would need to grab the error code in Xend, before the device teardown, and
then because there's no client waiting at this point, the only thing we could
do is log it anyway. Alternatively, we could extend the "wait-for-devices"
functionality in the xm create path to wait for an indication of successful
device set-up (at the moment, we only wait for successful hotplugging in
dom0). In that case, you would actually have a client to send the error
message to. Which would be nice.

> > Would you like to put together a patch along these lines?
>
> I can do a patch for xend, but I'm not familiar enough to update the
> Linux side of things.

That's fine -- if you can make it work for Solaris without breaking the
existing functionality, we can move Linux over to the new scheme at a later
date.

Cheers,

Ewan.
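The "wait-for-devices" extension Ewan describes amounts to polling the store until the guest's frontend reports either a connected state or an error, with a timeout. A rough Python 3 sketch: `read` is any callable that returns the string stored at a key or None, and the `<path>/state` and `<path>/error` key layout, the device path and the use of 4 (XenbusStateConnected) as the success value are assumptions for illustration rather than xend's actual implementation.

```python
import time

XENBUS_STATE_CONNECTED = "4"  # XenbusStateConnected, stored as a string


def wait_for_frontend(read, path, timeout=10.0, interval=0.1):
    """Poll until the frontend at `path` connects or reports an error.

    Returns (True, None) on success, or (False, message) on error or
    timeout, so a waiting caller (e.g. the xm create path) can show it.
    """
    deadline = time.monotonic() + timeout
    while True:
        error = read(path + "/error")
        if error is not None:
            # The guest flagged a problem; report it before teardown erases it.
            return False, error
        if read(path + "/state") == XENBUS_STATE_CONNECTED:
            return True, None
        if time.monotonic() >= deadline:
            return False, "timed out waiting for %s" % path
        time.sleep(interval)


# Usage with a plain dict standing in for the store (hypothetical path):
store = {"device/vbd/768/error": "unknown device name"}
print(wait_for_frontend(store.get, "device/vbd/768", timeout=0.5))
```

Returning the message instead of only logging it is what would give the xm create path something to send back to its client; a caller with no client waiting could still just log it, as described above.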
On Fri, Nov 03, 2006 at 12:03:04AM +0000, Ewan Mellor wrote:

> I meant log it to xend.log (the log infrastructure), as opposed to
> xend-debug.log (Xend's stderr) if you were making that distinction.

Well, I suppose that's a bit better.

> Certainly we could indicate the failure later, though it's a little
> complicated. Of course, the error can only be detected by the guest, so
> you'll have to make blkfront or the equivalent in Solaris write an error code
> to the store, and then pick that up again from Xend. This has been done to a
> certain extent already, but the problem is that, by this point, xm has
> returned success, so though the error has been flagged, no-one gets to see it,
> other than diagnostic tools, and it's not long before the device teardown
> occurs and the error message is deleted then anyway.
>
> We would need to grab the error code in Xend, before the device teardown, and
> then because there's no client waiting at this point, the only thing we could
> do is log it anyway. Alternatively, we could extend the "wait-for-devices"
> functionality in the xm create path to wait for an indication of successful
> device set-up (at the moment, we only wait for successful hotplugging in
> dom0). In that case, you would actually have a client to send the error
> message to. Which would be nice.

Presumably, one day, these sorts of errors will be forwardable to
something watching what's going on via xen-api. At least, that would be nice.

BTW, it'd be great if one of you could do a quick write-up on the current
code in xen-unstable? There's a heck of a lot of changes just gone in, and
it'd be nice to know what state things are supposed to be in: what works,
what doesn't, what needs improving.

regards
john