Hi, Now that libxl + qemu''s built-in disk backend for ceph/rbd is working nicely, I''m trying to get it all working through libvirt. When libvirt''s libxl driver creates a libxl_device_disk, it applies a few simple rules[1] covering specific file format types (qcow, qcow2, vhd). If none of these rules apply then it defers to libxl''s best guess: * If driverName is not specified, default to raw as per * xl-disk-configuration.txt in the xen documentation and let * libxl pick a suitable backend. */ x_disk->format = LIBXL_DISK_FORMAT_RAW; x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; The libxl code in libxl__device_disk_set_backend[2] tries to resolve ''UNKNOWN'' into something concrete by calling ''disk_try_backend'': disk_try_backend(&a, LIBXL_DISK_BACKEND_PHY) ?: disk_try_backend(&a, LIBXL_DISK_BACKEND_TAP) ?: disk_try_backend(&a, LIBXL_DISK_BACKEND_QDISK); Unfortunately for me and my quest, the case for LIBXL_DISK_BACKEND_TAP just checks for * lack of hotplug script * libxl__blktap_enabled * DISK_FORMAT_{RAW,VHD} and so it selects TAP and then fails, because tapdisk doesn''t know anything about this disk access protocol (yet at least). Unbeknownst to libxl, qemu would be able to handle the disk in this case. I''m not sure what the best way to address this is. I could disable blktap on my system but it would be a shame to drop support for the stuff tapdisk does well (like .vhd handling) On the other hand, having both tapdisk and qemu means having to choose between them for disk protocols they could both handle. A choice would either have to be coded in libxl (or libvirt''s libxl driver), which seems like a bit of an annoying maintenance burden (i.e. update the list every time someone adds a driver somewhere); or it could be left to the sysadmin to choose via some kind of resolver script, which seems a bit complex. A third possibility is to be able to ask tapdisk, "do you actually support this disk" and if yes, use it in preference to qemu, if no fall back to qemu. Anyone got any thoughts? (Or perhaps I''ve missed something obvious! :-) Cheers, Dave [1] http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/libxl/libxl_conf.c;h=7e0753a69ed4a0867d14843cbad20d1430a67f0c;hb=HEAD#l462 [2] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_device.c;h=eeea9d9a023877a69ff2832c973e753629333356;hb=HEAD#l219
On Wed, 24 Apr 2013, David Scott wrote:> Hi, > > Now that libxl + qemu''s built-in disk backend for ceph/rbd is working > nicely, I''m trying to get it all working through libvirt. > > When libvirt''s libxl driver creates a libxl_device_disk, it applies a > few simple rules[1] covering specific file format types (qcow, qcow2, > vhd). If none of these rules apply then it defers to libxl''s best guess: > > * If driverName is not specified, default to raw as per > * xl-disk-configuration.txt in the xen documentation and let > * libxl pick a suitable backend. > */ > x_disk->format = LIBXL_DISK_FORMAT_RAW; > x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; > > The libxl code in libxl__device_disk_set_backend[2] tries to resolve > ''UNKNOWN'' into something concrete by calling ''disk_try_backend'': > > disk_try_backend(&a, LIBXL_DISK_BACKEND_PHY) ?: > disk_try_backend(&a, LIBXL_DISK_BACKEND_TAP) ?: > disk_try_backend(&a, LIBXL_DISK_BACKEND_QDISK); > > Unfortunately for me and my quest, the case for LIBXL_DISK_BACKEND_TAP > just checks for > * lack of hotplug script > * libxl__blktap_enabled > * DISK_FORMAT_{RAW,VHD} > and so it selects TAP and then fails, because tapdisk doesn''t know > anything about this disk access protocol (yet at least). Unbeknownst to > libxl, qemu would be able to handle the disk in this case. > > I''m not sure what the best way to address this is. I could disable > blktap on my system but it would be a shame to drop support for the > stuff tapdisk does well (like .vhd handling) > > On the other hand, having both tapdisk and qemu means having to choose > between them for disk protocols they could both handle. A choice would > either have to be coded in libxl (or libvirt''s libxl driver), which > seems like a bit of an annoying maintenance burden (i.e. update the list > every time someone adds a driver somewhere); or it could be left to the > sysadmin to choose via some kind of resolver script, which seems a bit > complex. > > A third possibility is to be able to ask tapdisk, "do you actually > support this disk" and if yes, use it in preference to qemu, if no fall > back to qemu. > > Anyone got any thoughts? (Or perhaps I''ve missed something obvious! :-)I vote for using blktap exclusively for VHD files. In fact we do know that blktap support is not great for anything else anyway (qcow2 I am thinking about you).
On Wed, 2013-04-24 at 15:22 +0100, David Scott wrote:> Hi, > > Now that libxl + qemu''s built-in disk backend for ceph/rbd is working > nicely, I''m trying to get it all working through libvirt. > > When libvirt''s libxl driver creates a libxl_device_disk, it applies a > few simple rules[1] covering specific file format types (qcow, qcow2, > vhd). If none of these rules apply then it defers to libxl''s best guess: > > * If driverName is not specified, default to raw as per > * xl-disk-configuration.txt in the xen documentation and let > * libxl pick a suitable backend.Can you specify driverName in your config somehow? I confess I don''t know if this is ~= format or backend...> Anyone got any thoughts? (Or perhaps I''ve missed something obvious! :-)This is not the first time that the issue of hardcoded defaults in libxl has come up recently (in fact I think you mentioned one of the others to me...). It seems to me that there is a need arising for a system wide libxl configuration file which sets these sorts of defaults for all users of libxl on that system and which can be modified according to admin preference. As well as the disk backend selection (which is actually probably pretty complex to express in a simple configuration file) this has come up in the context of stubdomain usage and autoballooning. Ian.
On Thu, Apr 25, 2013 at 9:55 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:> On Wed, 2013-04-24 at 15:22 +0100, David Scott wrote: >> Hi, >> >> Now that libxl + qemu''s built-in disk backend for ceph/rbd is working >> nicely, I''m trying to get it all working through libvirt. >> >> When libvirt''s libxl driver creates a libxl_device_disk, it applies a >> few simple rules[1] covering specific file format types (qcow, qcow2, >> vhd). If none of these rules apply then it defers to libxl''s best guess: >> >> * If driverName is not specified, default to raw as per >> * xl-disk-configuration.txt in the xen documentation and let >> * libxl pick a suitable backend. > > Can you specify driverName in your config somehow? I confess I don''t > know if this is ~= format or backend... > >> Anyone got any thoughts? (Or perhaps I''ve missed something obvious! :-) > > This is not the first time that the issue of hardcoded defaults in libxl > has come up recently (in fact I think you mentioned one of the others to > me...). > > It seems to me that there is a need arising for a system wide libxl > configuration file which sets these sorts of defaults for all users of > libxl on that system and which can be modified according to admin > preference. As well as the disk backend selection (which is actually > probably pretty complex to express in a simple configuration file) this > has come up in the context of stubdomain usage and autoballooning.Something like this was actually on my "to-implement" feature list for 4.3, but no one got around to it, so I dropped it. FWIW, I think having this kind of thing would make 4.3 betterer enough that it would be worth slipping the release for a few weeks to get it in. On the other hand, experience has shown that converging on what the "right" interface should be take a heck of a lot longer than you think; so maybe we should just plan on doing this for 4.4 at this point. -George
On Thu, 2013-04-25 at 11:36 +0100, George Dunlap wrote:> On Thu, Apr 25, 2013 at 9:55 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Wed, 2013-04-24 at 15:22 +0100, David Scott wrote: > >> Hi, > >> > >> Now that libxl + qemu''s built-in disk backend for ceph/rbd is working > >> nicely, I''m trying to get it all working through libvirt. > >> > >> When libvirt''s libxl driver creates a libxl_device_disk, it applies a > >> few simple rules[1] covering specific file format types (qcow, qcow2, > >> vhd). If none of these rules apply then it defers to libxl''s best guess: > >> > >> * If driverName is not specified, default to raw as per > >> * xl-disk-configuration.txt in the xen documentation and let > >> * libxl pick a suitable backend. > > > > Can you specify driverName in your config somehow? I confess I don''t > > know if this is ~= format or backend... > > > >> Anyone got any thoughts? (Or perhaps I''ve missed something obvious! :-) > > > > This is not the first time that the issue of hardcoded defaults in libxl > > has come up recently (in fact I think you mentioned one of the others to > > me...). > > > > It seems to me that there is a need arising for a system wide libxl > > configuration file which sets these sorts of defaults for all users of > > libxl on that system and which can be modified according to admin > > preference. As well as the disk backend selection (which is actually > > probably pretty complex to express in a simple configuration file) this > > has come up in the context of stubdomain usage and autoballooning. > > Something like this was actually on my "to-implement" feature list for > 4.3, but no one got around to it, so I dropped it. > > FWIW, I think having this kind of thing would make 4.3 betterer enough > that it would be worth slipping the release for a few weeks to get it > in. > > On the other hand, experience has shown that converging on what the > "right" interface should be take a heck of a lot longer than you > think; so maybe we should just plan on doing this for 4.4 at this > point.I don''t think this can/should be rushed at this point. Ian.
On Thu, 25 Apr 2013, Ian Campbell wrote:> On Thu, 2013-04-25 at 11:36 +0100, George Dunlap wrote: > > On Thu, Apr 25, 2013 at 9:55 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > > On Wed, 2013-04-24 at 15:22 +0100, David Scott wrote: > > >> Hi, > > >> > > >> Now that libxl + qemu''s built-in disk backend for ceph/rbd is working > > >> nicely, I''m trying to get it all working through libvirt. > > >> > > >> When libvirt''s libxl driver creates a libxl_device_disk, it applies a > > >> few simple rules[1] covering specific file format types (qcow, qcow2, > > >> vhd). If none of these rules apply then it defers to libxl''s best guess: > > >> > > >> * If driverName is not specified, default to raw as per > > >> * xl-disk-configuration.txt in the xen documentation and let > > >> * libxl pick a suitable backend. > > > > > > Can you specify driverName in your config somehow? I confess I don''t > > > know if this is ~= format or backend... > > > > > >> Anyone got any thoughts? (Or perhaps I''ve missed something obvious! :-) > > > > > > This is not the first time that the issue of hardcoded defaults in libxl > > > has come up recently (in fact I think you mentioned one of the others to > > > me...). > > > > > > It seems to me that there is a need arising for a system wide libxl > > > configuration file which sets these sorts of defaults for all users of > > > libxl on that system and which can be modified according to admin > > > preference. As well as the disk backend selection (which is actually > > > probably pretty complex to express in a simple configuration file) this > > > has come up in the context of stubdomain usage and autoballooning. > > > > Something like this was actually on my "to-implement" feature list for > > 4.3, but no one got around to it, so I dropped it. > > > > FWIW, I think having this kind of thing would make 4.3 betterer enough > > that it would be worth slipping the release for a few weeks to get it > > in. > > > > On the other hand, experience has shown that converging on what the > > "right" interface should be take a heck of a lot longer than you > > think; so maybe we should just plan on doing this for 4.4 at this > > point. > > I don''t think this can/should be rushed at this point.Right, but going back to the original question: if we have in our hands something that is not vhd, raw or qcow and blktap2 is available, should we really try to pass it to it? We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the implementation of the first two formats is too buggy to be used). Thus I think that the answer is pretty obvious here: we should try with QEMU, whose supported format list is way wider.
On Thu, 2013-04-25 at 12:55 +0100, Stefano Stabellini wrote:> Right, but going back to the original question: if we have in our hands > something that is not vhd, raw or qcow and blktap2 is available, > should we really try to pass it to it?That''s not quite the original question because in that case it was raw, at least as far as the libxl interface is concerned.> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the > implementation of the first two formats is too buggy to be used). > > Thus I think that the answer is pretty obvious here: we should try with > QEMU, whose supported format list is way wider.Which I think we do, we only try and use blktap for raw and vhd. Ian.
On 25/04/13 12:57, Ian Campbell wrote:> On Thu, 2013-04-25 at 12:55 +0100, Stefano Stabellini wrote: > >> Right, but going back to the original question: if we have in our hands >> something that is not vhd, raw or qcow and blktap2 is available, >> should we really try to pass it to it? > > That''s not quite the original question because in that case it was raw, > at least as far as the libxl interface is concerned. > >> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the >> implementation of the first two formats is too buggy to be used). >> >> Thus I think that the answer is pretty obvious here: we should try with >> QEMU, whose supported format list is way wider. > > Which I think we do, we only try and use blktap for raw and vhd.As well as the disk format dimension (vhd, qcow2 etc) there''s also the network disk access protocol (iSCSI, ceph/RBD, sheepdog?). Although both blktap/tapdisk can handle raw, alas all the interesting (perhaps I''m showing my bias here ;-) disk access protocols are in qemu. So if we default to blktap/tapdisk we can only handle raw *files*, whereas if we default to qemu we can also do these new fancy things as well. Cheers, Dave
On Thu, 2013-04-25 at 13:12 +0100, Dave Scott wrote:> On 25/04/13 12:57, Ian Campbell wrote: > > On Thu, 2013-04-25 at 12:55 +0100, Stefano Stabellini wrote: > > > >> Right, but going back to the original question: if we have in our hands > >> something that is not vhd, raw or qcow and blktap2 is available, > >> should we really try to pass it to it? > > > > That''s not quite the original question because in that case it was raw, > > at least as far as the libxl interface is concerned. > > > >> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the > >> implementation of the first two formats is too buggy to be used). > >> > >> Thus I think that the answer is pretty obvious here: we should try with > >> QEMU, whose supported format list is way wider. > > > > Which I think we do, we only try and use blktap for raw and vhd. > > As well as the disk format dimension (vhd, qcow2 etc) there''s also the > network disk access protocol (iSCSI, ceph/RBD, sheepdog?). Although both > blktap/tapdisk can handle raw, alas all the interesting (perhaps I''m > showing my bias here ;-) disk access protocols are in qemu. So if we > default to blktap/tapdisk we can only handle raw *files*, whereas if we > default to qemu we can also do these new fancy things as well.Remind me why you can''t specify backend=qdisk explicitly? Does libvirt not propagate anything like that? Ian
On 25/04/13 13:43, Ian Campbell wrote:> On Thu, 2013-04-25 at 13:12 +0100, Dave Scott wrote: >> On 25/04/13 12:57, Ian Campbell wrote: >>> On Thu, 2013-04-25 at 12:55 +0100, Stefano Stabellini wrote: >>> >>>> Right, but going back to the original question: if we have in our hands >>>> something that is not vhd, raw or qcow and blktap2 is available, >>>> should we really try to pass it to it? >>> >>> That''s not quite the original question because in that case it was raw, >>> at least as far as the libxl interface is concerned. >>> >>>> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the >>>> implementation of the first two formats is too buggy to be used). >>>> >>>> Thus I think that the answer is pretty obvious here: we should try with >>>> QEMU, whose supported format list is way wider. >>> >>> Which I think we do, we only try and use blktap for raw and vhd. >> >> As well as the disk format dimension (vhd, qcow2 etc) there''s also the >> network disk access protocol (iSCSI, ceph/RBD, sheepdog?). Although both >> blktap/tapdisk can handle raw, alas all the interesting (perhaps I''m >> showing my bias here ;-) disk access protocols are in qemu. So if we >> default to blktap/tapdisk we can only handle raw *files*, whereas if we >> default to qemu we can also do these new fancy things as well. > > Remind me why you can''t specify backend=qdisk explicitly? Does libvirt > not propagate anything like that?I''ll give this a go tomorrow when I''m not at the mercy of dodgy wifi :-) From my reading of the libvirt code it does allow the selection of a driverName from the set "tap" "tap2" "file" and "phy". It looks like we could define a "qdisk" driverName. Actually this reminds me of a general concern I had: I worry that people who choose to use libvirt will likely use all the default options, and not customise their code too much for a specific hypervisor. After all, if they wanted to customise their code for xen they would just use libxl and use the richer interface. I think we ought to make sure as much stuff just works by default as possible. In my ideal world it would be possible to take a simple libvirt domain configuration (perhaps one which used to run on kvm) and run it on xen, transparently benefiting from xen features such as driver domains (and automatic blktap/qemu selection?). I think if the domain XML requires too much tweaking then people will just not use the features or may give up altogether. Cheers, Dave
On Thu, 2013-04-25 at 14:22 +0100, Dave Scott wrote:> On 25/04/13 13:43, Ian Campbell wrote: > > On Thu, 2013-04-25 at 13:12 +0100, Dave Scott wrote: > >> On 25/04/13 12:57, Ian Campbell wrote: > >>> On Thu, 2013-04-25 at 12:55 +0100, Stefano Stabellini wrote: > >>> > >>>> Right, but going back to the original question: if we have in our hands > >>>> something that is not vhd, raw or qcow and blktap2 is available, > >>>> should we really try to pass it to it? > >>> > >>> That''s not quite the original question because in that case it was raw, > >>> at least as far as the libxl interface is concerned. > >>> > >>>> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the > >>>> implementation of the first two formats is too buggy to be used). > >>>> > >>>> Thus I think that the answer is pretty obvious here: we should try with > >>>> QEMU, whose supported format list is way wider. > >>> > >>> Which I think we do, we only try and use blktap for raw and vhd. > >> > >> As well as the disk format dimension (vhd, qcow2 etc) there''s also the > >> network disk access protocol (iSCSI, ceph/RBD, sheepdog?). Although both > >> blktap/tapdisk can handle raw, alas all the interesting (perhaps I''m > >> showing my bias here ;-) disk access protocols are in qemu. So if we > >> default to blktap/tapdisk we can only handle raw *files*, whereas if we > >> default to qemu we can also do these new fancy things as well. > > > > Remind me why you can''t specify backend=qdisk explicitly? Does libvirt > > not propagate anything like that? > > I''ll give this a go tomorrow when I''m not at the mercy of dodgy wifi :-) > From my reading of the libvirt code it does allow the selection of a > driverName from the set "tap" "tap2" "file" and "phy".In libxl the choices are phy, tap and qdisk. There''s no file nor any distinction between tap and tap2. I expect the libxl libvirt driver is trying to make things easier for people migrating from xend (which is to be applauded).> It looks like we could define a "qdisk" driverName.Yes, I think this would work. WRT to your comment below it looks like other libvirt drivers call this driver "qemu".> Actually this reminds me of a general concern I had: I worry that people > who choose to use libvirt will likely use all the default options, and > not customise their code too much for a specific hypervisor. After all, > if they wanted to customise their code for xen they would just use libxl > and use the richer interface. I think we ought to make sure as much > stuff just works by default as possible. In my ideal world it would be > possible to take a simple libvirt domain configuration (perhaps one > which used to run on kvm) and run it on xen, transparently benefiting > from xen features such as driver domains (and automatic blktap/qemu > selection?). I think if the domain XML requires too much tweaking then > people will just not use the features or may give up altogether.This would be ideal, I seem to remember hearing that disk configuration was something which libvirt didn''t abstract, tending to just expose underlying configuration strings of the underlying toolstack. So this is one area where you actually have to change things when you migrate between hypervisors or they won''t work at all. I may be talking rubbish or be years out of date with that though. Ian.
Hi,> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the > implementation of the first two formats is too buggy to be used).Well, jfyi I have a working RBD driver for blktap2 ... Cheers, Sylvain
Ian Campbell wrote:> On Thu, 2013-04-25 at 14:22 +0100, Dave Scott wrote: > >> On 25/04/13 13:43, Ian Campbell wrote: >> >>> On Thu, 2013-04-25 at 13:12 +0100, Dave Scott wrote: >>> >>>> On 25/04/13 12:57, Ian Campbell wrote: >>>> >>>>> On Thu, 2013-04-25 at 12:55 +0100, Stefano Stabellini wrote: >>>>> >>>>> >>>>>> Right, but going back to the original question: if we have in our hands >>>>>> something that is not vhd, raw or qcow and blktap2 is available, >>>>>> should we really try to pass it to it? >>>>>> >>>>> That''s not quite the original question because in that case it was raw, >>>>> at least as far as the libxl interface is concerned. >>>>> >>>>> >>>>>> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the >>>>>> implementation of the first two formats is too buggy to be used). >>>>>> >>>>>> Thus I think that the answer is pretty obvious here: we should try with >>>>>> QEMU, whose supported format list is way wider. >>>>>> >>>>> Which I think we do, we only try and use blktap for raw and vhd. >>>>> >>>> As well as the disk format dimension (vhd, qcow2 etc) there''s also the >>>> network disk access protocol (iSCSI, ceph/RBD, sheepdog?). Although both >>>> blktap/tapdisk can handle raw, alas all the interesting (perhaps I''m >>>> showing my bias here ;-) disk access protocols are in qemu. So if we >>>> default to blktap/tapdisk we can only handle raw *files*, whereas if we >>>> default to qemu we can also do these new fancy things as well. >>>> >>> Remind me why you can''t specify backend=qdisk explicitly? Does libvirt >>> not propagate anything like that? >>> >> I''ll give this a go tomorrow when I''m not at the mercy of dodgy wifi :-) >> From my reading of the libvirt code it does allow the selection of a >> driverName from the set "tap" "tap2" "file" and "phy". >> > > In libxl the choices are phy, tap and qdisk. There''s no file nor any > distinction between tap and tap2. I expect the libxl libvirt driver is > trying to make things easier for people migrating from xend (which is > to be applauded). >Right. ''file'' is mapped to backend ''tap'' and format ''raw''. In fact, I should just post the code here for review :). For a bit of context, l_disk is libvirt''s definition, and x_disk is libxl''s. Comments on this mapping welcome, keeping in mind Ian''s point about migrating from xend. if (l_disk->driverName) { if (STREQ(l_disk->driverName, "tap") || STREQ(l_disk->driverName, "tap2")) { switch (l_disk->format) { case VIR_STORAGE_FILE_QCOW: x_disk->format = LIBXL_DISK_FORMAT_QCOW; x_disk->backend = LIBXL_DISK_BACKEND_QDISK; break; case VIR_STORAGE_FILE_QCOW2: x_disk->format = LIBXL_DISK_FORMAT_QCOW2; x_disk->backend = LIBXL_DISK_BACKEND_QDISK; break; case VIR_STORAGE_FILE_VHD: x_disk->format = LIBXL_DISK_FORMAT_VHD; x_disk->backend = LIBXL_DISK_BACKEND_TAP; break; case VIR_STORAGE_FILE_NONE: /* No subtype specified, default to raw/tap */ case VIR_STORAGE_FILE_RAW: x_disk->format = LIBXL_DISK_FORMAT_RAW; x_disk->backend = LIBXL_DISK_BACKEND_TAP; break; default: virReportError(VIR_ERR_INTERNAL_ERROR, _("libxenlight does not support disk driver %s"), virStorageFileFormatTypeToString(l_disk->format)); return -1; } } else if (STREQ(l_disk->driverName, "file")) { x_disk->format = LIBXL_DISK_FORMAT_RAW; x_disk->backend = LIBXL_DISK_BACKEND_TAP; } else if (STREQ(l_disk->driverName, "phy")) { x_disk->format = LIBXL_DISK_FORMAT_RAW; x_disk->backend = LIBXL_DISK_BACKEND_PHY; } else { virReportError(VIR_ERR_INTERNAL_ERROR, _("libxenlight does not support disk driver %s"), l_disk->driverName); return -1; } } else { /* * If driverName is not specified, default to raw as per * xl-disk-configuration.txt in the xen documentation and let * libxl pick a suitable backend. */ x_disk->format = LIBXL_DISK_FORMAT_RAW; x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; }> >> It looks like we could define a "qdisk" driverName. >> > > Yes, I think this would work. WRT to your comment below it looks like > other libvirt drivers call this driver "qemu". >IMO, we should just extend the above to map driverName ''qemu'' to backend ''qdisk''. But what about the formats? I though qdisk could handle all of them, particularly with qemu-upstream, even vmdk?> >> Actually this reminds me of a general concern I had: I worry that people >> who choose to use libvirt will likely use all the default options, and >> not customise their code too much for a specific hypervisor. After all, >> if they wanted to customise their code for xen they would just use libxl >> and use the richer interface. I think we ought to make sure as much >> stuff just works by default as possible. In my ideal world it would be >> possible to take a simple libvirt domain configuration (perhaps one >> which used to run on kvm) and run it on xen, transparently benefiting >> from xen features such as driver domains (and automatic blktap/qemu >> selection?). I think if the domain XML requires too much tweaking then >> people will just not use the features or may give up altogether. >> > > This would be ideal, I seem to remember hearing that disk configuration > was something which libvirt didn''t abstract, tending to just expose > underlying configuration strings of the underlying toolstack. So this is > one area where you actually have to change things when you migrate > between hypervisors or they won''t work at all. >For the most part, that is true, unless you were using a minimal configuration and letting the individual hypervisor driver choose defaults. It would probably work with raw images and no driver element defined, e.g. <disk type=''file'' device=''disk''> <source file=''/path/to/disk.raw''/> <target dev=''hda'' bus=''ide/> </disk> Regards, Jim
Jim Fehlig wrote:> Ian Campbell wrote: > >> On Thu, 2013-04-25 at 14:22 +0100, Dave Scott wrote: >> >> >>> On 25/04/13 13:43, Ian Campbell wrote: >>> >>> >> >> >>> It looks like we could define a "qdisk" driverName. >>> >>> >> Yes, I think this would work. WRT to your comment below it looks like >> other libvirt drivers call this driver "qemu". >> >> > > IMO, we should just extend the above to map driverName ''qemu'' to backend > ''qdisk''. But what about the formats? I though qdisk could handle all > of them, particularly with qemu-upstream, even vmdk? >Something like the attached, which seems to work well for me when specifying driverName = qemu, e.g. <disk type=''file'' device=''disk''> <driver name=''qemu''/> <source file=''/var/lib/xen/images/sles11sp2-pv/disk0.raw''/> <target dev=''xvda'' bus=''xen''/> </disk> Regards, Jim _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Fri, 2013-04-26 at 02:27 +0100, Jim Fehlig wrote:> Ian Campbell wrote: > > On Thu, 2013-04-25 at 14:22 +0100, Dave Scott wrote: > > > >> On 25/04/13 13:43, Ian Campbell wrote: > >> > >>> On Thu, 2013-04-25 at 13:12 +0100, Dave Scott wrote: > >>> > >>>> On 25/04/13 12:57, Ian Campbell wrote: > >>>> > >>>>> On Thu, 2013-04-25 at 12:55 +0100, Stefano Stabellini wrote: > >>>>> > >>>>> > >>>>>> Right, but going back to the original question: if we have in our hands > >>>>>> something that is not vhd, raw or qcow and blktap2 is available, > >>>>>> should we really try to pass it to it? > >>>>>> > >>>>> That''s not quite the original question because in that case it was raw, > >>>>> at least as far as the libxl interface is concerned. > >>>>> > >>>>> > >>>>>> We do know that blktap2 only handles qcow, qcow2, raw and vhd (and the > >>>>>> implementation of the first two formats is too buggy to be used). > >>>>>> > >>>>>> Thus I think that the answer is pretty obvious here: we should try with > >>>>>> QEMU, whose supported format list is way wider. > >>>>>> > >>>>> Which I think we do, we only try and use blktap for raw and vhd. > >>>>> > >>>> As well as the disk format dimension (vhd, qcow2 etc) there''s also the > >>>> network disk access protocol (iSCSI, ceph/RBD, sheepdog?). Although both > >>>> blktap/tapdisk can handle raw, alas all the interesting (perhaps I''m > >>>> showing my bias here ;-) disk access protocols are in qemu. So if we > >>>> default to blktap/tapdisk we can only handle raw *files*, whereas if we > >>>> default to qemu we can also do these new fancy things as well. > >>>> > >>> Remind me why you can''t specify backend=qdisk explicitly? Does libvirt > >>> not propagate anything like that? > >>> > >> I''ll give this a go tomorrow when I''m not at the mercy of dodgy wifi :-) > >> From my reading of the libvirt code it does allow the selection of a > >> driverName from the set "tap" "tap2" "file" and "phy". > >> > > > > In libxl the choices are phy, tap and qdisk. There''s no file nor any > > distinction between tap and tap2. I expect the libxl libvirt driver is > > trying to make things easier for people migrating from xend (which is > > to be applauded). > > > > Right. ''file'' is mapped to backend ''tap'' and format ''raw''. In fact, I > should just post the code here for review :). For a bit of context, > l_disk is libvirt''s definition, and x_disk is libxl''s. Comments on this > mapping welcome, keeping in mind Ian''s point about migrating from xend. > > if (l_disk->driverName) { > if (STREQ(l_disk->driverName, "tap") || > STREQ(l_disk->driverName, "tap2")) { > switch (l_disk->format) { > case VIR_STORAGE_FILE_QCOW: > x_disk->format = LIBXL_DISK_FORMAT_QCOW; > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > break; > case VIR_STORAGE_FILE_QCOW2: > x_disk->format = LIBXL_DISK_FORMAT_QCOW2; > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > break; > case VIR_STORAGE_FILE_VHD: > x_disk->format = LIBXL_DISK_FORMAT_VHD; > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > break; > case VIR_STORAGE_FILE_NONE: > /* No subtype specified, default to raw/tap */ > case VIR_STORAGE_FILE_RAW: > x_disk->format = LIBXL_DISK_FORMAT_RAW; > x_disk->backend = LIBXL_DISK_BACKEND_TAP;[...]> >> It looks like we could define a "qdisk" driverName. > >> > > > > Yes, I think this would work. WRT to your comment below it looks like > > other libvirt drivers call this driver "qemu". > > > > IMO, we should just extend the above to map driverName ''qemu'' to backend > ''qdisk''. But what about the formats? I though qdisk could handle all > of them, particularly with qemu-upstream, even vmdk?We could certainly add new LIBXL_DISK_FORMAT_FOO to expose other formats, such as VMDK and the presumably these would map naturally from the VIR_STORAGE namespace. I think for qdisk you want a switch much like the above, but the backend always qdisk. I could also see an argument for handling the case of an unknown (or NIL) driver name by defaulting backend to UNKNOWN and setting the format as best you are able, like you do for tap (and soon qdisk ;-)). In addition are we at liberty to define driverName="xl" which explicitly tells us that source file == the full xl spec (which I suppose is really the xlu spec or something), using the libxlu_disk parser? Perhaps having LIBXL_DISK_FORMAT as an enum was a mistake and really it should have been a string with a mapping to backends in a configuration file, so it could be extensible without needing to recompile the library. This could be something to consider for a future libxl.> >> Actually this reminds me of a general concern I had: I worry that people > >> who choose to use libvirt will likely use all the default options, and > >> not customise their code too much for a specific hypervisor. After all, > >> if they wanted to customise their code for xen they would just use libxl > >> and use the richer interface. I think we ought to make sure as much > >> stuff just works by default as possible. In my ideal world it would be > >> possible to take a simple libvirt domain configuration (perhaps one > >> which used to run on kvm) and run it on xen, transparently benefiting > >> from xen features such as driver domains (and automatic blktap/qemu > >> selection?). I think if the domain XML requires too much tweaking then > >> people will just not use the features or may give up altogether. > >> > > > > This would be ideal, I seem to remember hearing that disk configuration > > was something which libvirt didn''t abstract, tending to just expose > > underlying configuration strings of the underlying toolstack. So this is > > one area where you actually have to change things when you migrate > > between hypervisors or they won''t work at all. > > > > For the most part, that is true,Glad to hear I''m not telling too blatant lies ;-) Ian.
On Fri, 26 Apr 2013, Jim Fehlig wrote:> if (l_disk->driverName) { > if (STREQ(l_disk->driverName, "tap") || > STREQ(l_disk->driverName, "tap2")) { > switch (l_disk->format) { > case VIR_STORAGE_FILE_QCOW: > x_disk->format = LIBXL_DISK_FORMAT_QCOW; > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > break; > case VIR_STORAGE_FILE_QCOW2: > x_disk->format = LIBXL_DISK_FORMAT_QCOW2; > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > break; > case VIR_STORAGE_FILE_VHD: > x_disk->format = LIBXL_DISK_FORMAT_VHD; > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > break; > case VIR_STORAGE_FILE_NONE: > /* No subtype specified, default to raw/tap */ > case VIR_STORAGE_FILE_RAW: > x_disk->format = LIBXL_DISK_FORMAT_RAW; > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > break; > default: > virReportError(VIR_ERR_INTERNAL_ERROR, > _("libxenlight does not support disk > driver %s"), > > virStorageFileFormatTypeToString(l_disk->format)); > return -1; > } > } else if (STREQ(l_disk->driverName, "file")) { > x_disk->format = LIBXL_DISK_FORMAT_RAW; > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > } else if (STREQ(l_disk->driverName, "phy")) { > x_disk->format = LIBXL_DISK_FORMAT_RAW; > x_disk->backend = LIBXL_DISK_BACKEND_PHY; > } else { > virReportError(VIR_ERR_INTERNAL_ERROR, > _("libxenlight does not support disk driver %s"), > l_disk->driverName); > return -1; > } > } else { > /* > * If driverName is not specified, default to raw as per > * xl-disk-configuration.txt in the xen documentation and let > * libxl pick a suitable backend. > */ > x_disk->format = LIBXL_DISK_FORMAT_RAW; > x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; > }It looks like the defaults are the same of libxl. However the mapping of RAW to TAP (libxl does the same) has always been a bit dubious to me: now that upstream QEMU is used with HVM guests too by libxl, there is no reason to use blktap over QEMU for raw files any more.
On Fri, 2013-04-26 at 11:10 +0100, Stefano Stabellini wrote:> On Fri, 26 Apr 2013, Jim Fehlig wrote: > > if (l_disk->driverName) { > > if (STREQ(l_disk->driverName, "tap") || > > STREQ(l_disk->driverName, "tap2")) { > > switch (l_disk->format) { > > case VIR_STORAGE_FILE_QCOW: > > x_disk->format = LIBXL_DISK_FORMAT_QCOW; > > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > > break; > > case VIR_STORAGE_FILE_QCOW2: > > x_disk->format = LIBXL_DISK_FORMAT_QCOW2; > > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > > break; > > case VIR_STORAGE_FILE_VHD: > > x_disk->format = LIBXL_DISK_FORMAT_VHD; > > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > > break; > > case VIR_STORAGE_FILE_NONE: > > /* No subtype specified, default to raw/tap */ > > case VIR_STORAGE_FILE_RAW: > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > > break; > > default: > > virReportError(VIR_ERR_INTERNAL_ERROR, > > _("libxenlight does not support disk > > driver %s"), > > > > virStorageFileFormatTypeToString(l_disk->format)); > > return -1; > > } > > } else if (STREQ(l_disk->driverName, "file")) { > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > > } else if (STREQ(l_disk->driverName, "phy")) { > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > x_disk->backend = LIBXL_DISK_BACKEND_PHY; > > } else { > > virReportError(VIR_ERR_INTERNAL_ERROR, > > _("libxenlight does not support disk driver %s"), > > l_disk->driverName); > > return -1; > > } > > } else { > > /* > > * If driverName is not specified, default to raw as per > > * xl-disk-configuration.txt in the xen documentation and let > > * libxl pick a suitable backend. > > */ > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; > > } > > It looks like the defaults are the same of libxl. > > However the mapping of RAW to TAP (libxl does the same) has always been > a bit dubious to me: now that upstream QEMU is used with HVM guests too > by libxl, there is no reason to use blktap over QEMU for raw files any > more.There are two TAP+RAW in the above, one is inside an if (driver="tap") which seems reasonable, are you talking about the one in the driver="file"? I think it would be better for libvirt in the file and phy cases to just say format=raw and leave libxl to pick a backend capable of providing this. Changing what libxl does is a separate question, but I think you are right that a case could be made for preferring qdisk in the default case. Ian.
On Fri, 26 Apr 2013, Ian Campbell wrote:> On Fri, 2013-04-26 at 11:10 +0100, Stefano Stabellini wrote: > > On Fri, 26 Apr 2013, Jim Fehlig wrote: > > > if (l_disk->driverName) { > > > if (STREQ(l_disk->driverName, "tap") || > > > STREQ(l_disk->driverName, "tap2")) { > > > switch (l_disk->format) { > > > case VIR_STORAGE_FILE_QCOW: > > > x_disk->format = LIBXL_DISK_FORMAT_QCOW; > > > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > > > break; > > > case VIR_STORAGE_FILE_QCOW2: > > > x_disk->format = LIBXL_DISK_FORMAT_QCOW2; > > > x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > > > break; > > > case VIR_STORAGE_FILE_VHD: > > > x_disk->format = LIBXL_DISK_FORMAT_VHD; > > > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > > > break; > > > case VIR_STORAGE_FILE_NONE: > > > /* No subtype specified, default to raw/tap */ > > > case VIR_STORAGE_FILE_RAW: > > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > > > break; > > > default: > > > virReportError(VIR_ERR_INTERNAL_ERROR, > > > _("libxenlight does not support disk > > > driver %s"), > > > > > > virStorageFileFormatTypeToString(l_disk->format)); > > > return -1; > > > } > > > } else if (STREQ(l_disk->driverName, "file")) { > > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > > x_disk->backend = LIBXL_DISK_BACKEND_TAP; > > > } else if (STREQ(l_disk->driverName, "phy")) { > > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > > x_disk->backend = LIBXL_DISK_BACKEND_PHY; > > > } else { > > > virReportError(VIR_ERR_INTERNAL_ERROR, > > > _("libxenlight does not support disk driver %s"), > > > l_disk->driverName); > > > return -1; > > > } > > > } else { > > > /* > > > * If driverName is not specified, default to raw as per > > > * xl-disk-configuration.txt in the xen documentation and let > > > * libxl pick a suitable backend. > > > */ > > > x_disk->format = LIBXL_DISK_FORMAT_RAW; > > > x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; > > > } > > > > It looks like the defaults are the same of libxl. > > > > However the mapping of RAW to TAP (libxl does the same) has always been > > a bit dubious to me: now that upstream QEMU is used with HVM guests too > > by libxl, there is no reason to use blktap over QEMU for raw files any > > more. > > There are two TAP+RAW in the above, one is inside an if (driver="tap") > which seems reasonable, are you talking about the one in the > driver="file"?Yes> I think it would be better for libvirt in the file and phy cases to just > say format=raw and leave libxl to pick a backend capable of providing > this.I agree> Changing what libxl does is a separate question, but I think you are > right that a case could be made for preferring qdisk in the default > case.Indeed
On 26/04/13 05:50, Jim Fehlig wrote:> Jim Fehlig wrote: >> Ian Campbell wrote: >> >>> On Thu, 2013-04-25 at 14:22 +0100, Dave Scott wrote: >>> >>> >>>> On 25/04/13 13:43, Ian Campbell wrote: >>>> >>>> >>> >>> >>>> It looks like we could define a "qdisk" driverName. >>>> >>>> >>> Yes, I think this would work. WRT to your comment below it looks like >>> other libvirt drivers call this driver "qemu". >>> >>> >> >> IMO, we should just extend the above to map driverName ''qemu'' to backend >> ''qdisk''. But what about the formats? I though qdisk could handle all >> of them, particularly with qemu-upstream, even vmdk? >> > > Something like the attached, which seems to work well for me when > specifying driverName = qemu, e.g. > > <disk type=''file'' device=''disk''> > <driver name=''qemu''/> > <source file=''/var/lib/xen/images/sles11sp2-pv/disk0.raw''/> > <target dev=''xvda'' bus=''xen''/> > </disk>This also works for me! On a related note, what do you think about the attached patch? It allows the user to select a non-default qemu via the <emulator> element. My domain XML looked like this: <devices> <emulator>/usr/lib/xen/bin/qemu-system-i386</emulator> <disk device="disk" type="network"> <driver name=''qemu''/> <source protocol="rbd" name="rbd:rbd/ubuntu1204.img"/> <target dev="hda"/> </disk> <graphics type="vnc" port="-1" autoport="yes" listen="0.0.0.0"/> </devices> Now that upstream qemu is the default in xen-4.3, it seems useful to be able to select the traditional qemu for older VMs. Also I backported this to my xen-4.2 system and used this patch + your patch + the previous ''stat()'' fix to successfully start a VM on ceph storage via libvirt + libxl (my quest is almost complete!) Cheers, Dave _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 26.04.2013 12:10, Stefano Stabellini wrote:> On Fri, 26 Apr 2013, Jim Fehlig wrote: >> if (l_disk->driverName) { >> if (STREQ(l_disk->driverName, "tap") || >> STREQ(l_disk->driverName, "tap2")) { >> switch (l_disk->format) { >> case VIR_STORAGE_FILE_QCOW: >> x_disk->format = LIBXL_DISK_FORMAT_QCOW; >> x_disk->backend = LIBXL_DISK_BACKEND_QDISK; >> break; >> case VIR_STORAGE_FILE_QCOW2: >> x_disk->format = LIBXL_DISK_FORMAT_QCOW2; >> x_disk->backend = LIBXL_DISK_BACKEND_QDISK; >> break; >> case VIR_STORAGE_FILE_VHD: >> x_disk->format = LIBXL_DISK_FORMAT_VHD; >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; >> break; >> case VIR_STORAGE_FILE_NONE: >> /* No subtype specified, default to raw/tap */ >> case VIR_STORAGE_FILE_RAW: >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; >> break; >> default: >> virReportError(VIR_ERR_INTERNAL_ERROR, >> _("libxenlight does not support disk >> driver %s"), >> >> virStorageFileFormatTypeToString(l_disk->format)); >> return -1; >> } >> } else if (STREQ(l_disk->driverName, "file")) { >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; >> } else if (STREQ(l_disk->driverName, "phy")) { >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_PHY; >> } else { >> virReportError(VIR_ERR_INTERNAL_ERROR, >> _("libxenlight does not support disk driver %s"), >> l_disk->driverName); >> return -1; >> } >> } else { >> /* >> * If driverName is not specified, default to raw as per >> * xl-disk-configuration.txt in the xen documentation and let >> * libxl pick a suitable backend. >> */ >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; >> } > > It looks like the defaults are the same of libxl. > > However the mapping of RAW to TAP (libxl does the same) has always been > a bit dubious to me: now that upstream QEMU is used with HVM guests too > by libxl, there is no reason to use blktap over QEMU for raw files any > more.What about old good loop+phy based backend for file disk images? I don''t want whole qemu in dom0 for PV domains, only for handling simple disk backend. Additionally sparse images + loop + phy + mount -o discard in domU makes the images "auto shrinking". Don''t know if qemu is able to do this. Attached patch, which I currently use for that. If it is close to something that would be accepted, I will send it in new thread. BTW I don''t want qemu in dom0 for HVM also, but this is another story (modified stubdom etc) and probably not acceptable in vanilla xen. -- Best Regards / Pozdrawiam, Marek Marczykowski Invisible Things Lab _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote:> What about old good loop+phy based backend for file disk images? I don''t want > whole qemu in dom0 for PV domains, only for handling simple disk backend. > Additionally sparse images + loop + phy + mount -o discard in domU makes the > images "auto shrinking". Don''t know if qemu is able to do this.IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and therefore your data isn''t actually on the disk when you might think it is, which can lead to filesystem corruption even if the f/s is doing correct barriers.> Attached patch, which I currently use for that. If it is close to something > that would be accepted, I will send it in new thread.I think you can use a block script for this (i.e. it does the loop mount) and avoid patching libxl at all. That''s what xend did at least... Ian.
On Fri, 26 Apr 2013, Marek Marczykowski wrote:> On 26.04.2013 12:10, Stefano Stabellini wrote: > > On Fri, 26 Apr 2013, Jim Fehlig wrote: > >> if (l_disk->driverName) { > >> if (STREQ(l_disk->driverName, "tap") || > >> STREQ(l_disk->driverName, "tap2")) { > >> switch (l_disk->format) { > >> case VIR_STORAGE_FILE_QCOW: > >> x_disk->format = LIBXL_DISK_FORMAT_QCOW; > >> x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > >> break; > >> case VIR_STORAGE_FILE_QCOW2: > >> x_disk->format = LIBXL_DISK_FORMAT_QCOW2; > >> x_disk->backend = LIBXL_DISK_BACKEND_QDISK; > >> break; > >> case VIR_STORAGE_FILE_VHD: > >> x_disk->format = LIBXL_DISK_FORMAT_VHD; > >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; > >> break; > >> case VIR_STORAGE_FILE_NONE: > >> /* No subtype specified, default to raw/tap */ > >> case VIR_STORAGE_FILE_RAW: > >> x_disk->format = LIBXL_DISK_FORMAT_RAW; > >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; > >> break; > >> default: > >> virReportError(VIR_ERR_INTERNAL_ERROR, > >> _("libxenlight does not support disk > >> driver %s"), > >> > >> virStorageFileFormatTypeToString(l_disk->format)); > >> return -1; > >> } > >> } else if (STREQ(l_disk->driverName, "file")) { > >> x_disk->format = LIBXL_DISK_FORMAT_RAW; > >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; > >> } else if (STREQ(l_disk->driverName, "phy")) { > >> x_disk->format = LIBXL_DISK_FORMAT_RAW; > >> x_disk->backend = LIBXL_DISK_BACKEND_PHY; > >> } else { > >> virReportError(VIR_ERR_INTERNAL_ERROR, > >> _("libxenlight does not support disk driver %s"), > >> l_disk->driverName); > >> return -1; > >> } > >> } else { > >> /* > >> * If driverName is not specified, default to raw as per > >> * xl-disk-configuration.txt in the xen documentation and let > >> * libxl pick a suitable backend. > >> */ > >> x_disk->format = LIBXL_DISK_FORMAT_RAW; > >> x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; > >> } > > > > It looks like the defaults are the same of libxl. > > > > However the mapping of RAW to TAP (libxl does the same) has always been > > a bit dubious to me: now that upstream QEMU is used with HVM guests too > > by libxl, there is no reason to use blktap over QEMU for raw files any > > more. > > What about old good loop+phy based backend for file disk images? I don''t want > whole qemu in dom0 for PV domains, only for handling simple disk backend. > Additionally sparse images + loop + phy + mount -o discard in domU makes the > images "auto shrinking". Don''t know if qemu is able to do this. > > Attached patch, which I currently use for that. If it is close to something > that would be accepted, I will send it in new thread. > > BTW I don''t want qemu in dom0 for HVM also, but this is another story > (modified stubdom etc) and probably not acceptable in vanilla xen.The original reason to drop loop+phy from libxl was that it is not safe as it doesn''t support O_DIRECT. It always goes through the Linux page cache. However after a lengthy discussion regarding QEMU and O_DIRECT with Alex, it turns out that we don''t actually need O_DIRECT for data consistency so loop+phy should be safe too after all. Time to bring back loop support?
On Fri, 26 Apr 2013, Ian Campbell wrote:> On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote: > > What about old good loop+phy based backend for file disk images? I don''t want > > whole qemu in dom0 for PV domains, only for handling simple disk backend. > > Additionally sparse images + loop + phy + mount -o discard in domU makes the > > images "auto shrinking". Don''t know if qemu is able to do this. > > IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and > therefore your data isn''t actually on the disk when you might think it > is, which can lead to filesystem corruption even if the f/s is doing > correct barriers.If it is safe for QEMU, it should be safe for loop and blkback too.> > Attached patch, which I currently use for that. If it is close to something > > that would be accepted, I will send it in new thread. > > I think you can use a block script for this (i.e. it does the loop > mount) and avoid patching libxl at all. That''s what xend did at least...Yes, libxl would need to arrage the script to be called when "phy" is used on a file, right?
On 26.04.2013 13:40, Ian Campbell wrote:> On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote: >> What about old good loop+phy based backend for file disk images? I don''t want >> whole qemu in dom0 for PV domains, only for handling simple disk backend. >> Additionally sparse images + loop + phy + mount -o discard in domU makes the >> images "auto shrinking". Don''t know if qemu is able to do this. > > IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and > therefore your data isn''t actually on the disk when you might think it > is, which can lead to filesystem corruption even if the f/s is doing > correct barriers. > >> Attached patch, which I currently use for that. If it is close to something >> that would be accepted, I will send it in new thread. > > I think you can use a block script for this (i.e. it does the loop > mount) and avoid patching libxl at all. That''s what xend did at least...This also was solution I''ve evaluated, but libvirt developers don''t want to hear about disk->script setting in official libvirt, so I still need to patch some library. Choosing libxl seems to be more universal. Setting loop in libxl is also much faster than calling script for that. Perhaps I can rewrite this patch to allow the choice (LIBXL_DISK_BACKEND_LOOP)? -- Best Regards / Pozdrawiam, Marek Marczykowski Invisible Things Lab _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Fri, 2013-04-26 at 12:48 +0100, Stefano Stabellini wrote:> On Fri, 26 Apr 2013, Ian Campbell wrote: > > On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote: > > > What about old good loop+phy based backend for file disk images? I don''t want > > > whole qemu in dom0 for PV domains, only for handling simple disk backend. > > > Additionally sparse images + loop + phy + mount -o discard in domU makes the > > > images "auto shrinking". Don''t know if qemu is able to do this. > > > > IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and > > therefore your data isn''t actually on the disk when you might think it > > is, which can lead to filesystem corruption even if the f/s is doing > > correct barriers. > > If it is safe for QEMU, it should be safe for loop and blkback too.qemu (and blkback) will issue correct barriers/syncs to the underlying storage. AFAIK loop does not.> > > Attached patch, which I currently use for that. If it is close to something > > > that would be accepted, I will send it in new thread. > > > > I think you can use a block script for this (i.e. it does the loop > > mount) and avoid patching libxl at all. That''s what xend did at least... > > Yes, libxl would need to arrage the script to be called when "phy" is > used on a file, right?I meant the user can pass "script=block-loop". In the full knowledge of what that means for their data integrity. Ian.
On 26/04/13 12:10, Stefano Stabellini wrote:> On Fri, 26 Apr 2013, Jim Fehlig wrote: >> if (l_disk->driverName) { >> if (STREQ(l_disk->driverName, "tap") || >> STREQ(l_disk->driverName, "tap2")) { >> switch (l_disk->format) { >> case VIR_STORAGE_FILE_QCOW: >> x_disk->format = LIBXL_DISK_FORMAT_QCOW; >> x_disk->backend = LIBXL_DISK_BACKEND_QDISK; >> break; >> case VIR_STORAGE_FILE_QCOW2: >> x_disk->format = LIBXL_DISK_FORMAT_QCOW2; >> x_disk->backend = LIBXL_DISK_BACKEND_QDISK; >> break; >> case VIR_STORAGE_FILE_VHD: >> x_disk->format = LIBXL_DISK_FORMAT_VHD; >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; >> break; >> case VIR_STORAGE_FILE_NONE: >> /* No subtype specified, default to raw/tap */ >> case VIR_STORAGE_FILE_RAW: >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; >> break; >> default: >> virReportError(VIR_ERR_INTERNAL_ERROR, >> _("libxenlight does not support disk >> driver %s"), >> >> virStorageFileFormatTypeToString(l_disk->format)); >> return -1; >> } >> } else if (STREQ(l_disk->driverName, "file")) { >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_TAP; >> } else if (STREQ(l_disk->driverName, "phy")) { >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_PHY; >> } else { >> virReportError(VIR_ERR_INTERNAL_ERROR, >> _("libxenlight does not support disk driver %s"), >> l_disk->driverName); >> return -1; >> } >> } else { >> /* >> * If driverName is not specified, default to raw as per >> * xl-disk-configuration.txt in the xen documentation and let >> * libxl pick a suitable backend. >> */ >> x_disk->format = LIBXL_DISK_FORMAT_RAW; >> x_disk->backend = LIBXL_DISK_BACKEND_UNKNOWN; >> } > > It looks like the defaults are the same of libxl. > > However the mapping of RAW to TAP (libxl does the same) has always been > a bit dubious to me: now that upstream QEMU is used with HVM guests too > by libxl, there is no reason to use blktap over QEMU for raw files any > more.Qemu should indeed be much faster than blktap when using recent Linux kernels in the DomU due to persistent grants, I don''t think blktap can provide the same performance, although I have not tested it. With Qemu we can get 900k IOPS. http://xenbits.xen.org/people/royger/persistent_read_qemu.png
On Fri, 26 Apr 2013, Ian Campbell wrote:> On Fri, 2013-04-26 at 12:48 +0100, Stefano Stabellini wrote: > > On Fri, 26 Apr 2013, Ian Campbell wrote: > > > On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote: > > > > What about old good loop+phy based backend for file disk images? I don''t want > > > > whole qemu in dom0 for PV domains, only for handling simple disk backend. > > > > Additionally sparse images + loop + phy + mount -o discard in domU makes the > > > > images "auto shrinking". Don''t know if qemu is able to do this. > > > > > > IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and > > > therefore your data isn''t actually on the disk when you might think it > > > is, which can lead to filesystem corruption even if the f/s is doing > > > correct barriers. > > > > If it is safe for QEMU, it should be safe for loop and blkback too. > > qemu (and blkback) will issue correct barriers/syncs to the underlying > storage. AFAIK loop does not.Looking at drivers/block/loop.c, it seems to me that REQ_FUA and REQ_FLUSH both handled correctly by issuing a vfs_fsync.> > > > Attached patch, which I currently use for that. If it is close to something > > > > that would be accepted, I will send it in new thread. > > > > > > I think you can use a block script for this (i.e. it does the loop > > > mount) and avoid patching libxl at all. That''s what xend did at least... > > > > Yes, libxl would need to arrage the script to be called when "phy" is > > used on a file, right? > > I meant the user can pass "script=block-loop". In the full knowledge of > what that means for their data integrity.Yeah, but there is no such block-loop script at the moment, the script is called block and its behaviour depends on the xenstore "type" node: it must be "file" and libxl never writes "type" "file" to xenstore (see device_disk_add and libxl__device_disk_string_of_backend).
David Scott wrote:>> Something like the attached, which seems to work well for me when >> specifying driverName = qemu, e.g. >> >> <disk type=''file'' device=''disk''> >> <driver name=''qemu''/> >> <source file=''/var/lib/xen/images/sles11sp2-pv/disk0.raw''/> >> <target dev=''xvda'' bus=''xen''/> >> </disk> > > This also works for me!Good to hear. I''ll send the patch to the libvirt list.> > On a related note, what do you think about the attached patch? It > allows the user to select a non-default qemu via the <emulator> > element. My domain XML looked like this: > > <devices> > <emulator>/usr/lib/xen/bin/qemu-system-i386</emulator>IMO, the possible emulators should be exposed in the capabilities. E.g. on a machine with both kvm and qemu, both emulators are shown as possibilities for an hvm, x86_64 guest # virsh capabilities ... <guest> <os_type>hvm</os_type> <arch name=''x86_64''> <wordsize>64</wordsize> <domain type=''qemu''> <emulator>/usr/bin/qemu-system-x86_64</emulator> <machine>pc-1.1</machine> <machine canonical=''pc-1.1''>pc</machine> <machine>pc-1.0</machine> <machine>pc-0.15</machine> <machine>pc-0.14</machine> <machine>pc-0.13</machine> <machine>pc-0.12</machine> <machine>pc-0.11</machine> <machine>pc-0.10</machine> <machine>isapc</machine> </domain> <domain type=''kvm''> <emulator>/usr/bin/qemu-kvm</emulator> <machine>pc-1.1</machine> <machine canonical=''pc-1.1''>pc</machine> <machine>pc-1.0</machine> <machine>pc-0.15</machine> <machine>pc-0.14</machine> <machine>pc-0.13</machine> <machine>pc-0.12</machine> <machine>pc-0.11</machine> <machine>pc-0.10</machine> <machine>isapc</machine> </domain> .... </guest> ...> <disk device="disk" type="network"> > <driver name=''qemu''/> > <source protocol="rbd" name="rbd:rbd/ubuntu1204.img"/> > <target dev="hda"/> > </disk> > <graphics type="vnc" port="-1" autoport="yes" listen="0.0.0.0"/> > </devices> > > Now that upstream qemu is the default in xen-4.3, it seems useful to > be able to select the traditional qemu for older VMs.Agreed. And reporting that both emulators exist via the capabilities would be helpful for users.> > Also I backported this to my xen-4.2 system and used this patch + your > patch + the previous ''stat()'' fix to successfully start a VM on ceph > storage via libvirt + libxl (my quest is almost complete!)Nice!> diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c > index f549a5d..abbd3c0 100644 > --- a/src/libxl/libxl_conf.c > +++ b/src/libxl/libxl_conf.c > @@ -811,6 +811,30 @@ libxlMakeCapabilities(libxl_ctx *ctx) > } > > int > +libxlMakeEmulator(virDomainDefPtr def, libxl_domain_config *d_config) > +{ > + /* No explicit override means use the default */ > + if (!def->emulator) { > + return 0; > + } > + if (strstr(def->emulator, "/qemu-system-")) { > + d_config->b_info.device_model_version > + LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN; > + return 0; > + } >Here we could check the requested emulator against the capabilities, and then do the proper mapping for device_model_version. Do you have time for an upstream libvirt patch to expose the possible emulators in the capabilities, along with this patch allowing the user to specify one? Regards, Jim> + if (strstr(def->emulator, "/qemu-dm")) { > + d_config->b_info.device_model_version > + LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL; > + return 0; > + } > + virReportError(VIR_ERR_INTERNAL_ERROR, > + _("libxenlight doesn''t support emulator ''%s''"), > + def->emulator); > + return -1; > +} > + > + > +int > libxlBuildDomainConfig(libxlDriverPrivatePtr driver, > virDomainDefPtr def, libxl_domain_config *d_config) > { > @@ -834,6 +858,10 @@ libxlBuildDomainConfig(libxlDriverPrivatePtr driver, > goto error; > } > > + if (libxlMakeEmulator(def, d_config) < 0) { > + goto error; > + } > + > d_config->on_reboot = def->onReboot; > d_config->on_poweroff = def->onPoweroff; > d_config->on_crash = def->onCrash; > > ------------------------------------------------------------------------
On Fri, 2013-04-26 at 18:10 +0100, Stefano Stabellini wrote:> On Fri, 26 Apr 2013, Ian Campbell wrote: > > On Fri, 2013-04-26 at 12:48 +0100, Stefano Stabellini wrote: > > > On Fri, 26 Apr 2013, Ian Campbell wrote: > > > > On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote: > > > > > What about old good loop+phy based backend for file disk images? I don''t want > > > > > whole qemu in dom0 for PV domains, only for handling simple disk backend. > > > > > Additionally sparse images + loop + phy + mount -o discard in domU makes the > > > > > images "auto shrinking". Don''t know if qemu is able to do this. > > > > > > > > IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and > > > > therefore your data isn''t actually on the disk when you might think it > > > > is, which can lead to filesystem corruption even if the f/s is doing > > > > correct barriers. > > > > > > If it is safe for QEMU, it should be safe for loop and blkback too. > > > > qemu (and blkback) will issue correct barriers/syncs to the underlying > > storage. AFAIK loop does not. > > Looking at drivers/block/loop.c, it seems to me that REQ_FUA and > REQ_FLUSH both handled correctly by issuing a vfs_fsync.Oh, good, either it''s been fixed (do we know when?) or I was confusing things with O_DIRECT.> > > > > Attached patch, which I currently use for that. If it is close to something > > > > > that would be accepted, I will send it in new thread. > > > > > > > > I think you can use a block script for this (i.e. it does the loop > > > > mount) and avoid patching libxl at all. That''s what xend did at least... > > > > > > Yes, libxl would need to arrage the script to be called when "phy" is > > > used on a file, right? > > > > I meant the user can pass "script=block-loop". In the full knowledge of > > what that means for their data integrity. > > Yeah, but there is no such block-loop script at the moment, the script > is called block and its behaviour depends on the xenstore "type" node: > it must be "file" and libxl never writes "type" "file" to xenstore (see > device_disk_add and libxl__device_disk_string_of_backend).A block-loop script would be a matter of a few minutes work based on the current block script, or the current block script could be modified to stat the path and DTRT when handling a phy type device. Ian.
Hi Jim, Thanks for the explanation about the capabilities. On 27/04/13 00:44, Jim Fehlig wrote:> Do you have time for an upstream libvirt patch to expose the possible > emulators in the capabilities, along with this patch allowing the user > to specify one?I''ll have a go and send you (cc''ing the libvirt list) what I manage to come up with. Cheers, Dave
On Mon, 29 Apr 2013, Ian Campbell wrote:> On Fri, 2013-04-26 at 18:10 +0100, Stefano Stabellini wrote: > > On Fri, 26 Apr 2013, Ian Campbell wrote: > > > On Fri, 2013-04-26 at 12:48 +0100, Stefano Stabellini wrote: > > > > On Fri, 26 Apr 2013, Ian Campbell wrote: > > > > > On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote: > > > > > > What about old good loop+phy based backend for file disk images? I don''t want > > > > > > whole qemu in dom0 for PV domains, only for handling simple disk backend. > > > > > > Additionally sparse images + loop + phy + mount -o discard in domU makes the > > > > > > images "auto shrinking". Don''t know if qemu is able to do this. > > > > > > > > > > IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and > > > > > therefore your data isn''t actually on the disk when you might think it > > > > > is, which can lead to filesystem corruption even if the f/s is doing > > > > > correct barriers. > > > > > > > > If it is safe for QEMU, it should be safe for loop and blkback too. > > > > > > qemu (and blkback) will issue correct barriers/syncs to the underlying > > > storage. AFAIK loop does not. > > > > Looking at drivers/block/loop.c, it seems to me that REQ_FUA and > > REQ_FLUSH both handled correctly by issuing a vfs_fsync. > > Oh, good, either it''s been fixed (do we know when?) or I was confusing > things with O_DIRECT. > > > > > > > Attached patch, which I currently use for that. If it is close to something > > > > > > that would be accepted, I will send it in new thread. > > > > > > > > > > I think you can use a block script for this (i.e. it does the loop > > > > > mount) and avoid patching libxl at all. That''s what xend did at least... > > > > > > > > Yes, libxl would need to arrage the script to be called when "phy" is > > > > used on a file, right? > > > > > > I meant the user can pass "script=block-loop". In the full knowledge of > > > what that means for their data integrity. > > > > Yeah, but there is no such block-loop script at the moment, the script > > is called block and its behaviour depends on the xenstore "type" node: > > it must be "file" and libxl never writes "type" "file" to xenstore (see > > device_disk_add and libxl__device_disk_string_of_backend). > > A block-loop script would be a matter of a few minutes work based on the > current block script, or the current block script could be modified to > stat the path and DTRT when handling a phy type device.I vote for having this in 4.3
Marek Marczykowski wrote:> On 26.04.2013 13:40, Ian Campbell wrote: > >> On Fri, 2013-04-26 at 12:31 +0100, Marek Marczykowski wrote: >> >>> What about old good loop+phy based backend for file disk images? I don''t want >>> whole qemu in dom0 for PV domains, only for handling simple disk backend. >>> Additionally sparse images + loop + phy + mount -o discard in domU makes the >>> images "auto shrinking". Don''t know if qemu is able to do this. >>> >> IIRC the problem with loop+phy is that loop doesn''t do O_DIRECT and >> therefore your data isn''t actually on the disk when you might think it >> is, which can lead to filesystem corruption even if the f/s is doing >> correct barriers. >> >> >>> Attached patch, which I currently use for that. If it is close to something >>> that would be accepted, I will send it in new thread. >>> >> I think you can use a block script for this (i.e. it does the loop >> mount) and avoid patching libxl at all. That''s what xend did at least... >> > > This also was solution I''ve evaluated, but libvirt developers don''t want to > hear about disk->script setting in official libvirt, so I still need to patch > some library.Good old loop+phy is supported by the legacy libvirt xen driver without libvirt needing <disk><script /></disk>, generally by using <driver name=''file'' />.> Choosing libxl seems to be more universal. Setting loop in libxl > is also much faster than calling script for that. > Perhaps I can rewrite this patch to allow the choice (LIBXL_DISK_BACKEND_LOOP)? >With such a patch, libvirt could set the backend to LIBXL_DISK_BACKEND_LOOP when <driver name=''file''>, preserving loop+phy backend setup done by the old toolstack. Regards, Jim