Dutch Meyer
2008-Oct-31 07:44 UTC
[Xen-devel] [4 Patches] New blktap implementation, 2nd try
The list filters didn''t like the first 2 patches, so I''m reposting the whole set from my site. I''ve also removed the hastily written and apparently misleading "less open source" line from the explanation below, so that people will stop teasing me. Rest assured, these patches are 100% pure open source. http://www.cs.ubc.ca/~dmeyer/blktapPatches/deprecate-blktap http://www.cs.ubc.ca/~dmeyer/blktapPatches/restore-blktap http://www.cs.ubc.ca/~dmeyer/blktapPatches/fixqcow http://www.cs.ubc.ca/~dmeyer/blktapPatches/blktap2module Signed-off-by: Jake Wires <jake.wires@citrix.com>, Dutch Meyer <dmeyer@cs.ubc.ca> This is a new and rewritten version of blktap that we have developed at Citrix. The current version of blktap is left functionally unmodified. The change set consists of four patches. 1) deprecate-blktap. A patch to deprecate the open source blktap, by moving it and issuing a warning whenever it is used. No functionality is modified in this patch, it is just housekeeping. 2) restore-blktap. A patch to add a new blktap implementation that is feature equivalent to (or better than) the current blktap. This will eventually replace the current blktap implementation. 3) fixqcow. Fix several bugs in the qcow tools. 4) blktap2module. A kernel patch to add a new unified blktap2 module that will eventually replace blktap. The new blktap implementation has several improvements. * Isolation from xenstore - Blktap devices can now be created in dom0 as virtual block devices without coordination from xen and have few dependencies on xenstore in normal operation. * Improved development environment for tapdisks, simpler request forwarding, new request scheduler. * Pause scripts updated to support live qcow snapshot (see xmsnap script) * New tapdisk type: Block Mason disks allow a set of tapdisks to be flexibly arranged into graph structure and modified on-the-fly. Several example modules for Block Mason are included. Block Mason disks are constructed and modified with a declarative configuration language. These capabilities are discussed in more depth in an upcoming paper in the First Workshop on IO Virtualization, available HERE: http://www.cs.ubc.ca/~dmeyer/blockmason-wiov-final.pdf --Dutch _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kevin Wolf
2008-Oct-31 10:14 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Dutch Meyer schrieb:> 1) deprecate-blktap. A patch to deprecate the open source blktap, by > moving it and issuing a warning whenever it is used. No functionality > is modified in this patch, it is just housekeeping.Why would you want to keep the old version around if the new one is so much better?> 2) restore-blktap. A patch to add a new blktap implementation that is > feature equivalent to (or better than) the current blktap. This will > eventually replace the current blktap implementation.Now this is a huge patch, nearly impossible to review. At a first glance I noticed that it shares large parts with the old implementation. So while I believe you that the first patch is only moving things around, I really can''t tell what this patch is actually doing. It would be really useful to have the real diff, i.e. keep the old files in patch 1 and modify them here. Best you also split this patch into a patch for the core and one patch for the changes to each driver. I didn''t really look into the patch but merely scrolled through it and looked at the diffstat, so maybe I''m wrong, but it seems to lack implementations for tap:qcow2, tap:vmdk and tap:ioemu. Why that? Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dutch Meyer
2008-Oct-31 17:51 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
> Why would you want to keep the old version around if the new one is so > much better?We''re very open to removing the old version, but there is a backwards compatability concern. See below.> Now this is a huge patch, nearly impossible to review. At a first glance > I noticed that it shares large parts with the old implementation. So > while I believe you that the first patch is only moving things around, I > really can''t tell what this patch is actually doing.I think relative to the changes, there isn''t that much that is shared with the old implementation. Taking a diff against the two directories will show that the core of blktap has been redone, much of it from scratch. That''s why we did it this way. You can diff versus the two blktap directories to see this. If you still think this would be very helpful, I can provide this diff. The drivers are, at their core, unchanged so perhaps separate patches would make sense in that case.> I didn''t really look into the patch but merely scrolled through it and > looked at the diffstat, so maybe I''m wrong, but it seems to lack > implementations for tap:qcow2, tap:vmdk and tap:ioemu. Why that?You''re right about that - we decided to post this without qcow2, but can port that one as well. VMDK is slow, it''s written to be purely synchronous, and we don''t think anyone actually uses it. ioemu is a different matter - This is a very difficult piece of code to work with, because large portions of blktap have been replicated to make it work. On top of that, the wire protocol changed to support this. Porting this driver would be quite difficult. This ioemu code is the primary reason we provided backwards compatability with what is now "blktap_old". I really appreciate the conversation here, and look forward to keeping it going, but i may be away from email for the weekend, so please excuse any slowness in my reply. --Dutch _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kevin Wolf
2008-Nov-03 10:58 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Dutch Meyer schrieb:>> Why would you want to keep the old version around if the new one is so >> much better? > > We''re very open to removing the old version, but there is a backwards > compatability concern. See below.Right, I was afraid this would be your answer. I''m certainly not opposed to improve blktap by a rewrite, but I do have concerns that you''re not going the right way with this series - not implementation-wise, but from a design perspective. You''re saying that removing tap:ioemu is a compatibility issue. That''s not quite the point. The user won''t lose functionality (in the current upstream implementation of tap:ioemu), he could use tap:aio or tap:qcow2 or whatever instead. The big evil about what we currently have is code duplication. For each image format we support we have two implementations: One in ioemu and one in tapdisk. This is why it was agreed that we want to get rid of the tapdisk implementations and move everything into ioemu (the other way round wouldn''t work because of HVM). Now instead of doing the final step and making tap:ioemu the default so that only one implementation is left, you''re trying to add even the third implementation. This is clearly not what you want from a design and maintainability perspective. So what I''m saying is that while I''m not opposed to a rewrite in principle, the rewrite needs to be a complete drop-in replacement to avoid this third copy of the code. Ideally the rewrite would be completely integrated into qemu, but at least not having a third copy and making things even worse is a must, IMHO. One more thought, even if it gets lengthy now... If you really want to stay with the two implementations, what about at least changing the driver''s interface to be more similar to qemu''s? This certainly would help with merging upstream changes. Ideally you would take qemu''s code, throw it into the blktap directory and you''re done.>> Now this is a huge patch, nearly impossible to review. At a first glance >> I noticed that it shares large parts with the old implementation. So >> while I believe you that the first patch is only moving things around, I >> really can''t tell what this patch is actually doing. > > I think relative to the changes, there isn''t that much that is shared > with the old implementation. Taking a diff against the two directories > will show that the core of blktap has been redone, much of it from > scratch. That''s why we did it this way. > > You can diff versus the two blktap directories to see this. If you > still think this would be very helpful, I can provide this diff. The > drivers are, at their core, unchanged so perhaps separate patches would > make sense in that case.Separating the drivers out could help a lot, I think. Would be great if you could do this.>> I didn''t really look into the patch but merely scrolled through it and >> looked at the diffstat, so maybe I''m wrong, but it seems to lack >> implementations for tap:qcow2, tap:vmdk and tap:ioemu. Why that? > > You''re right about that - we decided to post this without qcow2, but can > port that one as well. VMDK is slow, it''s written to be purely > synchronous, and we don''t think anyone actually uses it. > > ioemu is a different matter - This is a very difficult piece of code to > work with, because large portions of blktap have been replicated to make > it work. On top of that, the wire protocol changed to support this. > Porting this driver would be quite difficult. This ioemu code is the > primary reason we provided backwards compatability with what is now > "blktap_old".I''ve already explained above why from a design perspective I don''t think ioemu is a compatibility thing, but actually the right next step. For the implementation, I don''t think tap:ioemu is a particularly difficult piece of code. In ioemu, it shares a few lines with tapdisk.c which is bad, but tapdisk was meant to disappear anyway when tap:ioemu is completed. I guess we can easily replace them with your new code, that should be no problem. (Haven''t looked yet at the new code, though) The other thing that was changed is blktapctrl which simply opens a different pipe when it uses ioemu instead of tapdisk. The wire protocol has not been changed to achieve this (why do you think so?). This shouldn''t be too hard to implement either.> I really appreciate the conversation here, and look forward to keeping > it going, but i may be away from email for the weekend, so please excuse > any slowness in my reply.I was having a weekend myself, so that''s no problem. ;-) If you agree that the points I mentioned are issues and they need to be fixed, I''m willing to help you with it as time permits. I know that my suggestions actually mean quite a bit of reworking your patches, but I think it may be worth it. Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kevin Wolf
2008-Nov-03 18:01 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Hi Andrew, Andrew Warfield schrieb:> Hi Kevin, > > Thanks very much for your thoughts on this so far. > >> The big evil about what we currently have is code duplication. For each >> image format we support we have two implementations: One in ioemu and >> one in tapdisk. This is why it was agreed that we want to get rid of the >> tapdisk implementations and move everything into ioemu (the other way >> round wouldn''t work because of HVM). > > I didn''t realize that this had been universally agreed. Why do you > think that implementing virtual block devices inside qemu is the right > way to go -- it seems to me like a direction that''s just going to > bloat qemu and make it more difficult to maintain. > > The patches that Dutch has posted do a great thing from a design > perspective: they remove almost all dependencies on Xen from blktap.Don''t get me wrong: I''m not saying that the design is complete crap. Actually I can''t tell because I haven''t looked at it yet. It''s just one point, however an important one, I don''t agree with. The reason why I''d like to see the block drivers inside qemu is simple: They are already there and we need them there at least for HVM machines (and upstream qemu needs them always, so they certainly won''t go away). This patch series is creating a third copy of these original implementations and probably it''s going to modify them in yet another way. I think we don''t need to discuss that code duplication is bad, especially if you have changes in each copy which make merging hard. This doesn''t mean that you need to implement your stuff in qemu. I think I get your point, it wouldn''t really fit in the design. Of course it would be nice to have only one copy (the qemu one), but if you really need a second copy, okay. That still doesn''t explain why to change them beyond recognition instead of making them really more or less a literal copy or even compiling them out of the qemu source.> There is no longer a blktapctrl that watches Xenstore, and the blktap > kernel driver no longer duplicates functionality with blkback. > Instead, the new code lets you instantiate a tapdisk on the linux > command line, which results in the creation of a new block device. > Requests to this block device are forwarded through tapdisk. This > means that virtual block device code can be implemented in exactly one > place (tapdisk), and that consumers (like blkback and qemu) can just > use raw read/write requests to talk to that virtual device. > > The new structure is similar to FUSE, but at the block device > layer. > > The ioemu patches take a bunch of the ugliest parts of the old > blktap implementation -- especially the parts that talk over the > blktap-internal character device for user/kernel request forwarding -- > and duplicate them in qemu. I''d argue that this is taking things in > exactly the wrong direction, as it forces a bunch of internal xen-gore > into qemu, and means that the blktap linux driver and qemu share an > interface that must be maintained over time. > > The new blktap code has a pile of cool stuff: it does smart > request dependency tracking which is incredibly helpful when > implementing chained image formats effeciently. It has a request > merging layer that eliminates an enormous amount of overhead. > Finally, it has a quite neat interface for specifying block devices as > connected graphs of block components. > > The only remaining Xen code on the data path is an optimization to > preserve zero-copy request forwarding on Xen, and this won''t matter in > qemu environments. > > Having isolated tapdisks, and presenting the associated images as > Linux block devices means that (a) you can use tools like ionice to > prioritize individual block devices rather than having to set priority > for *all* of qemu, (b) individual tapdisks can serve block devices for > multiple VMs -- this is useful if you want to implement a cache for > many VMs booting from a common image, it''s also good for complicated > distributed block devices like Parallax (which Dutch can tell you more > about if you are interested) -- again, this is a case where having the > code in qemu is bad, you want multiple VMs to share the tapdisk-based > implementation. Finally, (c) tapdisk can be used to directly loopback > the image into linux, which allows people to more easily work with the > image''s contents. Again, here I don''t think you want a whole qemu. > > Sorry for the long email. The whole point behind blktap in the > first place was to make it easy for people to do fun things on the > block interface. I don''t don''t want to see repeated work on this > front any more than you do -- but I don''t think that hoovering the > code into qemu solves any problems.Thank you for this explanation, I see now what you''re aiming at. I think at least some parts of it should have gone into the patch comment in the first place. Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kevin Wolf
2008-Nov-03 18:48 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Andrew Warfield schrieb:> The current ioemu: extensions duplicate the blktap control and data > path code and suck them into qemu. I think this is probably about as > bad as duplicating drivers, given that it makes it quite difficult to > evolve the block extension interface.It''s difficult anyway. Probably that''s the reason why you''re proposing a revolution instead of evolution. ;-) But agreed, duplicating this code is as bad as duplicating drivers, still it is less code duplicated (and only duplicated, not completely changed). I did this only because tapdisk was meant to disappear eventually.> The patch doesn''t create three versions of drivers -- the third > version that you refer to is just the deprecated blktap code that''s > being replaced. So how about this: we drop in the new code, and > delete the old version of blktap altogether. Then let''s have a think > about what the right way to unify the drivers is.I think that''s one of the points I made in my long mail. At least we should replace the old blktap completely so that the duplication won''t become even worse.> I like your idea of being able to drop driver implementations into > both tapdisk and qemu. However, I think that when you look at the > code you''ll find that the new tapdisk interface provides a lot more > than what the qemu interfaces currently provide. A good solution to > this would be to just run tapdisk as the block virtualization layer > even when qemu is being used without Xen.Ask the upstream qemu mailing list. I''m quite sure they''ll tell you that it won''t happen. One reason is that blktap is limited to Linux and qemu isn''t. And what''s the interface you mean? blktap''s struct tap_disk certainly looks smaller than qemu''s struct BlockDriver. But even if tapdisk has a lot more than qemu provides, I think we would still win much if we can share at least the code for common functionality. Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gerd Hoffmann
2008-Nov-04 09:43 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Kevin Wolf wrote:> So what I''m saying is that while I''m not opposed to a rewrite in > principle, the rewrite needs to be a complete drop-in replacement to > avoid this third copy of the code. Ideally the rewrite would be > completely integrated into qemu, but at least not having a third copy > and making things even worse is a must, IMHO.Oh, btw: qemu itself will get a xen block backend implementation soon anyway. Patch queue against qemu svn (upstream) are here: http://kraxel.fedorapeople.org/patches/qemu-upstream/ Patch queue for the qemu-xen git tree are here: http://kraxel.fedorapeople.org/patches/qemu-xen/ The patch adding the block backend is this one: http://kraxel.fedorapeople.org/patches/qemu-xen/0007-xen-add-block-device-backend-driver.patch You might also look at this one (common xenbus state machine, ...): http://kraxel.fedorapeople.org/patches/qemu-xen/0003-xen-backend-driver-core.patch Merging those patch sets into both qemu trees will start when Ian Jackson (qemu-xen maintainer) is back. Note that a special kernel driver for blktap isn''t needed at all. You can simply use the generic grant table and event channel device drivers. Which is exactly what the qemu backend implementation does. IMHO the blktap kernel driver is there only for historical reasons (it predates gntdev) and it should go away long-term. The qemu block layer has some problems performance-wise, so I can see your reasons to not use qemu. And the qemu backend will most likely not (yet) match blktap performance-wise. Nevertheless I think time is better spent fixing these problems in upstream qemu instead of forking off the qemu block layer code for the tapdisk daemon. cheers, Gerd _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gerd Hoffmann
2008-Nov-04 09:50 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Kevin Wolf wrote:>> Having isolated tapdisks, and presenting the associated images as >> Linux block devices means that (a) you can use tools like ionice to >> prioritize individual block devices rather than having to set priority >> for *all* of qemu, (b) individual tapdisks can serve block devices for >> multiple VMs -- this is useful if you want to implement a cache for >> many VMs booting from a common image, it''s also good for complicated >> distributed block devices like Parallax (which Dutch can tell you more >> about if you are interested) -- again, this is a case where having the >> code in qemu is bad, you want multiple VMs to share the tapdisk-based >> implementation. Finally, (c) tapdisk can be used to directly loopback >> the image into linux, which allows people to more easily work with the >> image''s contents. Again, here I don''t think you want a whole qemu.upstream qemu recently got qemu-nbd, which can do the very same thing using the linux nbd driver. cheers, Gerd _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kevin Wolf
2008-Nov-04 10:05 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Gerd Hoffmann schrieb:> Kevin Wolf wrote: >> So what I''m saying is that while I''m not opposed to a rewrite in >> principle, the rewrite needs to be a complete drop-in replacement to >> avoid this third copy of the code. Ideally the rewrite would be >> completely integrated into qemu, but at least not having a third copy >> and making things even worse is a must, IMHO. > > Oh, btw: qemu itself will get a xen block backend implementation soon > anyway. > > [...] > > Note that a special kernel driver for blktap isn''t needed at all. You > can simply use the generic grant table and event channel device drivers. > Which is exactly what the qemu backend implementation does. IMHO the > blktap kernel driver is there only for historical reasons (it predates > gntdev) and it should go away long-term.Sorry, Gerd, I should have copied you from the beginning. I agree that integrating the backend into qemu is the right thing. You don''t want to have numerous tools running. It''s not a complete solution, though. A nice thing about blktap is that you can attach a qcow2 image to Dom0, e.g. to copy the kernel out of the image. I think this is the point where your approach and the new blktap (which according to Andraw in fact isn''t blktap anymore) are complementary. blkfront talks to qemu which uses its drivers to access the image. Alternatively (maybe for the more complicated stuff blktap seems to provide) it can use the now Xen agnostic blktap to access the blktap device nodes through its raw block driver. For accessing images in Dom0, blktap without qemu is used. And of course, the blktap drivers are compiled out of qemu source if qemu has the respective driver. Does that sound reasonable?> The qemu block layer has some problems performance-wise, so I can see > your reasons to not use qemu. And the qemu backend will most likely not > (yet) match blktap performance-wise. Nevertheless I think time is > better spent fixing these problems in upstream qemu instead of forking > off the qemu block layer code for the tapdisk daemon.qemu didn''t perform that bad in my tests. blkbk is faster, of course, but it''s not by orders of magnitude. But you''re right in that Xen should finally learn to work with upstream instead of forking everything (and Ian''s merging work was a great start - but still just a start). Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Nov-04 10:13 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
On 4/11/08 10:05, "Kevin Wolf" <kwolf@suse.de> wrote:>> Note that a special kernel driver for blktap isn''t needed at all. You >> can simply use the generic grant table and event channel device drivers. >> Which is exactly what the qemu backend implementation does. IMHO the >> blktap kernel driver is there only for historical reasons (it predates >> gntdev) and it should go away long-term. > > Sorry, Gerd, I should have copied you from the beginning. > > I agree that integrating the backend into qemu is the right thing. You > don''t want to have numerous tools running. It''s not a complete solution, > though. A nice thing about blktap is that you can attach a qcow2 image > to Dom0, e.g. to copy the kernel out of the image. > > I think this is the point where your approach and the new blktap (which > according to Andraw in fact isn''t blktap anymore) are complementary. > blkfront talks to qemu which uses its drivers to access the image. > Alternatively (maybe for the more complicated stuff blktap seems to > provide) it can use the now Xen agnostic blktap to access the blktap > device nodes through its raw block driver. For accessing images in Dom0, > blktap without qemu is used. And of course, the blktap drivers are > compiled out of qemu source if qemu has the respective driver. > > Does that sound reasonable?Yes, it does seem to me that blktap2 world and qemu-tap world can coexist quite happily. As long as both camps have supporters and are maintained, what''s the problem? I don''t think these email arguments are changing anyone''s entrenched positions. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gerd Hoffmann
2008-Nov-04 10:20 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Hi,> I agree that integrating the backend into qemu is the right thing. You > don''t want to have numerous tools running. It''s not a complete solution, > though. A nice thing about blktap is that you can attach a qcow2 image > to Dom0, e.g. to copy the kernel out of the image.There is qemu-nbd ...> I think this is the point where your approach and the new blktap (which > according to Andraw in fact isn''t blktap anymore) are complementary. > blkfront talks to qemu which uses its drivers to access the image.I''ve read the thread down meanwhile, although I''ve seen the text describing the blktap design only quoted in your reply, looks like Andrew''s mail didn''t made it to the list for some reason. It is very different from old blktap indeed.> Alternatively (maybe for the more complicated stuff blktap seems to > provide) it can use the now Xen agnostic blktap to access the blktap > device nodes through its raw block driver.Would work, yes. But what is the point? qemu will certainly not rip out the disk format drivers, so there is zero need to go that route.> For accessing images in Dom0, > blktap without qemu is used. And of course, the blktap drivers are > compiled out of qemu source if qemu has the respective driver.See qemu-nbd. cheers, Gerd _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Kevin Wolf
2008-Nov-04 10:34 UTC
Re: [Xen-devel] [4 Patches] New blktap implementation, 2nd try
Gerd Hoffmann schrieb:>> I agree that integrating the backend into qemu is the right thing. You >> don''t want to have numerous tools running. It''s not a complete solution, >> though. A nice thing about blktap is that you can attach a qcow2 image >> to Dom0, e.g. to copy the kernel out of the image. > > There is qemu-nbd ...Right, I forgot about that one.>> Alternatively (maybe for the more complicated stuff blktap seems to >> provide) it can use the now Xen agnostic blktap to access the blktap >> device nodes through its raw block driver. > > Would work, yes. But what is the point? qemu will certainly not rip > out the disk format drivers, so there is zero need to go that route.Well, at the moment I for my part don''t feel the need to go that route. Neither do you. But obviously Andrew and Dutch do. I don''t know, maybe blktap2 really has some nice features that someone might want to use. And if they maintain blktap2 - why not? If we make the separation as I proposed (with the option of qemu-nbd instead of blktap2 for attaching to Dom0), it doesn''t hurt anyone after all - you can use it then, but you''re not forced. Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel