Daniel P. Berrange
2007-Jul-19 17:09 UTC
[Xen-devel] PATCH: Enable QEMU booting of blktap disks
This is a re-send of previous patches:

  http://lists.xensource.com/archives/html/xen-devel/2007-06/msg01021.html

The only change is that it explicitly looks for the driver type in xenstore
rather than assuming 'xvd' == 'tap' - this is because tap could be configured
with 'hd' or 'sd' nodenames too, and we still need to strip the leading
':aio' or ':vmdk', etc prefix from the path.

There are two patches:

 - xen-revert-phantom-2.patch removes the phantom device code since it
   doesn't work & is redundant if QEMU can process tap devices straight
   from xenstore

 - xen-qemu-blktap-2.patch makes QEMU able to handle disks with xvd prefix
   treating them as IDE. Also makes QEMU strip the driver type prefix from
   tap disks since it can auto-guess driver

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
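For readers who want the two behaviours in the patch description spelled out, here is a rough Python sketch. It is not the patch itself (which changes QEMU), and everything in it is an assumption made for illustration: the function names `disk_image_path` and `ide_index` are hypothetical, the backend type is passed in as a plain string rather than read from xenstore, and the tap parameter string is assumed to have the form `driver:path`.

```python
# Illustrative sketch only; names and data shapes are assumptions,
# not code from xen-qemu-blktap-2.patch.

def disk_image_path(backend_type, params):
    """Return the image path to open for a disk.

    backend_type: driver type as found in xenstore (e.g. "tap", "file").
    params: the disk parameter string. For tap disks this is assumed to
            look like "aio:/path/to/disk.img".
    """
    if backend_type == "tap":
        # Drop the leading format name ("aio", "vmdk", ...) and keep the
        # path, since the format can be auto-detected from the image.
        _driver, sep, path = params.partition(":")
        if sep:
            return path
    return params


def ide_index(dev):
    """Map 'hda'/'xvda' -> 0 ... 'hdd'/'xvdd' -> 3, else None.

    The decision is made on the device name only; whether a disk is a tap
    disk is decided separately from the backend type, because tap disks
    may also use 'hd' or 'sd' names.
    """
    for prefix in ("xvd", "hd"):
        if dev.startswith(prefix) and len(dev) == len(prefix) + 1:
            letter = dev[len(prefix)]
            if "a" <= letter <= "d":
                return ord(letter) - ord("a")
    return None
```

The point of keeping the two functions separate is the one made in the message: the device name ('xvd', 'hd', 'sd') and the backend driver type ('tap' or not) are independent, so neither should be inferred from the other.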
Andrew Warfield
2007-Jul-19 17:34 UTC
Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
So two comments on this:

In the other thread that's currently going on this topic, it sounds
like others are quite successfully using the phantom code. Why is it
broken for you?

As I've said before, I dislike the idea of having separate
implementations of disks -- one in qemu and one in tapdisk. We'd
quite like to encourage people to be able to extend virtual block
devices in the future, and it seems like your approach is going to
force them to do two independent implementations of things. It also
leads to complications if you want to add things like caching, shared
ramdisks, etc. If phantom is broken, why don't we just fix that?

a.
Daniel P. Berrange
2007-Jul-19 18:08 UTC
Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
On Thu, Jul 19, 2007 at 10:34:12AM -0700, Andrew Warfield wrote:
> So two comments on this:
>
> In the other thread that's currently going on this topic, it sounds
> like others are quite successfully using the phantom code. Why is it
> broken for you?

I really can't see how it works for anybody in 3.1.0 since the code which
sets up phantom devices simply doesn't work

    try:
        imagetype = self.vm.info['image']['type']
    except:
        imagetype = ""

    if imagetype == 'hvm':

The body of that try: statement is trying to read hash keys which don't
exist, since 'vm.info' isn't a hash. So imagetype is always "" and so
none of the phantom setup code ever gets run. Even once fixing that I
never get any devices appearing and the VM just immediately shuts down.
It seems to be looking for the /dev/xvd* device nodes in Dom0 rather
than DomU which seems rather wrong.

> As I've said before, I dislike the idea of having separate
> implementations of disks -- one in qemu and one in tapdisk. We'd
> quite like to encourage people to be able to extend virtual block
> devices in the future, and it seems like your approach is going to
> force them to do two independent implementations of things. It also
> leads to complications if you want to add things like caching, shared
> ramdisks, etc. If phantom is broken, why don't we just fix that?

AFAICT with or without my change you need to have two separate impls
of every disk format, since the phantom device stuff is only ever used
by blktap - non blktap disks still get processed directly by QEMU. Now
if we intend to remove all support for file: entirely, and make blktap
compulsory for file backed VMs then I can see the benefit in having
everything go via one codepath. Though now having 2 userspace daemons
in Dom0 per HVM guest seems like it's going in the wrong direction to me.

IMHO the entire design & impl of blktap userspace was broken from the
start because it is duplicating functionality already in the QEMU
codebase. With the benefit of hindsight, I would suggest that it would
be better to have QEMU able to speak the native blktap protocol straight
to the blktap kernel driver. Keep HVM using QEMU for all file backed
disks, since it already handles all the formats just fine, and have a
new machine type in QEMU for paravirt VMs which provided the tap daemon
replacement and also a PVFB daemon replacement. Then you could kill the
entire blktap userspace codebase & most of the PVFB userspace codebase
and the libvncserver requirement.

So there'd only be 1 single daemon in Dom0 per VM, it would be the same
daemon for PV and HVM, and all the open source virt platforms (Xen, KVM,
QEMU, VirtualBox) would all be reaping the benefit of each other's code
improvements to QEMU driver model, in particular for disk format code &
VNC server code, rather than forking & reimplementing private copies.

Of course this isn't a quick job, but if the motivation is reducing
code duplication & alternative I/O paths, then focusing on QEMU for
everything seems like a much more viable idea than more Xen specific
code.

Dan.
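The failure mode described for the try/except in this message (a bare `except:` that turns any lookup error into an empty string, so the HVM branch is silently skipped) can be reproduced in isolation. The sketch below is only an illustration: `FakeInfo` is a hypothetical stand-in for `vm.info`, whose real structure in xend is not shown in the thread, and both helper functions are invented names.

```python
# Illustrative reproduction; FakeInfo and both helpers are hypothetical
# and are not taken from the xend source.

class FakeInfo(object):
    """Stand-in for an object that does not support info['image']['type']."""

    def __init__(self, image_type):
        self.image_type = image_type


def imagetype_as_in_thread(info):
    # Same shape as the quoted code: a bare except swallows every error,
    # including the TypeError raised because info is not subscriptable.
    try:
        return info['image']['type']
    except:
        return ""


def imagetype_narrow(info):
    # Catching only KeyError still treats "no image type recorded" as "",
    # but lets a wrong-kind-of-object bug surface as a TypeError.
    try:
        return info['image']['type']
    except KeyError:
        return ""


# The bare except hides the bug, so the 'hvm' check can never succeed.
assert imagetype_as_in_thread(FakeInfo("hvm")) == ""

# With a real mapping both variants behave the same.
assert imagetype_narrow({"image": {"type": "hvm"}}) == "hvm"
```

Narrowing the exception is just one way to make such a problem visible; it does not address the second issue reported above, that no devices appear even after the lookup is fixed.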
Andrew Warfield
2007-Jul-19 22:45 UTC
Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
> > In the other thread that's currently going on this topic, it sounds
> > like others are quite successfully using the phantom code. Why is it
> > broken for you?
>
> I really can't see how it works for anybody in 3.1.0 since the code which
> sets up phantom devices simply doesn't work

Well let's fix it then. ;)

> > As I've said before, I dislike the idea of having separate
> > implementations of disks -- one in qemu and one in tapdisk. We'd
> > quite like to encourage people to be able to extend virtual block
> > devices in the future, and it seems like your approach is going to
> > force them to do two independent implementations of things. It also
> > leads to complications if you want to add things like caching, shared
> > ramdisks, etc. If phantom is broken, why don't we just fix that?
>
> AFAICT with or without my change you need to have two separate impls
> of every disk format, since the phantom device stuff is only ever used
> by blktap - non blktap disks still get processed directly by QEMU.

My concern is that it's possible to run the VM with it only having to
depend on a single implementation of a virtual disk. If you don't use
PV drivers, the qemu block drivers do this nicely. If you do, the
phantom code lets you do this by ensuring that emulated block requests
are redirected to tapdisk until the pv drivers come up. That redirection
is admittedly inefficient, but it doesn't really matter for the length
of time that it happens.

> IMHO the entire design & impl of blktap userspace was broken from the
> start because it is duplicating functionality already in the QEMU
> codebase.

Blktap was written before there were device emulated guests and before
qemu was capable of processing more than a single outstanding block
request at a time. So the only functionality that it duplicated was to
use e.g. the vmdk and qcow code as a basis for some of the image file
implementations. Vmdk is largely unchanged and I don't know of anyone
who actively uses it; qcow evolved considerably in order to do
asynchronous access and batched request processing.

> With the benefit of hindsight, I would suggest that it would
> be better to have QEMU able to speak the native blktap protocol straight
> to the blktap kernel driver. Keep HVM using QEMU for all file backed
> disks, since it already handles all the formats just fine, and have a
> new machine type in QEMU for paravirt VMs which provided the tap daemon
> replacement and also a PVFB daemon replacement. Then you could kill the
> entire blktap userspace codebase & most of the PVFB userspace codebase
> and the libvncserver requirement.

I think a patch that pulled a lot of the tapdisk processing into qemu
would be a very interesting thing to compare overheads for against the
current model.

> So there'd only be 1 single daemon in Dom0 per VM, it would be the same
> daemon for PV and HVM, and all the open source virt platforms (Xen, KVM,
> QEMU, VirtualBox) would all be reaping the benefit of each other's code
> improvements to QEMU driver model, in particular for disk format code &
> VNC server code, rather than forking & reimplementing private copies.
>
> Of course this isn't a quick job, but if the motivation is reducing
> code duplication & alternative I/O paths, then focusing on QEMU for
> everything seems like a much more viable idea than more Xen specific
> code.

Absolutely. Dan, I completely agree that it would be very good to
have a unified way to implement virtual block devices -- image
formats, interposition, and otherwise. I think that the qemu and
blktap disk interfaces both shared this as an initial design goal. I
agree it's a lot of work and I agree that it would be a very nice
thing -- in the same spirit as Rusty's virtio efforts -- to be able to
share these implementations across hypervisors/emulators/etc. I also
know of some grad students who would be very happy to see virtual
block devices that they are building for blktap apply against
everything else.

The thing is that doing everything in qemu doesn't currently achieve
this -- because PV drivers can't talk directly to qemu and going
through the emulated path results in suckful performance. So rather
than taking a patch that means PV-based HVM domains have to depend on
multiple implementations of disks, I'd much prefer to see us go in the
direction of what you propose.

a.
Gerd Hoffmann
2007-Jul-20 10:35 UTC
Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
Andrew Warfield wrote:
> As I've said before, I dislike the idea of having separate
> implementations of disks -- one in qemu and one in tapdisk.

The qemu one isn't going to go away due to qemu being *the* device model
for any kind of virtualization in Linux. So if you want to have tapdisk
share the code to avoid duplication I see two possible ways to get there:

 (a) replace blktapd with qemu
 (b) put the bits into a shared library, which then can be used by
     qemu & blktapd and other tools (qemu-img, virtual machine
     management tools, ...).

cheers,
  Gerd
Andrew Warfield
2007-Jul-20 13:04 UTC
Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
Gerd, don't misunderstand what I'm saying: I'd be delighted to see
blktap and qemu share block device implementations. However, the
blktap patch that I am commenting on achieves exactly the opposite of
that: it *requires* two implementations of any virtual disk type that
you want to use PV drivers on in an HVM guest.

a.
Gerd Hoffmann
2007-Jul-20 13:33 UTC
Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
Andrew Warfield wrote:
> Gerd, don't misunderstand what I'm saying: I'd be delighted to see
> blktap and qemu share block device implementations. However, the
> blktap patch that I am commenting on achieves exactly the opposite of
> that: it *requires* two implementations of any virtual disk type that
> you want to use PV drivers on in an HVM guest.

Well, the code will be in qemu anyway because other people use it, no
matter whether the xenified qemu device model actually uses it or not
(with the phantom device redirection trick). So if you want to get rid
of code duplication either shared libs or redirecting things the other
way around (i.e. use qemu code for both hvm and pv) will work.

cheers,
  Gerd
Daniel P. Berrange
2007-Jul-20 14:31 UTC
Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
On Thu, Jul 19, 2007 at 03:45:26PM -0700, Andrew Warfield wrote:
> > > In the other thread that's currently going on this topic, it sounds
> > > like others are quite successfully using the phantom code. Why is it
> > > broken for you?
> >
> > I really can't see how it works for anybody in 3.1.0 since the code which
> > sets up phantom devices simply doesn't work
>
> Well let's fix it then. ;)

Ok, I'll try and figure out what's broken. We still need a patch to
make QEMU watch out for disks named 'xvd*' though, since upstream paravirt
drivers only support xvd* naming.

> > With the benefit of hindsight, I would suggest that it would
> > be better to have QEMU able to speak the native blktap protocol straight
> > to the blktap kernel driver. Keep HVM using QEMU for all file backed
> > disks, since it already handles all the formats just fine, and have a
> > new machine type in QEMU for paravirt VMs which provided the tap daemon
> > replacement and also a PVFB daemon replacement. Then you could kill the
> > entire blktap userspace codebase & most of the PVFB userspace codebase
> > and the libvncserver requirement.
>
> I think a patch that pulled a lot of the tapdisk processing into qemu
> would be a very interesting thing to compare overheads for against the
> current model.
>
> > So there'd only be 1 single daemon in Dom0 per VM, it would be the same
> > daemon for PV and HVM, and all the open source virt platforms (Xen, KVM,
> > QEMU, VirtualBox) would all be reaping the benefit of each other's code
> > improvements to QEMU driver model, in particular for disk format code &
> > VNC server code, rather than forking & reimplementing private copies.
> >
> > Of course this isn't a quick job, but if the motivation is reducing
> > code duplication & alternative I/O paths, then focusing on QEMU for
> > everything seems like a much more viable idea than more Xen specific
> > code.
>
> Absolutely. Dan, I completely agree that it would be very good to
> have a unified way to implement virtual block devices -- image
> formats, interposition, and otherwise. I think that the qemu and
> blktap disk interfaces both shared this as an initial design goal. I
> agree it's a lot of work and I agree that it would be a very nice
> thing -- in the same spirit as Rusty's virtio efforts -- to be able to
> share these implementations across hypervisors/emulators/etc. I also
> know of some grad students who would be very happy to see virtual
> block devices that they are building for blktap apply against
> everything else.

Thinking about it a bit more broadly - considering differences between
paravirt & fullyvirt. With paravirt ops we're nearly able to have a single
kernel image. VirtIO work will hopefully make a single set of paravirt
drivers for disk/network/etc. With this HVM ought to be getting pretty
near parity with paravirt in terms of performance. The primary compelling
thing left in favour of paravirt is no reliance on hardware support.

At the same time paravirt has some really bad downsides, in particular
the terrible bootloader process with hacks of pygrub & pypxeboot. The PV
framebuffer is severely limited compared to Cirrus, and requires an almost
completely duplicated VNC server impl. Blktap is a little more complicated,
but there's the general code duplication wrt QEMU we've discussed above.
Then there are things that paravirt guests lack, like an emulated USB bus,
an emulated CDROM device, and other misc hardware devices that QEMU
provides.

I think it would be an interesting project to see if one could make a
QEMU machine which allows Xen paravirt guests to be booted using QEMU
(and thus the regular grub, viewable through the regular graphical VNC
server), and provide the QEMU emulated device model to the guest to
enable USB, etc, though still primarily using paravirt drivers where
available for speed of course.

Basically I'd like to have complete parity in device models & boot process
between paravirt & HVM guests, so I stop having to tell users "well you
can do X in HVM, but not with paravirt" & vice versa.

Regards,
Dan.