Hi,

recently we (oVirt) have started discussing whether the way virt-v2v
handles import from OVA files is good, and I would be interested in
ideas on how it can be improved. It is likely that somebody has already
given some thought to this problem.

TL;DR: Extracting the OVA before import is a problem for large VMs (TBs
in size). Can we change something to avoid the extraction and work
directly on the OVA?

What we consider a huge shortcoming is the fact that the whole OVA is
extracted into a temporary directory prior to the import and processed
afterwards. Under normal circumstances the user can have up to three
copies of the VM on their drive at the end of the import:

* original OVA,
* temporary extracted files (deleted when virt-v2v terminates),
* converted VM.

This is not a good idea for large VMs that are hundreds of GBs or even
TBs in size. The required storage space can be reduced with proper
partitioning, i.e. the source OVA and the converted VM don't end up on
the same drive and TMPDIR is set so that even the temporary files go
somewhere else. But this is not a general solution, and sometimes the
necessary space may not be available at all.

The question is how to change the import path so that virt-v2v doesn't
have to extract the OVA. I can see the following solutions:

1) Solve it in virt-v2v: create a layer for directly accessing the
   files in the archive.

2) Solve it in QEMU: create a backing method that would allow creating
   a qemu disk backed by the archive.

3) Solve it on the oVirt side: use some FUSE-based tool to provide
   access to the archive and pass the OVA to virt-v2v not as a file but
   as a directory.

Does anyone have any other ideas or suggestions?

Best regards,

Tomas Golembiovsky

--
Tomáš Golembiovský <tgolembi@redhat.com>
Richard W.M. Jones
2016-Sep-09 12:02 UTC
Re: [Libguestfs] Extracting files from OVA is bad
On Fri, Sep 09, 2016 at 01:03:49PM +0200, Tomáš Golembiovský wrote:
> Hi,
>
> recently we (oVirt) have started discussing whether the way virt-v2v
> handles import from OVA files is good. And I would be interested in
> ideas how it can be improved. It is likely somebody already gave some
> thought to this problem.
>
> TL;DR: Extracting the OVA before import is a problem for large VMs (in
> sizes of TBs). Can we change something to prevent the extraction and
> work directly over OVA?

Specifically virt-v2v needs to do:

  qemu-img create -b <source-file-within-the-tarball> -f qcow2 overlay.qcow2
  qemu-img convert overlay.qcow2 output

> What we consider a huge shortcoming is the fact that whole OVA is
> extracted prior to the import into a temporary directory and processed
> afterwards. Under normal situation user can have up to three copies of
> the VM on his drive at the end of import:
>
> * original OVA,
> * temporary extracted files (will be deleted when virt-v2v terminates,
> * converted VM.
>
>
> This is not a good idea for large VMs that have hunderds of GBs or even
> TBs in size. The requirements on the necessary storage space can be
> lessened with proper partitioning. I.e. source OVA and converted VM
> don't end up on the same drive and TMPDIR is set to put even temporary
> files somewhere else. But this is not a general solution. And sometimes
> the necessary space may not be available at all.
>
>
> The question is how to change the import path so that virt-v2v doesn't
> have to extract the OVA. I can see the following solutions:
>
> 1) Solve it virt-v2v: create a layer for directly accessing the files
> in the archive.
>
> 2) Solve it in QEMU: create backing method that would allow creating
> qemu disk backed by the archive.

As long as the tar file is not compressed, accessing a file within it
should be trivial.

I asked Kevin if there is a way to get qemu to access a disk image at
an offset within another file, but there is no such feature at the
moment. It's possible with `losetup', but that requires root :-(

(At this point I would normally grumble about how easy this would be
with a microkernel, but I won't do that now.)

David Gilbert suggested looking at qemu-nbd which has an --offset
option, allowing a particular offset within another file to be accessed.
If we wanted to do it entirely within virt-v2v, I think this would be
the way to go - the complex logic could be hidden inside v2v/input_ova.ml

The second problem is to work out the right offset to use. I suspect
this is something that http://www.libarchive.org/ can do, and that
package is also in RHEL.

We could even imagine a qemu block backend based on libarchive.

> 3) Solve it on oVirt side: use some FUSE-based tool to provide
> access to the archive and pass the OVA to virt-v2v not as a file but
> as directory.

http://www.cybernoia.de/software/archivemount/ is one such tool which
can do this. It's not in RHEL, but it seems to be based on
libarchive.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
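
As an illustration of the "work out the right offset" step, here is a
minimal sketch. It does not use libarchive as suggested above; it swaps
in Python's standard tarfile module, which records where each member's
data starts. The helper name find_member_extent and the member name in
the usage note are hypothetical, and the sketch assumes an uncompressed
tar whose disk member is stored as ordinary contiguous data.

```python
import tarfile


def find_member_extent(ova_path, member_name):
    """Return (offset, size) in bytes of a member's data inside a plain tar.

    Hypothetical helper for illustration only. It assumes the OVA is an
    uncompressed tar archive and the member is a regular, non-sparse file,
    so that its data is one contiguous byte range in the archive.
    """
    # Mode "r:" means "read, no compression"; a compressed archive is
    # rejected with tarfile.ReadError instead of being transparently
    # decompressed (offsets into a compressed stream would be useless).
    with tarfile.open(ova_path, mode="r:") as tar:
        info = tar.getmember(member_name)  # raises KeyError if missing
        if not info.isreg() or info.issparse():
            raise ValueError(
                "%s is not stored as a contiguous regular file" % member_name
            )
        # offset_data is where the member's data begins in the archive,
        # i.e. just after its 512-byte-aligned header block(s).
        return info.offset_data, info.size
```

For a member such as "disk1.vmdk" (a made-up name), the first value
returned is the kind of number one would hand to qemu-nbd --offset. The
size is returned as well because the member's data is followed by
padding and possibly further members, so whatever reads the disk image
should not run past offset + size.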
> On 09 Sep 2016, at 14:02, Richard W.M. Jones <rjones@redhat.com> wrote:
>
> On Fri, Sep 09, 2016 at 01:03:49PM +0200, Tomáš Golembiovský wrote:
>> Hi,
>>
>> recently we (oVirt) have started discussing whether the way virt-v2v
>> handles import from OVA files is good. And I would be interested in
>> ideas how it can be improved. It is likely somebody already gave some
>> thought to this problem.
>>
>> TL;DR: Extracting the OVA before import is a problem for large VMs (in
>> sizes of TBs). Can we change something to prevent the extraction and
>> work directly over OVA?
>
> Specifically virt-v2v needs to do:
>
> qemu-img create -b <source-file-within-the-tarball> -f qcow2 overlay.qcow2
> qemu-img convert overlay.qcow2 output
>
>> What we consider a huge shortcoming is the fact that whole OVA is
>> extracted prior to the import into a temporary directory and processed
>> afterwards. Under normal situation user can have up to three copies of
>> the VM on his drive at the end of import:
>>
>> * original OVA,
>> * temporary extracted files (will be deleted when virt-v2v terminates,
>> * converted VM.
>>
>>
>> This is not a good idea for large VMs that have hunderds of GBs or even
>> TBs in size. The requirements on the necessary storage space can be
>> lessened with proper partitioning. I.e. source OVA and converted VM
>> don't end up on the same drive and TMPDIR is set to put even temporary
>> files somewhere else. But this is not a general solution. And sometimes
>> the necessary space may not be available at all.
>>
>>
>> The question is how to change the import path so that virt-v2v doesn't
>> have to extract the OVA. I can see the following solutions:
>>
>> 1) Solve it virt-v2v: create a layer for directly accessing the files
>> in the archive.
>>
>> 2) Solve it in QEMU: create backing method that would allow creating
>> qemu disk backed by the archive.
>
> As long as the tar file is not compressed, accessing a file within it
> should be trivial.

The OVA standard [1] talks about compression, but it looks like it is
meant only for the individual disks inside the archive. The standard
doesn't seem to be clear about it. The OVF XML is guaranteed to be at
the beginning of the archive, so not much would need to be read to get
at it, even if the archive were compressed.

Well, I guess it's reasonable to start with plain tar regardless.

Thanks,
michal

[1] http://www.dmtf.org/sites/default/files/standards/documents/DSP0243_2.1.1.pdf

>
> I asked Kevin if there is a way to get qemu to access a disk image at
> an offset within another file, but there is no such feature at the
> moment. It's possible with `losetup', but that requires root :-(
>
> (At this point I would normally grumble about how easy this would be
> with a microkernel, but I won't do that now.)
>
> David Gilbert suggested looking at qemu-nbd which has an --offset
> option, allowing a particular offset with another file to be accessed.
> If we wanted to do it entirely within virt-v2v, I think this would be
> the way to go - the complex logic could be hidden inside v2v/input_ova.ml
>
> The second problem is to work out the right offset to use. I suspect
> this is something that http://www.libarchive.org/ can do, and that
> package is also in RHEL.
>
> We could even imagine a qemu block backend based on libarchive.
>
>> 3) Solve it on oVirt side: use some FUSE-based tool to provide
>> access to the archive and pass the OVA to virt-v2v not as a file but
>> as directory.
>
> http://www.cybernoia.de/software/archivemount/ is one such tool which
> can do this. It's not in RHEL, but it seems to be based on
> libarchive.
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> libguestfs lets you edit virtual machines. Supports shell scripting,
> bindings from many languages. http://libguestfs.org
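
The "start with plain tar" idea implies a cheap up-front check: is the
OVA an uncompressed tar at all, and what is its first member? A rough
sketch of such a check follows, again using Python's standard tarfile
module rather than anything virt-v2v actually does. The function name
first_member_of_plain_tar is hypothetical, and treating "cannot be read
as plain tar" as "fall back to extraction" is an assumption made for
the example, not a decision from this thread.

```python
import tarfile


def first_member_of_plain_tar(ova_path):
    """Return the name of the first archive member, or None.

    Hypothetical helper for illustration only. None means the file
    could not be read as an uncompressed tar (for example because the
    whole archive is gzip-compressed) or that it has no members.
    """
    try:
        # Mode "r:" disables transparent decompression, so a compressed
        # archive fails here with ReadError instead of being unpacked.
        with tarfile.open(ova_path, mode="r:") as tar:
            # Only the first header has been read at this point; the
            # rest of a multi-terabyte archive is not scanned.
            first = tar.next()
            return first.name if first is not None else None
    except tarfile.ReadError:
        return None
```

A caller could compare the returned name against a ".ovf" suffix to
confirm that the descriptor really comes first, and keep the existing
extract-everything path for any archive where the function returns
None.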