Richard W.M. Jones
2019-Apr-11 17:55 UTC
[Libguestfs] nbdkit, VDDK, extents, readahead, etc
As I've spent really too long today investigating this, I want to
document this in a public email, even though there's nothing really
that interesting here.  One thing you find from searching for VDDK 6.7 /
VixDiskLib_QueryAllocatedBlocks issues with Google is that we must be
one of the very few users out there.  The other thing is that it's
quite broken.

All testing was done using two bare-metal servers connected back to
back through a gigabit ethernet switch.  I used upstream qemu and
nbdkit from git as of today.  I used a single test Fedora guest with a
16G thin-provisioned disk with about 1.6G allocated.

Observations:

(1) VDDK hangs for a really long time when using the nbdkit --run
    option.

It specifically hangs for exactly 120 seconds doing:

  nbdkit: debug: VixDiskLib: Resolve host.

This seems to be a bug in VDDK, possibly connected with the fact that
we fork after initializing VDDK but before doing the
VixDiskLib_ConnectEx.  I suspect it's something to do with the PID
changing.

It would be fair to deduct 2 minutes from all timings below.

(2) VDDK cannot use VixDiskLib_QueryAllocatedBlocks if the disk is
opened for writes.  It fails with this uninformative error:

  nbdkit: vddk[1]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 13 from server: NfcFssrvrOpen: Failed to open '[datastore1] Fedora 28/Fedora 28.vmdk'
  nbdkit: vddk[1]: error: [NFC ERROR] NfcFssrvrClientOpen: received unexpected message 4 from server
  nbdkit: vddk[1]: debug: VixDiskLib: Detected DiskLib error 290 (NBD_ERR_GENERIC).
  nbdkit: vddk[1]: debug: VixDiskLib: VixDiskLibQueryBlockList: Fail to start query process. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 543.
  nbdkit: vddk[1]: debug: can_extents: VixDiskLib_QueryAllocatedBlocks test failed, extents support will be disabled: original error: Unknown error

The last debug statement is from nbdkit itself, indicating that because
VixDiskLib_QueryAllocatedBlocks didn't work, extents support is
disabled.
To work around this you can use nbdkit --readonly.  However I don't
understand why that would be necessary, except perhaps it's just an
undocumented limitation of VDDK.  For all the cases _we_ care about
we're using --readonly, so that's lucky.

(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
nicely measure the benefit of extents:

With noextents (ie. force a full copy):

  elapsed time: 323.815 s
  read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s

Without noextents (ie. rely on qemu-img skipping the sparse bits):

  elapsed time: 237.41 s
  read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
  extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s

Note that if you deduct 120 seconds (see point (1) above) from these
times then it goes from 203s -> 117s, about a 40% saving.  We can
likely do better by having > 32 bit requests and by qemu not using
NBD_CMD_FLAG_REQ_ONE.

(4) We can also add nbdkit-readahead-filter in both cases to see
whether or not that helps:

With noextents and readahead:

  elapsed time: 325.358 s
  read: 265 ops, 17179869184 bytes, 4.22423e+08 bits/s

As expected the readahead filter greatly reduces the number of read
ops.  But in this back-to-back configuration VDDK requests are
relatively cheap, so no time is saved.

Without noextents, with readahead:

  elapsed time: 252.608 s
  read: 96 ops, 1927282688 bytes, 6.10363e+07 bits/s
  extents: 70 ops, 135654246400 bytes, 4.29612e+09 bits/s

Readahead is detrimental in this case, as expected: this filter works
best when reads are purely sequential, and when they are not it tends
to prefetch extra data.  Notice that the number of bytes read is
larger here than in the earlier test.

Rich.

--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
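The "about a 40% saving" in point (3) can be checked with a few lines of
arithmetic (a quick sketch; the 120 s deducted is the VDDK "Resolve host"
hang from point (1)):

```python
# Elapsed times reported by nbdkit-stats-filter, in seconds.
hang = 120.0            # fixed VDDK "Resolve host" delay (point 1)
full_copy = 323.815     # with noextents (forced full copy)
with_extents = 237.41   # relying on extents / sparse skipping

adj_full = full_copy - hang        # ~204 s
adj_ext = with_extents - hang      # ~117 s
saving = 1 - adj_ext / adj_full    # fraction of time saved by extents

print(f"{adj_full:.0f}s -> {adj_ext:.0f}s, {saving:.0%} saving")
```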
Nikolay Ivanets
2019-Apr-11 18:04 UTC
Re: [Libguestfs] nbdkit, VDDK, extents, readahead, etc
Great investigation.  Thanks for sharing!

--
Mykola Ivanets
Martin Kletzander
2019-Apr-12 13:52 UTC
Re: [Libguestfs] nbdkit, VDDK, extents, readahead, etc
On Thu, Apr 11, 2019 at 06:55:55PM +0100, Richard W.M. Jones wrote:
>[...]
>
>(1) VDDK hangs for a really long time when using the nbdkit --run
>    option.
>
>It specifically hangs for exactly 120 seconds doing:
>
>  nbdkit: debug: VixDiskLib: Resolve host.
>
>This seems to be a bug in VDDK, possibly connected with the fact that
>we fork after initializing VDDK but before doing the
>VixDiskLib_ConnectEx.  I suspect it's something to do with the PID
>changing.
>
>It would be fair to deduct 2 minutes from all timings below.

Is the PID changed because you want to exec from the parent (where the
init is done), but all the other calls are done in the child?  Is that
the case so that nbdkit is part of the process that someone spawned?
I'm asking just to know if something can be done about it.

>(2) VDDK cannot use VixDiskLib_QueryAllocatedBlocks if the disk is
>opened for writes.  It fails with this uninformative error:
>
>  nbdkit: vddk[1]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 13 from server: NfcFssrvrOpen: Failed to open '[datastore1] Fedora 28/Fedora 28.vmdk'
>  nbdkit: vddk[1]: error: [NFC ERROR] NfcFssrvrClientOpen: received unexpected message 4 from server
>  nbdkit: vddk[1]: debug: VixDiskLib: Detected DiskLib error 290 (NBD_ERR_GENERIC).
>  nbdkit: vddk[1]: debug: VixDiskLib: VixDiskLibQueryBlockList: Fail to start query process. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 543.
>  nbdkit: vddk[1]: debug: can_extents: VixDiskLib_QueryAllocatedBlocks test failed, extents support will be disabled: original error: Unknown error
>
>The last debug statement is from nbdkit itself indicating that because
>VixDiskLib_QueryAllocatedBlocks didn't work, extents support is
>disabled.
>
>To work around this you can use nbdkit --readonly.  However I don't
>understand why that would be necessary, except perhaps it's just an
>undocumented limitation of VDDK.  For all the cases _we_ care about
>we're using --readonly, so that's lucky.

It might have been a safety measure for multiple accesses or something
similar.  Or a "we'll implement that later" symptom.

>(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
>nicely measure the benefits of extents:
>
>With noextents (ie. force full copy):
>
>  elapsed time: 323.815 s
>  read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
>
>Without noextents (ie. rely on qemu-img skipping sparse bits):
>
>  elapsed time: 237.41 s
>  read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
>  extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
>
>Note if you deduct 120 seconds (see point (1) above) from these times
>then it goes from 203s -> 117s, about a 40% saving.  We can likely do
>better by having > 32 bit requests and qemu not using
>NBD_CMD_FLAG_REQ_ONE.

How did you run qemu-img?  I think on a slow CPU with a fast disk the
difference might be even bigger, if qemu-img can write whatever it
gets rather than searching for zeroes.

>(4) We can also add nbdkit-readahead-filter in both cases to see if
>that helps or not:
>
>With noextents and readahead:
>
>  elapsed time: 325.358 s
>  read: 265 ops, 17179869184 bytes, 4.22423e+08 bits/s
>
>As expected the readahead filter reduces the number of iops greatly.
>But in this back-to-back configuration VDDK requests are relatively
>cheap so no time is saved.
>
>Without noextents, with readahead:
>
>  elapsed time: 252.608 s
>  read: 96 ops, 1927282688 bytes, 6.10363e+07 bits/s
>  extents: 70 ops, 135654246400 bytes, 4.29612e+09 bits/s
>
>Readahead is detrimental in this case, as expected because this filter
>works best when reads are purely sequential, and if not it will tend
>to prefetch extra data.  Notice that the number of bytes read is
>larger here than in the earlier test.

Really good write-up, thanks for sharing.
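The over-reading that makes readahead detrimental for sparse copies can be
illustrated with a toy model (the window size and the access patterns below
are hypothetical, not nbdkit-readahead-filter's actual algorithm): expanding
every non-sequential read to a fixed prefetch window costs nothing extra on a
purely sequential scan, but fetches far more than requested when the client
jumps between allocated extents.

```python
def bytes_fetched(reads, window):
    """Total bytes pulled from the source when each read that is not a
    continuation of already-prefetched data is expanded to `window`."""
    total = 0
    prefetched_to = 0
    for off, length in reads:
        if off >= prefetched_to:
            # A jump: start a fresh prefetch window at this offset.
            total += max(length, window)
            prefetched_to = off + max(length, window)
        elif off + length > prefetched_to:
            # Read runs past the window: fetch only the missing tail.
            total += off + length - prefetched_to
            prefetched_to = off + length
    return total

MiB = 1024 * 1024
WINDOW = 32 * MiB   # hypothetical prefetch window size

# 64 MiB of sequential 1 MiB reads: prefetching wastes nothing.
sequential = [(i * MiB, MiB) for i in range(64)]
# The same 64 MiB spread thinly across a sparse disk: every read jumps.
sparse = [(i * 100 * MiB, MiB) for i in range(64)]

print(bytes_fetched(sequential, WINDOW) // MiB, "MiB fetched (sequential)")
print(bytes_fetched(sparse, WINDOW) // MiB, "MiB fetched (sparse)")
```

In this model the sequential scan fetches exactly the 64 MiB requested, while
the sparse pattern fetches a full window per jump, which matches the
observation that the number of bytes read grows when readahead is combined
with extent-based skipping.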
Richard W.M. Jones
2019-Apr-12 15:30 UTC
Re: [Libguestfs] nbdkit, VDDK, extents, readahead, etc
On Fri, Apr 12, 2019 at 03:52:58PM +0200, Martin Kletzander wrote:
> On Thu, Apr 11, 2019 at 06:55:55PM +0100, Richard W.M. Jones wrote:
> >This seems to be a bug in VDDK, possibly connected with the fact that
> >we fork after initializing VDDK but before doing the
> >VixDiskLib_ConnectEx.  I suspect it's something to do with the PID
> >changing.
> >
> >It would be fair to deduct 2 minutes from all timings below.
>
> Is the PID changed because you want to exec from the parent (where
> the init is done), but all the other calls are done in the child?  Is
> that the case so that nbdkit is part of the process that someone
> spawned?  I'm asking just to know if something can be done about it.

This hang only applies when using the --run option, and I guess you
wouldn't be using that option so you wouldn't see the hang.

The reason why nbdkit forks itself when using this option is so we end
up with a situation like this:

  +-- nbdkit monitoring process
    |
    +-- first child = nbdkit
    |
    +-- second child = ‘--run’ command

so that when the second child exits, the monitoring process (which is
doing nothing except waiting for the second child to exit) can kill
nbdkit.  If VDDK cannot handle this situation (and I'm just guessing
that this is the bug) then VDDK has a bug.

> >(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
> >nicely measure the benefits of extents:
> >[...]
>
> How did you run qemu-img?

The full command was:

  LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
  ./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
      libdir=vmware-vix-disklib-distrib \
      server=vmware user=root password=+/tmp/passwd \
      thumbprint=xyz \
      vm=moref=3 \
      --filter=stats statsfile=/dev/stderr \
      --run '
        unset LD_LIBRARY_PATH
        /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
      '

(with extra filters added to the command line as appropriate for each
test).

> I think on slow CPU and fast disk this might be even bigger
> difference if qemu-img can write whatever it gets and not searching
> for zeros.

This is RHEL 8 so /var/tmp is XFS.  The hardware is relatively new and
the disk is an SSD.

Rich.

--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
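The three-process layout sketched above can be demonstrated with a small
POSIX Python script (an illustration of the pattern only, not nbdkit's
actual implementation; the long sleep stands in for the NBD server and the
immediate exit stands in for the --run command):

```python
import os
import signal
import time

server = os.fork()
if server == 0:
    # First child: stands in for nbdkit serving clients "forever".
    time.sleep(60)
    os._exit(0)

runner = os.fork()
if runner == 0:
    # Second child: stands in for the --run command, which finishes quickly.
    os._exit(0)

# The monitoring process waits only for the --run command to exit...
_, status = os.waitpid(runner, 0)
# ...and then kills the server, whatever it is still doing.
os.kill(server, signal.SIGTERM)
os.waitpid(server, 0)
print("run command exit status:", os.WEXITSTATUS(status))
```

The monitor returns as soon as the second child exits, without waiting out
the server's sleep, which is exactly the teardown behaviour described for
--run above.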