Richard W.M. Jones
2019-Apr-12 15:30 UTC
Re: [Libguestfs] nbdkit, VDDK, extents, readahead, etc
On Fri, Apr 12, 2019 at 03:52:58PM +0200, Martin Kletzander wrote:
> On Thu, Apr 11, 2019 at 06:55:55PM +0100, Richard W.M. Jones wrote:
> >This seems to be a bug in VDDK, possibly connected with the fact that
> >we fork after initializing VDDK but before doing the
> >VixDiskLib_ConnectEx.  I suspect it's something to do with the PID
> >changing.
> >
> >It would be fair to deduct 2 minutes from all timings below.
> >
>
> Is the PID changed because you want to exec from the parent (where
> the init is done), but all the other calls are done in the child?  Is
> that the case so that nbdkit is part of the process that someone
> spawned?  I'm asking just to know if something can be done about it.

This hang only applies when using the --run option and I guess you
wouldn't be using that option so you wouldn't see the hang.

The reason why nbdkit forks itself when using this option is so we end
up with a situation like this:

 +-- nbdkit monitoring process
 |
 +-- first child = nbdkit
 |
 +-- second child = ‘--run’ command

so when the second child exits, the monitoring process (which is doing
nothing except waiting for the second child to exit) can kill nbdkit.

If VDDK cannot handle this situation (and I'm just guessing that this
is the bug) then VDDK has a bug.

> >(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
> >nicely measure the benefits of extents:
> >
> >With noextents (ie. force full copy):
> >
> >  elapsed time: 323.815 s
> >  read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
> >
> >Without noextents (ie. rely on qemu-img skipping sparse bits):
> >
> >  elapsed time: 237.41 s
> >  read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
> >  extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
> >
> >Note if you deduct 120 seconds (see point (1) above) from these times
> >then it goes from 203s -> 117s, about a 40% saving.  We can likely do
> >better by having > 32 bit requests and qemu not using
> >NBD_CMD_FLAG_REQ_ONE.
> >
> How did you run qemu-img?

The full command was:

LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
    libdir=vmware-vix-disklib-distrib \
    server=vmware user=root password=+/tmp/passwd \
    thumbprint=xyz \
    vm=moref=3 \
    --filter=stats statsfile=/dev/stderr \
    --run '
    unset LD_LIBRARY_PATH
    /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
    '

(with extra filters added to the command line as appropriate for each
test).

> I think on slow CPU and fast disk this might be even bigger
> difference if qemu-img can write whatever it gets and not searching
> for zeros.

This is RHEL 8 so /var/tmp is XFS.  The hardware is relatively new and
the disk is an SSD.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
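[For readers unfamiliar with the --run machinery, the process tree
described above boils down to the pattern below.  This is only an
illustrative shell sketch, not nbdkit's actual code (which does this in
C); the example1 plugin and the socket path are placeholders:

  #!/bin/sh
  # Sketch of the supervision pattern behind ‘nbdkit --run’.
  sock=/tmp/nbd.sock

  nbdkit -U "$sock" example1 &     # first child = nbdkit
  nbdkit_pid=$!
  sleep 1                          # crude wait for the socket to appear;
                                   # real nbdkit synchronises properly

  # second child = the ‘--run’ command, with $nbd pointing at the server
  nbd="nbd:unix:$sock" sh -c 'qemu-img info "$nbd"'

  kill "$nbdkit_pid"               # monitor kills nbdkit once it exits

Because VDDK is initialized in the parent before these forks, the PID
that VDDK recorded at init time is not the PID that later makes the
connection, which is the suspected trigger for the hang.]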
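[As a concrete example of "extra filters added as appropriate", the
"with noextents" measurement was presumably the same command with the
noextents filter stacked in.  The exact invocation is not shown in the
thread, so the filter order here is a guess; credentials and thumbprint
are the same placeholders as above:

  LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
  ./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
      libdir=vmware-vix-disklib-distrib \
      server=vmware user=root password=+/tmp/passwd \
      thumbprint=xyz \
      vm=moref=3 \
      --filter=noextents \
      --filter=stats statsfile=/dev/stderr \
      --run '
      unset LD_LIBRARY_PATH
      /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
      '

The noextents filter hides extent information from the client, so
qemu-img cannot discover where the sparse regions are and has to read
the whole disk, which is the "force full copy" case measured above.]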
Martin Kletzander
2019-Apr-15 14:30 UTC
Re: [Libguestfs] nbdkit, VDDK, extents, readahead, etc
On Fri, Apr 12, 2019 at 04:30:02PM +0100, Richard W.M. Jones wrote:
>On Fri, Apr 12, 2019 at 03:52:58PM +0200, Martin Kletzander wrote:
>> On Thu, Apr 11, 2019 at 06:55:55PM +0100, Richard W.M. Jones wrote:
>> >This seems to be a bug in VDDK, possibly connected with the fact that
>> >we fork after initializing VDDK but before doing the
>> >VixDiskLib_ConnectEx.  I suspect it's something to do with the PID
>> >changing.
>> >
>> >It would be fair to deduct 2 minutes from all timings below.
>> >
>>
>> Is the PID changed because you want to exec from the parent (where
>> the init is done), but all the other calls are done in the child?  Is
>> that the case so that nbdkit is part of the process that someone
>> spawned?  I'm asking just to know if something can be done about it.
>
>This hang only applies when using the --run option and I guess you
>wouldn't be using that option so you wouldn't see the hang.
>
>The reason why nbdkit forks itself when using this option is so we end
>up with a situation like this:
>
> +-- nbdkit monitoring process
> |
> +-- first child = nbdkit
> |
> +-- second child = ‘--run’ command
>
>so when the second child exits, the monitoring process (which is doing
>nothing except waiting for the second child to exit) can kill nbdkit.
>

Oh, I thought the "monitoring process" would just be a signal handler.
If the monitoring process is just checking those two underlying ones,
how come the PID changes for the APIs?  Is the Init called before the
first child forks off?

>If VDDK cannot handle this situation (and I'm just guessing that this
>is the bug) then VDDK has a bug.
>

Sure, but having a workaround could be nice, if it's not too much work.

>> >(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
>> >nicely measure the benefits of extents:
>> >
>> >With noextents (ie. force full copy):
>> >
>> >  elapsed time: 323.815 s
>> >  read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
>> >
>> >Without noextents (ie. rely on qemu-img skipping sparse bits):
>> >
>> >  elapsed time: 237.41 s
>> >  read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
>> >  extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
>> >
>> >Note if you deduct 120 seconds (see point (1) above) from these times
>> >then it goes from 203s -> 117s, about a 40% saving.  We can likely do
>> >better by having > 32 bit requests and qemu not using
>> >NBD_CMD_FLAG_REQ_ONE.
>> >
>> How did you run qemu-img?
>
>The full command was:
>
>LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
>./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
>    libdir=vmware-vix-disklib-distrib \
>    server=vmware user=root password=+/tmp/passwd \
>    thumbprint=xyz \
>    vm=moref=3 \
>    --filter=stats statsfile=/dev/stderr \
>    --run '
>    unset LD_LIBRARY_PATH
>    /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
>    '
>
>(with extra filters added to the command line as appropriate for each
>test).
>
>> I think on slow CPU and fast disk this might be even bigger
>> difference if qemu-img can write whatever it gets and not searching
>> for zeros.
>
>This is RHEL 8 so /var/tmp is XFS.  The hardware is relatively new and
>the disk is an SSD.
>

The reason I'm asking is that what you are measuring above still
includes QEMU looking for zero blocks in the data.  I haven't found
a way to make qemu write the sparse data exactly as it reads it:
without sparsifying even more by checking for zeros, but also
without creating a fully allocated image.  I have to look into that,
as it would be very useful for my use case.

>Rich.
>
>--
>Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
>Read my programming and virtualization blog: http://rwmj.wordpress.com
>virt-df lists disk usage of guests without needing to install any
>software inside the virtual machine.  Supports Linux and Windows.
>http://people.redhat.com/~rjones/virt-df/
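[As a side note on the zero-searching question: qemu-img convert has a
-S option controlling this.  Setting it to 0 stops qemu from scanning
the data for zero sectors, but per the qemu-img documentation the
destination is then always fully allocated, so it removes the scan
without preserving sparseness, which is only half of what is asked for
above.  A minimal example, reusing the $nbd variable that --run sets:

  # Disable the zero-sector scan (-S 0).  Note the output file will be
  # fully allocated, trading away sparseness for less CPU work.
  qemu-img convert -p -S 0 "$nbd" /var/tmp/out
]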
Richard W.M. Jones
2019-Apr-15 14:49 UTC
Re: [Libguestfs] nbdkit, VDDK, extents, readahead, etc
On Mon, Apr 15, 2019 at 04:30:47PM +0200, Martin Kletzander wrote:
> On Fri, Apr 12, 2019 at 04:30:02PM +0100, Richard W.M. Jones wrote:
> > +-- nbdkit monitoring process
> > |
> > +-- first child = nbdkit
> > |
> > +-- second child = ‘--run’ command
> >
> >so when the second child exits, the monitoring process (which is doing
> >nothing except waiting for the second child to exit) can kill nbdkit.
> >
>
> Oh, I thought the "monitoring process" would just be a signal
> handler.  If the monitoring process is just checking those two
> underlying ones, how come the PID changes for the APIs?  Is the Init
> called before the first child forks off?

Right, for convenience reasons the configuration steps (ie. .config,
.config_complete in [1]) are done before we fork either to act as a
server or to run commands, and the VDDK plugin does the initialization
in .config_complete, which is the only sensible place to do it.

While this is specific to using the --run option, I assume it would
also happen if nbdkit forks into the background to become a server.
But if you run nbdkit without --run and with --foreground then it
remains in the foreground and the hang doesn't occur.

[1] https://github.com/libguestfs/nbdkit/blob/master/docs/nbdkit-plugin.pod
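[A minimal sketch of that foreground arrangement, assuming a fixed
socket path of /tmp/vddk.sock and reusing the placeholder VDDK
parameters from the command earlier in the thread:

  # Terminal 1: run nbdkit in the foreground (-f / --foreground) on a
  # fixed Unix socket, so it never forks and the PID seen by VDDK at
  # init time stays the same.
  LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
  ./nbdkit -f -r -U /tmp/vddk.sock vddk \
      file="[datastore1] Fedora 28/Fedora 28.vmdk" \
      libdir=vmware-vix-disklib-distrib \
      server=vmware user=root password=+/tmp/passwd \
      thumbprint=xyz vm=moref=3

  # Terminal 2: point the client at the socket by hand, since there is
  # no --run to set $nbd.
  qemu-img convert -p nbd:unix:/tmp/vddk.sock /var/tmp/out
]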
> >If VDDK cannot handle this situation (and I'm just guessing that this
> >is the bug) then VDDK has a bug.
> >
>
> Sure, but having a workaround could be nice, if it's not too much work.

Patches welcome, but I suspect there's not a lot we can do in nbdkit.

> >>>(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
> >>>nicely measure the benefits of extents:
> >>>
> >>>With noextents (ie. force full copy):
> >>>
> >>>  elapsed time: 323.815 s
> >>>  read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
> >>>
> >>>Without noextents (ie. rely on qemu-img skipping sparse bits):
> >>>
> >>>  elapsed time: 237.41 s
> >>>  read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
> >>>  extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
> >>>
> >>>Note if you deduct 120 seconds (see point (1) above) from these times
> >>>then it goes from 203s -> 117s, about a 40% saving.  We can likely do
> >>>better by having > 32 bit requests and qemu not using
> >>>NBD_CMD_FLAG_REQ_ONE.
> >>>
> >>How did you run qemu-img?
> >
> >The full command was:
> >
> >LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
> >./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
> >    libdir=vmware-vix-disklib-distrib \
> >    server=vmware user=root password=+/tmp/passwd \
> >    thumbprint=xyz \
> >    vm=moref=3 \
> >    --filter=stats statsfile=/dev/stderr \
> >    --run '
> >    unset LD_LIBRARY_PATH
> >    /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
> >    '
> >
> >(with extra filters added to the command line as appropriate for each
> >test).
> >
> >>I think on slow CPU and fast disk this might be even bigger
> >>difference if qemu-img can write whatever it gets and not searching
> >>for zeros.
> >
> >This is RHEL 8 so /var/tmp is XFS.  The hardware is relatively new and
> >the disk is an SSD.
> >
>
> The reason I'm asking is that what you are measuring above still
> includes QEMU looking for zero blocks in the data.  I haven't found
> a way to make qemu write the sparse data exactly as it reads it:
> without sparsifying even more by checking for zeros, but also
> without creating a fully allocated image.

While qemu-img is still trying to detect zeroes, it won't find too
many because the image is thin provisioned.

However I take your point that when copying a snapshot using the
"single link" flag you don't want qemu-img to do this, because that
means it may omit parts of the snapshot that happen to be zero.

It would still be good to see the output of ‘qemu-img map
--output=json’ to see if qemu is really sparsifying the zeroes or is
actually writing them as zero non-holes (which is IMO correct
behaviour and shouldn't cause any problem).

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
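[For reference, that check is simply the following, run against the
destination file produced by the convert step:

  # Show how the destination file is allocated.  In the JSON output,
  # ranges with "data": false are holes; zeroes stored as real
  # (non-hole) data appear with "zero": true and "data": true.
  qemu-img map --output=json /var/tmp/out
]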