Hi Richard,
Thanks a lot for the response - very helpful to go through.
I'm using libguestfs 1.26.5 on Ubuntu 14.04, running on a bare-metal
server. I was originally running plain "virt-df", but after looking into
the "-P" option a little more I have incorporated it. That greatly
improves performance over the original runs (free memory on the box
ranged from 2-3 GB, which wouldn't have allowed many parallel threads
under the automatic estimate). One question about this: is it safe to
force a larger thread count without checking available memory first?
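For now I'm sizing "-P" by hand from MemFree before each run, roughly
like the sketch below. The 500 MB-per-appliance figure is just my own
guess, not something I took from the docs:

  # Rough manual sizing of -P from free memory. 500 MB per appliance
  # is my own estimate, not a documented figure.
  free_mb=$(awk '/^MemFree:/ {print int($2/1024)}' /proc/meminfo)
  threads=$(( free_mb / 500 ))
  [ "$threads" -lt 1 ] && threads=1
  virt-df -h -P "$threads"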
I ran the baselines and got the following:
Starting the appliance = ~3.4s
Performing inspection of a guest = ~5s
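(I measured those with the baseline commands from the manual -
something like the following, with a real disk path substituted in the
second one:)

  # Baseline 1: time to launch the appliance.
  time guestfish -a /dev/null run

  # Baseline 2: time to inspect a single guest.
  time guestfish --ro -a /path/to/guest/disk -i exit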
I also looked at your blog posts - very interesting stuff. I played
with setting "LIBGUESTFS_ATTACH_METHOD=appliance", but didn't notice
much difference here. I'm testing on a QEMU-KVM/OpenStack host with 29
guests running. libvirtd limits client connections to 20 by default, so
I expected to see a little improvement from setting that env var. After
setting it, I also raised "-P" from 20 to 29, but didn't see any
difference.
Any additional suggestions as the scale increases dramatically to
>3,000 guests? (That deployment will likely be on a system with much
more available memory.) Ideally we would like to gather guest disk
used/free statistics at <5 minute intervals for all guests - do you
think this is possible using virt-df?
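To make the question concrete, the collection loop I have in mind is
something like this (the interval, thread count and log path are
placeholders from my setup):

  # Hypothetical 5-minute collection loop; --csv keeps the output
  # easy to parse downstream.
  threads=29
  interval=300
  while true; do
      start=$(date +%s)
      virt-df --csv -P "$threads" >> /var/log/virt-df.csv
      elapsed=$(( $(date +%s) - start ))
      [ "$elapsed" -lt "$interval" ] && sleep $(( interval - elapsed ))
  done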
Thanks!
Dan Ryder
-----Original Message-----
From: Richard W.M. Jones [mailto:rjones@redhat.com]
Sent: Wednesday, September 10, 2014 12:32 PM
To: Dan Ryder (daryder)
Cc: libguestfs@redhat.com
Subject: Re: [Libguestfs] Scaling virt-df performance
On Wed, Sep 10, 2014 at 01:38:16PM +0000, Dan Ryder (daryder) wrote:
> Hello,
>
> I have been looking at the "virt-df" libguestfs tool to get
> guest-level disk used/free statistics - specifically with
> QEMU-KVM/OpenStack. This works great for a few OpenStack instances,
> but when I begin to scale (even to ~30 instances/guests) the
> performance really takes a hit. The time it takes for the command to
> complete seems to scale linearly with the number of guests/domains
> running on the hypervisor (note - I am using "virt-df" for all guests,
> not specifying one at a time; although I've tried that, too).
>
> For ~30 guests, the "virt-df" command takes around 90 seconds to
> complete. We are looking to support a scale of 3,000-30,000 guests
> disk used/free. It looks like this won't be remotely possible using
> "virt-df".
With sufficient memory, non-nested, on hardware built in the last 3
years, you should get performance of about 1 second per guest
(pipelined). So the figure you give is about 3 times higher than it
should be.
Just to get some basic things out of the way:
- What version of virt-df are you using and on what distro?
- Is this nested?
- What exact virt-df command(s) are you running?
- Are you using the -P option?
- How much free memory is on the system? virt-df runs multiple
threads in parallel, but the number to run is computed according to
the amount of free memory[1].
Also have a look at the guestfs-performance manual[2]. I'm especially
interested in the results of the baseline measurements on that page, but the
rest of the page should answer some of your questions too.
There's also an interesting Perl script you might try playing with.
Rich.
[1] https://github.com/libguestfs/libguestfs/blob/master/df/estimate-max-threads.c
[2] http://libguestfs.org/guestfs-performance.1.html
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW