Blair Bethwaite
2017-Nov-14 17:23 UTC
[libvirt-users] dramatic performance slowdown due to THP allocation failure with full pagecache
Hi all,

This is not really a libvirt issue but I'm hoping some of the smart folks here will know more about this problem...

We have noticed when running some HPC applications on our OpenStack (libvirt+KVM) cloud that the same application occasionally performs much worse (a 4-5x slowdown) than normal. We can reproduce this quite easily by filling pagecache (i.e. dd-ing a single large file to /dev/null) before running the application. The problem seems to be that the kernel is not freeing (or has some trouble freeing) the non-dirty and (presumably immediately) reclaimable pagecache in order to allocate THPs for the application.

This behaviour is also observable on regular bare-metal, but the slowdown is only 10-20% there - the nested paging of the guest makes THP allocation much more important (1). Both current CentOS and Ubuntu guests have the issue. Some more environmental context: the VMs are tuned (CPU pinning, NUMA topology, hugepage backing), so we normally see no difference between host and guest performance.

After talking to many other colleagues in the HPC space, it seems common for people to set up their batch schedulers to run drop_caches between jobs, which can help to work around this and other similar issues, but no-one has been able to explain why it is happening. Hopefully someone who knows about the guts of the Linux MM can shed some light...?

(1) A related and possibly dumb question: in the case of a high-performance KVM guest that is hugepage backed and pinned in host memory anyway, why do we still have a table-based resolution of guest-physical to host-virtual addresses - couldn't this just be done by offset?

--
Cheers,
~Blairo
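The reproduction described above can be sketched as a short script. This is a minimal illustration, not the poster's exact procedure: the scratch file path and 1 GiB size are placeholders (scale the size to the machine's RAM), and the `thp_fault_*` counters in /proc/vmstat are a standard way to observe whether THP faults are falling back to 4 KiB pages.

```shell
# Snapshot THP allocation counters before the run
# (thp_fault_alloc = successful huge-page faults,
#  thp_fault_fallback = faults that fell back to 4 KiB pages)
grep -E 'thp_fault_(alloc|fallback)' /proc/vmstat

# Fill pagecache: write, then stream, a large scratch file
# (path and size are illustrative)
dd if=/dev/zero of=/tmp/pagecache_filler bs=1M count=1024 status=none
dd if=/tmp/pagecache_filler of=/dev/null bs=1M status=none

# ... run the HPC application here ...

# Re-check the counters: a jump in thp_fault_fallback suggests the
# kernel did not reclaim pagecache to satisfy THP allocations
grep -E 'thp_fault_(alloc|fallback)' /proc/vmstat

rm -f /tmp/pagecache_filler
```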
Daniel P. Berrange
2017-Nov-14 17:32 UTC
Re: [libvirt-users] dramatic performance slowdown due to THP allocation failure with full pagecache
On Tue, Nov 14, 2017 at 10:23:56AM -0700, Blair Bethwaite wrote:
> Hi all,
>
> This is not really a libvirt issue but I'm hoping some of the smart folks
> here will know more about this problem...
>
> We have noticed when running some HPC applications on our OpenStack
> (libvirt+KVM) cloud that the same application occasionally performs much
> worse (4-5x slowdown) than normal. We can reproduce this quite easily by
> filling pagecache (i.e. dd-ing a single large file to /dev/null) before
> running the application. The problem seems to be that the kernel is not
> freeing (or has some trouble freeing) the non-dirty and (presumably
> immediately) reclaimable pagecache in order to allocate THPs for the
> application.
>
> This behaviour is also observable on regular bare-metal, but the slowdown
> is only 10-20% there - the nested paging of the guest really makes THP
> allocation important there (1). Both current CentOS and Ubuntu guests have
> the issue. Some more environmental context: the VMs are tuned (CPU pinning,
> NUMA topology, hugepage backing), so we normally see no difference between
> host and guest performance.
>
> After talking to many other colleagues in the HPC space it seems common
> that people setup their batch schedulers to put drop_caches between jobs,
> which can help to workaround this and other similar issues, but no-one has
> been able to explain why this is happening. Hopefully someone who knows
> about the guts of the Linux MM can shed some light...?

Transparent huge pages were never intended to provide any guarantees of performance for applications. They are essentially an optimization so that, if there is free RAM, huge pages can be opportunistically given to a process even if it didn't ask for them. This gives KVM some improved performance in the default "out of the box" scenario where the user hasn't taken time to manually optimize settings.
The kernel can't predict the future usage pattern of processes, so it is not at all clear-cut that evicting the entire pagecache in order to allocate more huge pages is going to be beneficial for system performance as a whole. IOW, if your application has a certain expectation of performance that can only be satisfied by having the KVM guest backed by huge pages, then you should really change to explicitly reserving huge pages for the guests, and not rely on THP, which inherently can't provide any guarantee in this area.

> (1) a related and possibly dumb question: in the case of a high-performance
> KVM where the guest is hugepage backed and pinned in host memory anyway,
> why do we still have a table based resolution for guest physical to host
> virtual address translation - couldn't this just be done by offset?

Regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
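For readers unfamiliar with the explicit reservation Daniel refers to: in libvirt it is configured via the domain XML's `<memoryBacking>` element, with the pool pre-allocated on the host. The snippet below is a minimal sketch; the 2048 KiB page size and the host's `nr_hugepages` count are illustrative values, not taken from this thread.

```xml
<!-- Domain XML fragment: back guest RAM with host huge pages -->
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>
```

The host must reserve the pages before the guest starts, e.g. `sysctl vm.nr_hugepages=4096` (enough 2 MiB pages to cover guest RAM). Unlike THP, this pool is carved out up front and cannot be stolen by pagecache, which is why it gives the guarantee Daniel describes.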
Blair Bethwaite
2017-Nov-14 17:52 UTC
Re: [libvirt-users] dramatic performance slowdown due to THP allocation failure with full pagecache
Thanks for the reply Daniel,

However I think you slightly misunderstood the scenario...

On 14 November 2017 at 10:32, Daniel P. Berrange <berrange@redhat.com> wrote:
> IOW, if your application has a certain expectation of performance that can only
> be satisfied by having the KVM guest backed by huge pages, then you should
> really change to explicitly reserve huge pages for the guests, and not rely on
> THP which inherently can't provide any guarantee in this area.

We already do this. The problem is not hugepage backing of the guest, it is THP allocation inside the guest (or indeed on a bare-metal host). The issue in the HPC world is that we support so many different applications (some of which are complete black boxes) that explicit hugepage allocation for application memory is generally not viable, so we are reliant on THP to avoid TLB thrashing.

> The kernel can't predict the future usage pattern of processes so it is not at
> all clear cut that evicting the entire pagecache in order to allocate more
> huge pages is going to be beneficial for system performance as a whole.

Yet the default behaviour seems to be to stall on fault, then directly reclaim and defrag in order to allocate a hugepage if at all possible. In my test case there is almost no free memory, so some pagecache has to be reclaimed for the process either way; what I don't understand is why the THP allocation fails in this case but succeeds when pagecache usage is lower.

--
Cheers,
~Blairo
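The drop_caches workaround mentioned earlier, and the guest-side THP policy being discussed, can both be inspected from a shell. A minimal sketch (the drop_caches write requires root; the "defer" defrag mode shown in the comment only exists on kernels >= 4.6):

```shell
# Workaround used between batch jobs: flush dirty pages, then drop
# clean pagecache (1 = pagecache only; 3 also drops slab objects)
sync
echo 1 > /proc/sys/vm/drop_caches

# Inspect the THP policy; the bracketed value is the active one.
# "defrag" controls the stall-on-fault behaviour Blair describes:
# "always" does direct reclaim/compaction at fault time, while
# "defer" wakes kswapd/kcompactd in the background instead
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
```

Switching defrag from "always" to "defer" trades guaranteed-THP-or-stall for fall-back-to-4K-now, which avoids the latency spike but may reproduce the slowdown on a different axis.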