thr3ads.net - CentOS - [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6? [Apr 2015]

Peter van Hooft

2015-Apr-30 12:24 UTC

[CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

> Message: 4
> Date: Wed, 29 Apr 2015 08:35:29 -0500
> From: Matt Garman <matthew.garman at gmail.com>
> To: CentOS mailing list <centos at centos.org>
> Subject: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?
> Message-ID: <CAJvUf-CyTg8ZiGq3OXRLKw7s1K2dGx1gqo_2XwOAXXQty=RHZQ at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> We have a "compute cluster" of about 100 machines that do a read-only
> NFS mount to a big NAS filer (a NetApp FAS6280).  The jobs running on
> these boxes are analysis/simulation jobs that constantly read data off
> the NAS.
> 
> We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5.
> We did a "piecemeal" upgrade, usually upgrading five or so machines at
> a time, every few days.  We noticed improved performance on the CentOS
> 6 boxes.  But as the number of CentOS 6 boxes increased, we actually
> saw performance on the CentOS 5 boxes decrease.  By the time we had
> only a few CentOS 5 boxes left, they were performing so badly as to be
> effectively worthless.
> 
> What we observed in parallel to this upgrade process was that the read
> latency on our NetApp device skyrocketed.  This in turn caused all
> compute jobs to actually run slower, as it seemed to move the
> bottleneck from the client servers' OS to the NetApp.  This is
> somewhat counter-intuitive: CentOS 6 performs faster, but actually
> results in net performance loss because it creates a bottleneck on our
> centralized storage.
> 
> All indications are that CentOS 6 seems to be much more "aggressive"
> in how it does NFS reads.  And likewise, CentOS 5 was very "polite",
> to the point that it basically got starved out by the introduction of
> the 6.5 boxes.
> 
> What I'm looking for is a "deep dive" list of changes to the NFS
> implementation between CentOS 5 and CentOS 6.  Or maybe this is due to
> a change in the TCP stack?  Or maybe the scheduler?  We've tried a lot
> of sysctl tcp tunings, various nfs mount options, anything that's
> obviously different between 5 and 6... But so far we've been unable to
> find the "smoking gun" that causes the obvious behavior change between
> the two OS versions.
> 
> Just hoping that maybe someone else out there has seen something like
> this, or can point me to some detailed documentation that might clue
> me in on what to look for next.
> 
> Thanks!
> 

You may want to try reducing sunrpc.tcp_max_slot_table_entries.
In CentOS 5 the number of slots is fixed: sunrpc.tcp_slot_table_entries = 16.
In CentOS 6, this number is dynamic with a maximum of
sunrpc.tcp_max_slot_table_entries, which by default has a value of 65536.

We put that in /etc/sysconfig/modprobe.d/sunrpc.conf:

    options sunrpc tcp_max_slot_table_entries=128

You can't put this in /etc/sysctl.conf because sysctl -p runs at boot
before the sunrpc kernel module is loaded, so the sunrpc.* keys don't
exist yet.

peter

Peter van Hooft

2015-Apr-30 12:31 UTC

[CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

On Thu, Apr 30, 2015 at 02:24:27PM +0200, Peter van Hooft wrote:
> You may want to try reducing sunrpc.tcp_max_slot_table_entries .
> In CentOS 5 the number of slots is fixed: sunrpc.tcp_slot_table_entries = 16
> In CentOS 6, this number is dynamic with a maximum of
> sunrpc.tcp_max_slot_table_entries which by default has a value of 65536.
> 
> We put that in /etc/sysconfig/modprobe.d/sunrpc.conf: options sunrpc
> tcp_max_slot_table_entries=128
Make that /etc/modprobe.d/sunrpc.conf, of course.

peter
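
For reference, each RPC slot corresponds to one request that a client can have in flight on a connection, so capping the slot table limits how hard a single client can push the filer. The steps described in this thread can be sketched as below. This is a minimal sketch, not a tested procedure: `write_sunrpc_conf` is a hypothetical helper name, and the optional directory argument exists only so the function can be tried against a scratch directory before touching /etc/modprobe.d.

```shell
#!/bin/sh
# Sketch: persist a cap on the dynamic RPC slot table via a modprobe options file.
# write_sunrpc_conf is a hypothetical helper, not part of any package.
write_sunrpc_conf() {
    max_slots="$1"
    conf_dir="${2:-/etc/modprobe.d}"   # corrected path from this thread
    # Reject empty or non-numeric values before writing anything.
    case "$max_slots" in
        ''|*[!0-9]*) echo "max_slots must be a number" >&2; return 1 ;;
    esac
    printf 'options sunrpc tcp_max_slot_table_entries=%s\n' "$max_slots" \
        > "$conf_dir/sunrpc.conf"
}

# Dry run against a scratch directory; as root, drop the second argument.
scratch=$(mktemp -d)
write_sunrpc_conf 128 "$scratch"
cat "$scratch/sunrpc.conf"
```

The option is read when the sunrpc module is loaded, so on a machine that already has NFS mounts it would presumably need a reboot (or unmounting everything and reloading the module) to take effect. The value 128 is just the one quoted above; the right number is site-specific.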

Matt Garman

2015-May-04 16:58 UTC

[CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

On Thu, Apr 30, 2015 at 7:31 AM, Peter van Hooft
<hooft at natlab.research.philips.com> wrote:
>> You may want to try reducing sunrpc.tcp_max_slot_table_entries .
>> In CentOS 5 the number of slots is fixed: sunrpc.tcp_slot_table_entries = 16
>> In CentOS 6, this number is dynamic with a maximum of
>> sunrpc.tcp_max_slot_table_entries which by default has a value of 65536.
>>
>> We put that in /etc/sysconfig/modprobe.d/sunrpc.conf: options sunrpc
>> tcp_max_slot_table_entries=128
>
> Make that /etc/modprobe.d/sunrpc.conf, of course.

This appears to be the "smoking gun" we were looking for, or at least
a significant piece of the puzzle.

We actually tried this early on in our investigation, but were
changing it via sysctl, which apparently has no effect.  Your email
convinced me to try again, but this time configuring the parameters
via modprobe.

In our case, 128 was still too high, so we dropped it all the way
down to 16.  Our understanding is that 16 is the CentOS 5 value.  What
we're seeing now is that our apps are starved for data, so it looks
like we might have to nudge it up.  In other words, either something
else is at play that we're not aware of, or the meaning of that
parameter differs between CentOS 5 and CentOS 6.

Anyway, thank you very much for the suggestion.  You turned on the
light at the end of the tunnel!
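
Before nudging the value up or down, it can help to confirm what the running kernel actually reports, since a setting that silently failed to apply looks the same as one that had no effect. The sketch below reads the two keys named in this thread from /proc/sys/sunrpc, which is where the sunrpc.* sysctls are exposed. `show_slot_tables` is a hypothetical helper name, and the directory argument is only there so the function can be pointed at test data.

```shell
#!/bin/sh
# Sketch: print the slot-table sysctls the running kernel exposes.
# show_slot_tables is a hypothetical helper, not part of any package.
show_slot_tables() {
    sysctl_dir="${1:-/proc/sys/sunrpc}"
    # The directory only exists once the sunrpc module has been loaded.
    if [ ! -d "$sysctl_dir" ]; then
        echo "sunrpc keys not present (module not loaded?)" >&2
        return 1
    fi
    for key in tcp_slot_table_entries tcp_max_slot_table_entries; do
        if [ -r "$sysctl_dir/$key" ]; then
            printf 'sunrpc.%s = %s\n' "$key" "$(cat "$sysctl_dir/$key")"
        else
            # Older kernels (e.g. CentOS 5) may not have the max key at all.
            printf 'sunrpc.%s = (not available)\n' "$key"
        fi
    done
}

# Report the live values; do not abort if the module is not loaded.
show_slot_tables || true
```

One assumption worth stating: existing mounts may keep the limits they were created with, so a comparison is probably only meaningful after a remount or reboot.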
