thr3ads.net - CentOS - [CentOS] CentOS7 and NFS [May 2020]

Barbara Krašovec

2020-May-15 13:32 UTC

[CentOS] CentOS7 and NFS

The number of threads has nothing to do with the number of cores on the machine.
It depends on the I/O, network speed, type of workload etc.
We usually start with 32 threads and increase if necessary. 

You can check the statistics with:
watch 'cat /proc/net/rpc/nfsd | grep th'
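
If you want the same number in a script rather than on screen, a rough sketch follows. It relies only on the first number of the "th" line being the count of running nfsd threads; the remaining fields differ between kernel versions and are ignored. The function name nfsd_th_count is made up for this example.

```shell
# Print the nfsd thread count from /proc/net/rpc/nfsd-style text on stdin.
# Only the first number on the "th" line is used; exit status is 1 if no
# "th" line was seen.
nfsd_th_count() {
    awk '$1 == "th" { print $2; found = 1 } END { exit !found }'
}
```

On the server this would be called as `nfsd_th_count < /proc/net/rpc/nfsd`.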

Or you can check on the client:

nfsstat -rc
Client rpc stats:
calls      retrans    authrefrsh
1326777974   0          1326645701

If you see a large number of retransmissions, you should increase the number of
threads.
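
To put a number on "large", one option is to turn the nfsstat output into a percentage. This is only a sketch: the function name retrans_pct is an assumption, it expects the "calls retrans authrefrsh" header shown above, and it does not decide for you what ratio is too high.

```shell
# Compute the client-side RPC retransmission percentage from `nfsstat -rc`
# output read on stdin (so it can also be run against saved output).
retrans_pct() {
    awk '
        # The header line starts with "calls retrans"; the numbers follow on the next line.
        $1 == "calls" && $2 == "retrans" { getline; calls = $1; retrans = $2 }
        END {
            if (calls > 0) printf "%.4f\n", 100 * retrans / calls
            else print "n/a"
        }
    '
}
```

On a client: `nfsstat -rc | retrans_pct`. With the figures above (0 retransmissions) it prints 0.0000.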

However, your problem could also be related to the filesystem or network.

Do you have jumbo frames (if yes, you should have them on clients and server)?
You might think about disabling flow control on the switch and on the network
card. Are there a lot of dropped packets?
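
For the dropped packets, the kernel keeps per-interface counters in sysfs. A minimal sketch for reading them is below; nic_drops and the NET_ROOT override are names invented for this example, and the interface name in the usage line is hypothetical.

```shell
# Print the kernel's dropped-packet counters for one network interface.
# NET_ROOT defaults to the real sysfs tree; overriding it is only for testing.
nic_drops() {
    dev=$1
    root=${NET_ROOT:-/sys/class/net}
    rx=$(cat "$root/$dev/statistics/rx_dropped") || return 1
    tx=$(cat "$root/$dev/statistics/tx_dropped") || return 1
    echo "$dev rx_dropped=$rx tx_dropped=$tx"
}
```

Usage: `nic_drops eth0`, run a few minutes apart to see whether the counters grow. The current pause-frame (flow control) settings can be shown with `ethtool -a eth0` and, if the driver supports it, changed with `ethtool -A eth0 rx off tx off`.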

For network tuning, check http://fasterdata.es.net/host-tuning/linux/

Did you try to enable readahead (blockdev --setra) on the filesystem?
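
One detail that is easy to get wrong: blockdev --setra takes its value in 512-byte sectors, not in KiB. The sketch below only prints the command so it can be reviewed before being run as root. The helper names, the device /dev/sdX and the 4096 KiB figure are placeholders, not a recommendation.

```shell
# blockdev --setra takes its value in 512-byte sectors, so convert from KiB.
ra_sectors_from_kib() {
    echo $(( $1 * 2 ))
}

# Print (not run) the blockdev command for a device and a readahead size in KiB.
setra_cmd() {
    echo "blockdev --setra $(ra_sectors_from_kib "$2") $1"
}
```

For example, `setra_cmd /dev/sdX 4096` prints `blockdev --setra 8192 /dev/sdX`. The current value can be read with `blockdev --getra /dev/sdX`.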

On the client side, changing the mount options helps. The default read/write
block size is quite small; increase it (rsize, wsize) and use noatime.
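
As an illustration of those options, here is a small helper that prints an /etc/fstab line. The name nfs_fstab_line, the default of 1048576 bytes and the host and paths in the usage line are assumptions for the example; the client and server may still negotiate a smaller size than requested.

```shell
# Print an /etc/fstab line for an NFS mount with explicit rsize/wsize and noatime.
# size is in bytes and defaults to 1 MiB.
nfs_fstab_line() {
    server=$1; export_path=$2; mountpoint=$3; size=${4:-1048576}
    printf '%s:%s %s nfs rsize=%s,wsize=%s,noatime 0 0\n' \
        "$server" "$export_path" "$mountpoint" "$size" "$size"
}
```

Usage: `nfs_fstab_line nfs1.example.org /export/data /mnt/data`. After mounting, `grep nfs /proc/mounts` shows the values actually in effect.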


Cheers,
Barbara




> On 15 May 2020, at 09:26, Patrick Bégou <Patrick.Begou at legi.grenoble-inp.fr> wrote:
>
> On 13/05/2020 at 15:36, Patrick Bégou wrote:
>> On 13/05/2020 at 07:32, Simon Matter via CentOS wrote:
>>>> On 12/05/2020 at 16:10, James Pearson wrote:
>>>>> Patrick Bégou wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I need some help with NFSv4 setup/tuning. I have a dedicated NFS
>>>>>> server (2 x E5-2620, 8 cores/16 threads each, 64GB RAM, 1x10Gb
>>>>>> Ethernet and 16x 8TB HDD) used by two servers and a small cluster
>>>>>> (400 cores). All the servers are running CentOS 7, the cluster is
>>>>>> running CentOS 6.
>>>>>>
>>>>>> From time to time on the server I get:
>>>>>>
>>>>>>       kernel: NFSD: client xxx.xxx.xxx.xxx testing state ID with incorrect client ID
>>>>>>
>>>>>> And the client xxx.xxx.xxx.xxx freezes with:
>>>>>>
>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding, still trying
>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding, still trying
>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>>>
>>>>>> There is a discussion on RedHat7 support about this, but it is only
>>>>>> open to subscribers. Other Google searches do not turn up useful
>>>>>> information.
>>>>>>
>>>>>> Do you have an idea how to solve these freezes?
>>>>>>
>>>>>> More generally, I would be really interested in some advice/tutorials
>>>>>> to improve NFS performance in this dedicated context. There are so
>>>>>> many [different] things about tuning NFS available on the web that
>>>>>> I'm a little bit lost (the opposite of the previous question). So if
>>>>>> someone has "the tutorial"... ;-)
>>>>> How many nfsd threads are you running on the server? The current count
>>>>> will be in /proc/fs/nfsd/threads
>>>>>
>>>>> James Pearson
>>>> Hi James,
>>>>
>>>> Thanks for your answer. I've configured 24 threads (for 16 hardware
>>>> cores/32 threads on the NFS server with these processors).
>>>>
>>>> But it seems that there are also buffer settings to modify when
>>>> increasing the number of threads... That is not done.
>>>>
>>>> Load average on the server is below 1....
>>> I'd be very careful with higher thread numbers than physical cores. NFS
>>> threads and so-called CPU hyper/simultaneous threads are quite different
>>> things and it can hurt performance if not configured correctly.
>>>
>> So you suggest limiting the setup to 16 daemons? I'll try this evening.
>>
> Setting 16 daemons (the number of physical cores) does not solve this
> problem. Moreover, I saw an (old) document provided by Dell on
> optimizing NFS server performance in an HPC context, and they suggest
> using... 128 daemons on a dedicated PowerEdge server. :-\
>
> I saw that it is always the same client showing the problem (a large fat
> node), so maybe I must investigate the client side more than the
> server side.
>
> Patrick

Patrick Bégou

2020-May-16 09:41 UTC

[CentOS] CentOS7 and NFS

Hi Barbara,

Thanks for all these suggestions. Yes, jumbo frames are activated and I
have only two 10Gb Ethernet switches between the server and the client,
connected with single-mode fiber.
I saw yesterday that the client showing the problem did not have the right
MTU (1500 instead of 9000). I don't know why. I changed the MTU to 9000
yesterday and I'm looking at the logs now to see if the problems occur
again.
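
A mismatch like this is easy to catch with a small check run on every node. The sketch below is one possible way to do it; check_mtu and the NET_ROOT override are names invented for this example, and 9000 is simply the jumbo-frame MTU used in this setup.

```shell
# Compare an interface's MTU with the expected value (default 9000).
# NET_ROOT defaults to the real sysfs tree; overriding it is only for testing.
# Returns 0 on a match, 1 on a mismatch, 2 if the MTU cannot be read.
check_mtu() {
    dev=$1
    want=${2:-9000}
    root=${NET_ROOT:-/sys/class/net}
    have=$(cat "$root/$dev/mtu") || return 2
    if [ "$have" -eq "$want" ]; then
        echo "$dev: MTU $have OK"
    else
        echo "$dev: MTU $have, expected $want"
        return 1
    fi
}
```

Usage: `check_mtu eth0` (the interface name is hypothetical). To test the whole path rather than one interface, `ping -M do -s 8972 <server>` sends unfragmentable packets that only pass if every hop accepts 9000-byte frames (8972 = 9000 minus 20 bytes of IPv4 header and 8 bytes of ICMP header).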

I will try to increase the number of NFS daemons in a few days, so that I
can check each setup change one after the other. Because of COVID-19, I'm
working from home, so I have to be really careful when changing the setup
of the servers.

On a cluster node I tried setting
"rsize=1048576,wsize=1048576,vers=4,tcp"
(I cannot have a larger value for rsize/wsize), but a comparison with the
mount using the default setup does not show significant improvements. I sent
20GB to the server, or 2x10GB (2 concurrent processes), with dd, so as to be
larger than the RAID controller cache but smaller than the server and
client RAM. It was just a short test this morning.
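
For anyone wanting to repeat this kind of test, a sketch of a dd write test is below. It is not the exact command used here: the function name, the file name and the choice of conv=fsync are assumptions, and bs=1M is GNU dd syntax.

```shell
# Write size_mib MiB of zeros to a file under dir; dd reports the throughput
# on stderr. conv=fsync makes dd flush the written data before it exits.
nfs_write_test() {
    dir=$1
    size_mib=$2
    dd if=/dev/zero of="$dir/ddtest.$$" bs=1M count="$size_mib" conv=fsync
}
```

Usage: `nfs_write_test /mnt/data 20480` writes 20 GiB to a hypothetical NFS mount point. Remove the ddtest.* file afterwards, and run two copies in the background to imitate the 2x10GB case.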

Patrick


Strahil Nikolov

2020-May-16 15:39 UTC

[CentOS] CentOS7 and NFS

Hi,
Why don't you let the client negotiate the version itself?
pNFS requires at least v4.1 and can bring extra performance.

P.S.: According to the man pages, 'vers'
'is an alternative to the nfsvers option. It is included for
compatibility with other operating systems.'
I have always used 'nfsvers' :).
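
To see which version a client ended up with when neither vers nor nfsvers is given, the mount options in /proc/mounts can be inspected. A minimal sketch, with a made-up function name:

```shell
# List NFS mounts and the protocol version in use, reading /proc/mounts-style
# text on stdin. Prints "<mountpoint> <version>" per NFS mount.
nfs_mount_versions() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(nfs)?vers=/) {
                sub(/^(nfs)?vers=/, "", opts[i])
                vers = opts[i]
            }
        print $2, vers
    }'
}
```

Usage: `nfs_mount_versions < /proc/mounts`. `nfsstat -m` shows the same information together with the other mount options.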

Best Regards,
Strahil Nikolov
