On 13/05/2020 at 07:32, Simon Matter via CentOS wrote:
>> On 12/05/2020 at 16:10, James Pearson wrote:
>>> Patrick Bégou wrote:
>>>> Hi,
>>>>
>>>> I need some help with NFSv4 setup/tuning. I have a dedicated NFS server
>>>> (2 x E5-2620, 8 cores/16 threads each, 64GB RAM, 1x10Gb ethernet and
>>>> 16x 8TB HDD) used by two servers and a small cluster (400 cores). All
>>>> the servers are running CentOS 7, the cluster is running CentOS 6.
>>>>
>>>> From time to time, on the server, I get:
>>>>
>>>>     kernel: NFSD: client xxx.xxx.xxx.xxx testing state ID with
>>>>     incorrect client ID
>>>>
>>>> And the client xxx.xxx.xxx.xxx freezes with:
>>>>
>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding,
>>>>     still trying
>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding,
>>>>     still trying
>>>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>
>>>> There is a discussion about this on the Red Hat 7 support site, but it
>>>> is only open to subscribers. Other Google searches did not turn up
>>>> useful information.
>>>>
>>>> Do you have any idea how to solve these freezes?
>>>>
>>>> More generally, I would be really interested in some advice/tutorials
>>>> on improving NFS performance in this dedicated context. There are so
>>>> many [different] things about tuning NFS available on the web that I'm
>>>> a little bit lost (the opposite of the previous question). So if
>>>> someone has "the tutorial"... ;-)
>>> How many nfsd threads are you running on the server? The current count
>>> will be in /proc/fs/nfsd/threads
>>>
>>> James Pearson
>> Hi James,
>>
>> Thanks for your answer. I've configured 24 threads (for 16 hardware
>> cores / 32 threads on the NFS server with these processors).
>>
>> But it seems that there are also buffer settings to modify when
>> increasing the number of threads... That has not been done.
>>
>> Load average on the server is below 1...
> I'd be very careful with higher thread numbers than physical cores. NFS
> threads and so-called CPU hyper/simultaneous threads are quite different
> things and it can hurt performance if not configured correctly.
>
So you suggest limiting the setup to 16 daemons? I'll try this evening.

Patrick
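For anyone following along, the thread count that James asks about can also be read from a script. This is only a minimal sketch: the path is the one quoted in the thread, while the function names are made up for illustration, and the file exists only on a Linux host that is actually running nfsd.

```python
from pathlib import Path
from typing import Optional

# Path quoted by James in the thread; it only exists on a Linux host
# where the NFS server (nfsd) is running.
NFSD_THREADS_PATH = "/proc/fs/nfsd/threads"


def parse_thread_count(text: str) -> int:
    """Parse the file content, which is a single integer plus a newline."""
    return int(text.strip())


def read_nfsd_thread_count(path: str = NFSD_THREADS_PATH) -> Optional[int]:
    """Return the nfsd thread count, or None if the file is absent."""
    try:
        return parse_thread_count(Path(path).read_text())
    except FileNotFoundError:
        # Not an NFS server, or nfsd is not running on this host.
        return None
```

Calling read_nfsd_thread_count() on the server before and after changing the daemon count is a quick way to confirm that the new setting took effect.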
On 13/05/2020 at 15:36, Patrick Bégou wrote:
> On 13/05/2020 at 07:32, Simon Matter via CentOS wrote:
>> I'd be very careful with higher thread numbers than physical cores. NFS
>> threads and so-called CPU hyper/simultaneous threads are quite different
>> things and it can hurt performance if not configured correctly.
>>
> So you suggest limiting the setup to 16 daemons? I'll try this evening.
>
Setting 16 daemons (the number of physical cores) does not solve this
problem. Moreover, I saw an (old) document provided by Dell on optimizing
NFS server performance in an HPC context, and it suggests using... 128
daemons on a dedicated PowerEdge server. :-\

I saw that it is always the same client showing the problem (a large fat
node), so maybe I should investigate the client side more than the server
side.

Patrick
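Since the freezes always hit the same client, one possible starting point on the client side is to compare its effective NFS mount options with those of a healthy node. The sketch below assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options); the function name, host name and paths in the sample are hypothetical.

```python
from typing import Dict, List


def parse_nfs_mounts(mounts_text: str) -> List[Dict[str, object]]:
    """Extract NFS entries and their options from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for opt in fields[3].split(","):
            # "rsize=1048576" -> ("rsize", "1048576"); flags such as "rw" -> ("rw", "")
            key, _, value = opt.partition("=")
            options[key] = value
        entries.append({
            "device": fields[0],
            "mountpoint": fields[1],
            "fstype": fields[2],
            "options": options,
        })
    return entries


# Hypothetical sample; on a real client, pass the content of /proc/mounts.
SAMPLE_MOUNTS = (
    "/dev/sda1 / xfs rw,relatime 0 0\n"
    "nfsserver.example.org:/export /data nfs4 "
    "rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp 0 0\n"
)

nfs_entries = parse_nfs_mounts(SAMPLE_MOUNTS)
```

Running this on the fat node and on a well-behaved node, then diffing the two option dictionaries, would show whether the problem client negotiated a different NFS version or transfer size.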
The number of threads has nothing to do with the number of cores on the
machine. It depends on the I/O, network speed, type of workload, etc.
We usually start with 32 threads and increase if necessary.

You can check the statistics with:

    watch 'cat /proc/net/rpc/nfsd | grep th'

Or you can check on the client:

    nfsstat -rc
    Client rpc stats:
    calls      retrans    authrefrsh
    1326777974 0          1326645701

If you see a large number of retransmissions, you should increase the
number of threads.

However, your problem could also be related to the filesystem or network.

Do you have jumbo frames (if yes, you should have them on clients and
server)? You might think about disabling flow control on the switch and on
the network card. Are there a lot of dropped packets?

For network tuning, check http://fasterdata.es.net/host-tuning/linux/

Did you try to enable readahead (blockdev --setra) on the filesystem?

On the client side, changing the mount options helps. The default
read/write block size is quite small; increase it (rsize, wsize), and use
noatime.

Cheers,
Barbara

> On 15 May 2020, at 09:26, Patrick Bégou <Patrick.Begou at legi.grenoble-inp.fr> wrote:
>
> Setting 16 daemons (the number of physical cores) does not solve this
> problem. Moreover, I saw an (old) document provided by Dell on optimizing
> NFS server performance in an HPC context, and it suggests using... 128
> daemons on a dedicated PowerEdge server. :-\
>
> I saw that it is always the same client showing the problem (a large fat
> node), so maybe I should investigate the client side more than the
> server side.
>
> Patrick
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
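To watch the retransmission count Barbara mentions over time, the `nfsstat -rc` output can be parsed and turned into a ratio. This is a rough sketch that assumes the three-column layout shown in her message; the function names are invented for illustration, and what counts as "a large number" is left to the reader's judgment.

```python
from typing import Dict


def parse_nfsstat_rc(output: str) -> Dict[str, int]:
    """Parse the 'Client rpc stats' table printed by `nfsstat -rc`."""
    lines = [ln.split() for ln in output.splitlines() if ln.strip()]
    # Look for the header row and read the counters from the row after it.
    for header, values in zip(lines, lines[1:]):
        if header[:2] == ["calls", "retrans"]:
            return dict(zip(header, map(int, values)))
    raise ValueError("no 'calls retrans ...' header found")


def retrans_ratio(stats: Dict[str, int]) -> float:
    """Fraction of RPC calls that had to be retransmitted."""
    calls = stats["calls"]
    return stats["retrans"] / calls if calls else 0.0


# Output copied from Barbara's message above.
SAMPLE = """Client rpc stats:
calls      retrans    authrefrsh
1326777974 0          1326645701
"""

stats = parse_nfsstat_rc(SAMPLE)
ratio = retrans_ratio(stats)
```

On a real client, the text would come from running `nfsstat -rc` (for example through subprocess.run) rather than from SAMPLE. Sampling the ratio before and after a freeze would show whether retransmissions actually climb when the "not responding" messages appear.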