thr3ads.net - CentOS - [CentOS] CentOS7 and NFS [Jul 2020]

Patrick Bégou

2020-Jun-01 09:08 UTC

[CentOS] CentOS7 and NFS

On 13/05/2020 at 02:13, Orion Poplawski wrote:
> On 5/12/20 2:46 AM, Patrick Bégou wrote:
>> Hi,
>>
>> I need some help with NFSv4 setup/tuning. I have a dedicated NFS server
>> (2 x E5-2620, 8cores/16 threads each, 64GB RAM, 1x10Gb ethernet and 16x
>> 8TB HDD) used by two servers and a small cluster (400 cores). All the
>> servers are running CentOS 7, the cluster is running CentOS 6.
>>
>> From time to time on the server I get:
>>
>>     kernel: NFSD: client xxx.xxx.xxx.xxx testing state ID with
>>     incorrect client ID
>>
>> And the client xxx.xxx.xxx.xxx freezes with:
>>
>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding,
>>     still trying
>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding,
>>     still trying
>>     kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>
>> There is a discussion on Red Hat 7 support about this, but it is open
>> only to subscribers. Other Google searches do not provide useful
>> information.
>
> FYI - you can get access to such info with a free RHEL developers
> account.
>
Thanks for your suggestion. As the problem is back, I've subscribed to
read the full content of this discussion.

The answer was "do not use antivirus" :-(. I do not use any antivirus, as I
run CentOS only.

Patrick

Orion Poplawski

2020-Jul-02 22:05 UTC

[CentOS] CentOS7 and NFS

On 6/1/20 3:08 AM, Patrick Bégou wrote:
> Thanks for your suggestion. As the problem is back, I've subscribed to
> read the full content of this discussion.
>
> The answer was "do not use antivirus" :-(. I do not use any antivirus, as I
> run CentOS only.
> 
> Patrick
> 
Just curious to see if you have had any luck resolving these issues? 
I'm afraid that NFS on EL 7 has become much less stable for us recently 
as well with lots more client access hangs.

Orion

-- 
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                 https://www.nwra.com/

cpolish at surewest.net

2020-Jul-04 00:28 UTC

[CentOS] CentOS7 and NFS

On 2020-06-01 11:08, Patrick Bégou wrote:
> >> I need some help with NFSv4 setup/tuning. I have a dedicated NFS server
> >> (2 x E5-2620, 8cores/16 threads each, 64GB RAM, 1x10Gb ethernet and 16x
> >> 8TB HDD) used by two servers and a small cluster (400 cores). All the
> >> servers are running CentOS 7, the cluster is running CentOS 6.
> >>
> >> From time to time on the server I get:
> >>
> >>     kernel: NFSD: client xxx.xxx.xxx.xxx testing state ID with
> >>     incorrect client ID

According to Red Hat Bugzilla [1], 2015-11-19:

    "testing state ID with incorrect client ID" means the server
    thinks a TEST_STATEID op was sent for a stateid associated with
    a client different from the client associated with the session
    over which the TEST_STATEID was sent. Perhaps this could be the
    result of some confusion in the server's data structures but the
    most straightforward explanation would be just that that's
    really what the client did (perhaps as a result of a bug in
    client recovery code?)

The above explanation is applicable but unless you're running 
a rather old kernel that /particular/ bug is not.

My understanding of your issue from the thread to date is you've
not yet narrowed the issue to the NFS server, the network, the 2
server clients, or the cluster clients. In other words, the
corrupt client ID could be tendered by either of the 2 servers,
or by the cluster clients, or could be corrupted in transit over
the network, or could originate on the NFS server. Correct?

According to my notes from a class given by Ted Ts'o on NFS,
always start by checking network health. That should be easy,
using interface statistics, e.g.:

    $ ifconfig eth3
    eth3 Link encap:Ethernet  HWaddr A0:36:9F:10:A9:06  
         inet addr:10.10.1.100  Bcast:10.10.1.255  Mask:255.255.255.0
         inet6 addr: fe80::a236:9fff:fe10:a906/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
         RX packets:858295756  errors:0 dropped:0 overruns:0 frame:0
                               ^^^^^^^^
         TX packets:7090386023 errors:0 dropped:0 overruns:0 carrier:0
                               ^^^^^^^^
         collisions:0 txqueuelen:1000 
         RX bytes:495026510281 (461.0 GiB)  TX bytes:10475167734024 (9.5 TiB)

     $ ip --stats link show eth3
     eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
     link/ether a0:36:9f:10:a9:06 brd ff:ff:ff:ff:ff:ff
     RX: bytes      packets    errors  dropped overrun mcast   
     495027287320   858296399  0       0       0       112282  
                               ^^^^^^
     TX: bytes      packets    errors  dropped carrier collsns 
     10475167775376 7090386249 0       0       0       0       
                               ^^^^^^

Layer 2 stats can also be checked using ethtool:

     $ sudo ethtool --statistics eth3 | egrep 'dropped|errors'
     rx_errors: 0
     tx_errors: 0
     ...

If you've got a clean, healthy network, that leaves the clients
or the server. Maybe the clients are asking for the wrong ID.
To analyze the client ID given, you could capture traffic at 
the server using, perhaps:

    # tcpdump -W 10 -C 10 -w nfs_capture host <client-ipaddr>

Then using tshark or wireshark, see if the client is sending
consistent client IDs. If so, that would exonerate the clients,
leaving as suspect the NFS daemon code in the Linux kernel.
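
The consistency check can be sketched in Python, assuming the capture has first been reduced to two tab-separated columns: source address and client ID. One plausible way to get that is `tshark -r <capture-file> -T fields -e ip.src -e nfs.clientid`, but the field name `nfs.clientid` is an assumption here and should be confirmed against `tshark -G fields`. The function names below are illustrative.

```python
# Sketch: group client IDs by source address and report the sources
# that presented more than one. The input format (two tab-separated
# columns, IDs possibly comma-separated) is an assumption.
from collections import defaultdict


def client_ids_by_source(lines):
    """Map each source address to the set of client IDs it sent."""
    seen = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        src, ids = parts[0].strip(), parts[1].strip()
        if not src or not ids:
            continue  # packet carried no client ID
        for cid in ids.split(","):
            seen[src].add(cid)
    return dict(seen)


def inconsistent_sources(lines):
    """Return {source: sorted IDs} for sources using more than one ID."""
    return {src: sorted(ids)
            for src, ids in client_ids_by_source(lines).items()
            if len(ids) > 1}
```

A source showing several IDs is not automatically at fault, since a client legitimately obtains a new client ID after a server restart or an expired lease. The timestamps in the capture would need to be compared with the server log before drawing conclusions.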

Another point that Mr. Ts'o made (which it sounds like you
have covered) is, don't combine an NFS server with another
application or service. I mention this only because I'm
pedantic and obsessive, or maybe obsessively pedantic.

Also worth mentioning: consider specifying no_subtree_check
in your NFS exports. And Ts'o suggested (ca. 2012) using
fs_mark (available from the EPEL repository) to exercise
your file systems.
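
To see which exports do not state no_subtree_check explicitly, something like the following Python sketch could be run against the contents of /etc/exports. It is deliberately simplistic: it assumes one export per line and does not handle quoted paths, backslash line continuations, or default-option syntax. The function name is made up for illustration. Recent nfs-utils releases already default to no_subtree_check, so a missing option is not necessarily a problem.

```python
# Sketch: list (path, client) pairs in exports-style text whose option
# list does not mention a given option. Simplified parser, see caveats
# in the text above.
import re

CLIENT_RE = re.compile(r"([^()]*)\(([^()]*)\)")


def exports_missing_option(text, option="no_subtree_check"):
    """Return [(path, client), ...] where `option` is not listed."""
    missing = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        path, clients = fields[0], fields[1:]
        for client in clients:
            match = CLIENT_RE.fullmatch(client)
            if match:
                host = match.group(1) or "(any)"
                opts = [o.strip() for o in match.group(2).split(",")]
            else:
                host, opts = client, []  # client given without options
            if option not in opts:
                missing.append((path, host))
    return missing
```

Calling `exports_missing_option(open("/etc/exports").read())` on the server would return an empty list when every client entry names the option.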

Best luck,
-- 
Charles Polisher

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1233284

Patrick Bégou

2020-Jul-09 10:11 UTC

[CentOS] CentOS7 and NFS

Hi Orion,

No, I still have this problem. I have delayed working on it because the
latest updates have not yet been installed on the server and on the client.
I'll work on this problem again as soon as possible.

Thanks Charles for your detailed information on how to track this
problem. I'll check all these metrics.

I have several clients for this NFS server, and the problem seems to occur
only from the client using NFS 4.1 on CentOS Linux release 7.7.1908 (Core).
The default options used are:
rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=194.254.xx.xx,local_lock=none,addr=194.254.yy.yy

On older clients (Red Hat Enterprise Linux Server release 6.7
(Santiago)) the default options are:
rw,intr,hard,sloppy,vers=4,addr=194.254.xx.xx,clientaddr=194.254.yy.yy
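
To compare the two option strings above side by side, a small Python sketch like this one may help. The function names are illustrative, and the address options are ignored on the assumption that they differ per host anyway.

```python
# Sketch: parse comma-separated NFS mount options and show which ones
# differ between two clients.


def parse_mount_options(opts):
    """Return {option: value}; flag options without '=' map to True."""
    parsed = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def differing_options(a, b, ignore=("addr", "clientaddr")):
    """Return {option: (value_in_a, value_in_b)} where the two differ.

    An option absent from one side is reported as None on that side.
    """
    pa, pb = parse_mount_options(a), parse_mount_options(b)
    keys = (set(pa) | set(pb)) - set(ignore)
    return {k: (pa.get(k), pb.get(k))
            for k in sorted(keys) if pa.get(k) != pb.get(k)}
```

Applied to the two strings above, the result includes `vers` as `("4.1", "4")`, which is the most visible difference between the affected CentOS 7.7 client and the older clients. Options the kernel reports but the older listing omits (rsize, timeo and so on) also show up, with None on the older side.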

The server runs CentOS 7.6.1810.

I will see whether the latest updates help solve the problem.

Patrick

