thr3ads.net - CentOS - [CentOS] NFS help [Oct 2016]

Larry Martell

2016-Oct-27 04:54 UTC

[CentOS] NFS help

On Mon, Oct 24, 2016 at 7:51 AM, mark <m.roth at 5-cent.us> wrote:
> On 10/24/16 03:52, Larry Martell wrote:
>>
>> On Fri, Oct 21, 2016 at 11:42 AM,  <m.roth at 5-cent.us> wrote:
>>>
>>> Larry Martell wrote:
>>>>
>>>> On Fri, Oct 21, 2016 at 11:21 AM,  <m.roth at 5-cent.us> wrote:
>>>>>
>>>>> Larry Martell wrote:
>>>>>>
>>>>>> We have 1 system running CentOS 7 that is the NFS server. There are 50
>>>>>> external machines that FTP files to this server fairly continuously.
>>>>>>
>>>>>> We have another system running CentOS 6 that mounts the partition the
>>>>>> files are FTP-ed to using NFS.
>
> <snip>
>>>>>
>>>>> What filesystem?
>
> <snip>
>>>
>>> cat /etc/fstab on the systems, and see what they are. If either is xfs,
>>> and assuming that the systems are on UPSes, then the fstab which controls
>>> drive mounting on a system should have, instead of "defaults",
>>> nobarrier,inode64.
>>
>>
>> The server is xfs (the client is nfs). The server does have inode64
>> specified, but not nobarrier.
>>
>>> Note that the inode64 is relevant if the filesystem is > 2TB.
>>
>>
>> The file system is 51TB.
>>
>>> The reason I say this is that when we started rolling out CentOS 7, we
>>> tried to put one of our user's home directory on one, and it was a
>>> disaster. 100% repeatedly, untarring a 100M tarfile onto an nfs-mounted
>>> drive took seven minutes, where before, it had taken 30 seconds. Timed.
>>> It took us months to discover that NFS 4 tries to make transactions
>>> atomic, which is fine if you're worrying about losing power or
>>> connectivity. If you're on a UPS, and hardwired, adding the nobarrier
>>> immediately brought it down to 40 seconds or so.
>>
>>
>> We are not seeing a performance issue - do you think nobarrier would
>> help with our lock up issue? I wanted to try it but my client did not
>> want me to make any changes until we got the bad disk replaced.
>> Unfortunately that will not happen until Wednesday.
>
>
> Absolutely add nobarrier, and see what happens.

Finally got to add nobarrier (I'll skip why it took so long), and it
looks like this just caused the problem to morph a bit.
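
For anyone repeating the fstab check suggested earlier in the thread, here is a minimal sketch that lists xfs entries lacking nobarrier or inode64. The function name and the sample fstab content (devices and mount points) are hypothetical and only there for illustration; on a real system you would read /etc/fstab instead.

```python
def xfs_entries_missing_options(fstab_text, wanted=("nobarrier", "inode64")):
    """Return {mount_point: [missing options]} for xfs entries in fstab text."""
    missing = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        absent = [opt for opt in wanted if opt not in options]
        if absent:
            missing[fields[1]] = absent
    return missing


# Hypothetical fstab content; the devices and mount points are made up.
SAMPLE_FSTAB = """\
# /etc/fstab
/dev/mapper/vg0-root  /         xfs  defaults           0 0
/dev/sdb1             /data     xfs  inode64            0 0
/dev/sdc1             /scratch  xfs  nobarrier,inode64  0 0
"""

if __name__ == "__main__":
    for mount_point, absent in xfs_entries_missing_options(SAMPLE_FSTAB).items():
        print(f"{mount_point}: missing {', '.join(absent)}")
```

To check a live system, pass the result of open("/etc/fstab").read() instead of the sample text.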

On the C7 NFS server, besides having 50 external machines ftp-ing
files to it, we run 2 jobs: 1 that moves files around (called
image_mover) and one that changes perms on some files (called
chmod_job).

And on the C6 NFS client, besides the job that was hanging (called the
importer), we also run another job (called ftp_job) that ftps files
to the C6 machine. The ftp_job had never hung before, but now the
importer that used to hang has not (yet) hung, and the ftp_job that
had not hung before now is hanging.

But the system messages are different.

On the C7 server there is a series of messages of the form 'task
blocked for >120 seconds' with a stack trace. There is one for each of
the following:

nfsd, chmod_job, kworker, pure_ftpd, image_mover

In each of the stack traces they are blocked on either nfs_write or nfs_flush.

And on the C6 client there is a similar blocked message for the ftp
job, blocked on nfs_flush, then the bad sequence number message I had
seen before, and at that point the ftp_job hung.
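
The "task blocked" messages described above can be tallied per task name, which makes it easier to see which processes are stuck. The following is a rough sketch, assuming the kernel's usual hung-task wording ("INFO: task NAME:PID blocked for more than N seconds"). The sample log lines and PIDs are hypothetical and not taken from the systems in this thread.

```python
import re
from collections import Counter

# Assumed format of the kernel hung-task warning, e.g.
# "INFO: task nfsd:2345 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(log_text):
    """Count hung-task warnings per task name in the given log text."""
    counts = Counter()
    for match in HUNG_TASK.finditer(log_text):
        counts[match.group("name")] += 1
    return counts


# Hypothetical log excerpt; the timestamps, host name, and PIDs are made up.
SAMPLE_LOG = """\
Oct 26 21:10:01 server kernel: INFO: task nfsd:2345 blocked for more than 120 seconds.
Oct 26 21:10:01 server kernel: INFO: task pure_ftpd:4567 blocked for more than 120 seconds.
Oct 26 21:12:01 server kernel: INFO: task nfsd:2345 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for name, count in count_hung_tasks(SAMPLE_LOG).most_common():
        print(f"{name}: {count}")
```

On a real system the input would come from /var/log/messages or the output of dmesg rather than the sample string.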

Gordon Messmer

2016-Oct-27 16:35 UTC

On 10/26/2016 09:54 PM, Larry Martell wrote:
> And on the C6 client there is a similar blocked message for the ftp
> job, blocked on nfs_flush, then the bad sequence number message I had
> seen before, and at that point the ftp_job hung.

Are any of these systems using jumbo frames?  Check the MTU in the 
output of "ip link show" on every system, server and client. If any 
device doesn't match the MTU of all of the others, that might cause the 
problem you're describing.  And if they all match, but they're larger 
than 1500, a switch that doesn't support jumbo frames would also cause 
the problem you're describing.
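
The MTU comparison described above can be sketched as a small script that parses saved "ip link show" output from each machine and reports whether the values agree and whether any exceed 1500. The host labels and sample outputs below are hypothetical, and the parsing assumes the usual one-line-per-interface layout with an "mtu N" field.

```python
import re

# Matches interface header lines such as:
# "2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ..."
LINK_LINE = re.compile(r"^\d+:\s+(\S+?):\s+<[^>]*>\s+mtu\s+(\d+)", re.MULTILINE)


def non_loopback_mtus(ip_link_output):
    """Return {interface: mtu} from 'ip link show' output, skipping lo."""
    return {
        name: int(mtu)
        for name, mtu in LINK_LINE.findall(ip_link_output)
        if name != "lo"
    }


def check_mtus(outputs_by_host):
    """Compare MTUs across hosts; report consistency and jumbo-frame use."""
    all_mtus = {}
    for host, output in outputs_by_host.items():
        for name, mtu in non_loopback_mtus(output).items():
            all_mtus[(host, name)] = mtu
    distinct = set(all_mtus.values())
    return {
        "mtus": all_mtus,
        "consistent": len(distinct) <= 1,
        "jumbo": any(mtu > 1500 for mtu in distinct),
    }


# Hypothetical outputs; the interface names and values are made up.
SERVER_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT
    link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
"""
CLIENT_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT
    link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
"""

if __name__ == "__main__":
    report = check_mtus({"server": SERVER_OUTPUT, "client": CLIENT_OUTPUT})
    print(report)
```

A report with "consistent" false points at a mismatch between machines; "consistent" true with "jumbo" true means the switch also needs to support jumbo frames.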

Larry Martell

2016-Oct-28 05:39 UTC

On Thu, Oct 27, 2016 at 12:35 PM, Gordon Messmer
<gordon.messmer at gmail.com> wrote:
> On 10/26/2016 09:54 PM, Larry Martell wrote:
>>
>> And on the C6 client there is a similar blocked message for the ftp
>> job, blocked on nfs_flush, then the bad sequence number message I had
>> seen before, and at that point the ftp_job hung.
>
>
>
> Are any of these systems using jumbo frames?  Check the MTU in the output
> of "ip link show" on every system, server and client. If any device
> doesn't match the MTU of all of the others, that might cause the problem
> you're describing.  And if they all match, but they're larger than 1500, a
> switch that doesn't support jumbo frames would also cause the problem
> you're describing.

They all are 1500.