On Fri, Oct 21, 2016 at 11:42 AM, <m.roth at 5-cent.us> wrote:
> Larry Martell wrote:
>> On Fri, Oct 21, 2016 at 11:21 AM, <m.roth at 5-cent.us> wrote:
>>> Larry Martell wrote:
>>>> We have 1 system running CentOS 7 that is the NFS server. There are 50
>>>> external machines that FTP files to this server fairly continuously.
>>>>
>>>> We have another system running CentOS 6 that mounts the partition the
>>>> files are FTP-ed to using NFS.
>>> <snip>
>>> What filesystem?
>>
>> Sorry for being dense, but I am not a sysadmin, I am a programmer and
>> we have no sysadmin. I don't know what you mean by your question. I am
>> NFS-mounting to whatever the default filesystem would be on a CentOS 6
>> system.
>
> This *is* a sysadmin issue. Each partition is formatted as a specific
> type of filesystem. The standard Linux filesystems for
> upstream-descended distributions have been ext3, then ext4, and now xfs.
> Tools that manipulate xfs will not work with ext2/3/4, and vice versa.
>
> cat /etc/fstab on the systems, and see what they are. If either is xfs,
> and assuming that the systems are on UPSes, then the fstab entry that
> controls drive mounting should have, instead of "defaults",
> nobarrier,inode64.

The server is xfs (the client is nfs). The server does have inode64
specified, but not nobarrier.

> Note that inode64 is relevant if the filesystem is > 2TB.

The file system is 51TB.

> The reason I say this is that when we started rolling out CentOS 7, we
> tried to put one of our users' home directories on one, and it was a
> disaster. 100% repeatably, untarring a 100M tarfile onto an NFS-mounted
> drive took seven minutes, where before it had taken 30 seconds. Timed.
> It took us months to discover that NFS 4 tries to make transactions
> atomic, which is fine if you're worrying about losing power or
> connectivity. If you're on a UPS, and hardwired, adding nobarrier
> immediately brought it down to 40 seconds or so.

We are not seeing a performance issue - do you think nobarrier would help
with our lock-up issue? I wanted to try it but my client did not want me
to make any changes until we got the bad disk replaced. Unfortunately
that will not happen until Wednesday.
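[Editor's note: a quick way to check the filesystem type and options that mark is asking about; /export here is a placeholder mount point, not the actual path from this thread.]

```shell
# /export is a hypothetical mount point -- substitute the real one.

# Filesystem type backing the export (should print e.g. "xfs"):
findmnt -no FSTYPE /export

# Currently active mount options, one per line, so inode64/nobarrier
# are easy to spot:
findmnt -no OPTIONS /export | tr ',' '\n'

# Or read the configured options straight out of fstab:
grep -w /export /etc/fstab

# An xfs entry carrying the suggested options would look like:
# UUID=...  /export  xfs  inode64,nobarrier  0 0
```

`findmnt` shows the options the kernel is actually using, which can differ from what fstab says if the filesystem was mounted by hand.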
On 10/24/16 03:52, Larry Martell wrote:
> On Fri, Oct 21, 2016 at 11:42 AM, <m.roth at 5-cent.us> wrote:
>> Larry Martell wrote:
<snip>
>> cat /etc/fstab on the systems, and see what they are. If either is xfs,
>> and assuming that the systems are on UPSes, then the fstab entry that
>> controls drive mounting should have, instead of "defaults",
>> nobarrier,inode64.
>
> The server is xfs (the client is nfs). The server does have inode64
> specified, but not nobarrier.
>
>> Note that inode64 is relevant if the filesystem is > 2TB.
>
> The file system is 51TB.
<snip>
> We are not seeing a performance issue - do you think nobarrier would
> help with our lock-up issue? I wanted to try it but my client did not
> want me to make any changes until we got the bad disk replaced.
> Unfortunately that will not happen until Wednesday.

Absolutely add nobarrier, and see what happens.

mark
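[Editor's note: a sketch of how the option can be applied, again using /export as a placeholder. The XFS nobarrier option exists on CentOS 7-era kernels but, as far as I know, was deprecated and later removed in newer mainline kernels, so check mount(8)/xfs(5) on your system first.]

```shell
# Apply nobarrier to a live XFS mount without a reboot:
mount -o remount,nobarrier /export

# Verify it took effect:
findmnt -no OPTIONS /export | tr ',' '\n' | grep nobarrier

# Make it persistent by extending the options field in /etc/fstab,
# e.g. turning "inode64" into "inode64,nobarrier" (back up first):
cp /etc/fstab /etc/fstab.bak
sed -i 's#\([[:space:]]xfs[[:space:]]\+inode64\)\([[:space:]]\)#\1,nobarrier\2#' /etc/fstab
```

The remount takes effect immediately; the fstab edit only matters at the next boot, so doing both keeps them in sync.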
On 10/24/2016 04:51 AM, mark wrote:
> Absolutely add nobarrier, and see what happens.

Using "nobarrier" might increase overall write throughput, but it removes
an important integrity feature, increasing the risk of filesystem
corruption on power loss. I wouldn't recommend doing that unless your
system is on a UPS, and you've tested and verified that it will perform
an orderly shutdown when the UPS is on battery power and its charge is
low.
On Mon, Oct 24, 2016 at 7:51 AM, mark <m.roth at 5-cent.us> wrote:
> On 10/24/16 03:52, Larry Martell wrote:
>> On Fri, Oct 21, 2016 at 11:42 AM, <m.roth at 5-cent.us> wrote:
<snip>
>> We are not seeing a performance issue - do you think nobarrier would
>> help with our lock-up issue? I wanted to try it but my client did not
>> want me to make any changes until we got the bad disk replaced.
>> Unfortunately that will not happen until Wednesday.
>
> Absolutely add nobarrier, and see what happens.

Finally got to add nobarrier (I'll skip why it took so long), and it
looks like this just caused the problem to morph a bit.

On the C7 NFS server, besides having 50 external machines FTP-ing files
to it, we run 2 jobs: one that moves files around (called image_mover)
and one that changes perms on some files (called chmod_job). And on the
C6 NFS client, besides the job that was hanging (called the importer), we
also run another job (called ftp_job) that FTPs files to the C6 machine.

The ftp_job had never hung before, but now the importer that used to hang
has not (yet) hung, and the ftp_job that had not hung before now is
hanging. But the system messages are different.

On the C7 server there is a series of messages of the form 'task blocked
for > 120 seconds' with a stack trace. There is one for each of the
following: nfsd, chmod_job, kworker, pure_ftpd, image_mover. In each of
the stack traces they are blocked on either nfs_write or nfs_flush.

And on the C6 client there is a similar blocked message for the ftp_job,
blocked on nfs_flush, then the bad sequence number message I had seen
before, and at that point the ftp_job hung.
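[Editor's note: for anyone chasing the same symptoms, these are the generic places to pull the blocked-task traces and NFS counters from; nothing here is specific to this poster's setup.]

```shell
# The 'task ... blocked for more than 120 seconds' traces live in the
# kernel ring buffer; grab each one with some trailing context:
dmesg | grep -A 15 'blocked for more than 120 seconds'

# The 120-second threshold is just a sysctl; raising it silences the
# warnings but does not unstick the tasks:
cat /proc/sys/kernel/hung_task_timeout_secs

# List processes currently in uninterruptible sleep (state D), with the
# kernel function they are waiting in:
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# NFS RPC counters; heavy retransmissions on the client side suggest
# the server is the one stalling:
nfsstat -c      # on the C6 client
nfsstat -s      # on the C7 server
```

Comparing the wchan column across the stuck processes is a cheap way to confirm they are all blocked in the same NFS/XFS code path.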