thr3ads.net - freebsd stable - nfs_getpages: error 4 [Mar 2016]

Dmitry Sivachenko

2016-Mar-05 12:32 UTC

nfs_getpages: error 4

> On 05 Mar 2016, at 15:13, Eugene Grosbein <eugen at grosbein.net> wrote:
>
> 05.03.2016 18:21, Dmitry Sivachenko wrote:
>> Hello,
>> 
>> I am running a number of machines with /home mounted via NFS (FreeBSD 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>> 
>> Sometimes I get the following messages in syslog:
>> 
>> nfs_getpages: error 4
>> vm_fault: pager read error, pid NNN (myprog)
>> 
>> After that I see a lot of processes stuck in the "pfault" state (these are computational processes which use some files from the NFS mount); they use 0% of CPU after that.
>> 
>> On the NFS server machine I see nothing strange in the logs. procstat -kk for such stuck processes shows:
>>  PID    TID COMM             TDNAME           KSTACK
>> 85274 102056 myprog           -                mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>> 
>> 
>> What could be the reason for this?
> 
> For example, if some processes running on the NFS server box modify files "in-place"
> while those files are open in processes running on an NFS client, that could be the reason.
> If so, change the processes that update such files so that they create new temporary versions first
> and then rename them atomically.
> 
This should not be the case: users are working only on NFS clients.
Moreover, the nature of the computations is such that each process uses its own
set of files.

(I forgot to mention in my previous e-mail that these processes can't be
stopped, even with kill -9.)

Eugene Grosbein

2016-Mar-05 13:33 UTC

nfs_getpages: error 4

05.03.2016 19:32, Dmitry Sivachenko wrote:
>>> I am running a number of machines with /home mounted via NFS (FreeBSD 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>>>
>>> Sometimes I get the following messages in syslog:
>>>
>>> nfs_getpages: error 4
>>> vm_fault: pager read error, pid NNN (myprog)
>>>
>>> After that I see a lot of processes stuck in the "pfault" state (these are computational processes which use some files from the NFS mount); they use 0% of CPU after that.
>>>
>>> On the NFS server machine I see nothing strange in the logs. procstat -kk for such stuck processes shows:
>>>   PID    TID COMM             TDNAME           KSTACK
>>> 85274 102056 myprog           -                mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>>>
>>>
>>> What could be the reason for this?
>>
>> For example, if some processes running on the NFS server box modify files "in-place"
>> while those files are open in processes running on an NFS client, that could be the reason.
>> If so, change the processes that update such files so that they create new temporary versions first
>> and then rename them atomically.
>>
>
> This should not be the case: users are working only on NFS clients.
> Moreover, the nature of the computations is such that each process uses its own set of files.
>
> (I forgot to mention in my previous e-mail that these processes can't be stopped, even with kill -9.)

Make sure you use TCP mounts and that TSO is disabled. Try switching between
NFSv3 and NFSv4 to work around this bug
and to find out which version is broken. Also, please show the full mount command and option set.
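
For reference, the "create a temporary version, then rename it atomically" advice quoted above could look roughly like the following. This is only a minimal sketch in Python, not code from anyone in this thread: the helper name atomic_write and the ".tmp-" prefix are made up for illustration, and whether this pattern has any bearing on the error reported here was not established in the discussion.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) without editing it in place.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory as the target so the
    # final rename stays within a single file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the new contents to stable storage before the rename.
            os.fsync(tmp.fileno())
        # rename(2) semantics: the name `path` switches from the old file to
        # the new one in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that tempfile.mkstemp creates the file with mode 0600, so a real updater would probably also need to set the permissions it wants on the temporary file before renaming it.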
