On Sat, Mar 05, 2016 at 07:42:51PM +0300, Dmitry Sivachenko wrote:
>
> > On 05 Mar 2016, at 19:27, Konstantin Belousov <kostikbel at gmail.com> wrote:
> >
> > On Sat, Mar 05, 2016 at 05:24:26PM +0300, Dmitry Sivachenko wrote:
> >>>
> >>> Again, error 4 is EINTR so you could disable both "soft" and "intr"
> >>> options for test.
> >>
> >>
> >> "soft" is meaningless in such a setup, because "file system calls will
> >> fail after retrycnt round trip timeout intervals" but "The default is
> >> a retry count of zero, which means to keep retrying forever".
> >>
> >> If I understand "intr" correctly, it matters only when the server
> >> becomes unresponsive, that is, a "server is not responding" message
> >> should be in my logs. But I have no such message.
> >>
> >>
> >
> > The intr NFS mount option allows signals to interrupt NFS waits for the
> > RPC responses. This is almost certainly the reason for the EINTR error
> > you get from the pager.
> >
> > You should at least get the
> > vm_fault: pager read error, pid ...
> > messages as well. Is this true ?
>
>
> That is true, see my initial post.
Ok.
>
>
> > The end result would be SIGSEGV
> > delivered to the process.
> >
> > OTOH, I do not quite understand why your threads requesting page-in
> > fell into the wait for a free page. I assume that there are enough free
> > pages in the system ?
> >
>
>
> I have no swap configured, but it is possible that running processes eat
> all RAM (I expect them to be killed with OOM rather than stuck?)
I cannot answer this question about 'eat all ram'. You can.
But I suspect that you do have enough free or reclaimable pages for OOM
to not trigger, e.g. because you showed command output from the
live system after the situation occurred. It was more likely a temporary
free page shortage, after which the system recovered.
I am more inclined to believe in a bug in the handling of a killed
process in vm_fault().
Could you get the p_flag value for the hung process ? Like
ps -o flags <pid>
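
The value that ps prints for flags is a hexadecimal bit mask, so it may
help to decode it before posting. Below is a minimal sketch in Python,
not a definitive tool. The bit values are an assumed subset of the P_*
definitions in sys/proc.h and should be checked against the header on
the affected system. The helper name decode_p_flag and the sample value
are made up for illustration. My assumption is that the bit of interest
here is P_WKILLED, since the suspicion is about a killed process.

```python
# Assumed subset of FreeBSD p_flag bits; verify against sys/proc.h on your system.
P_FLAG_BITS = {
    0x00002000: "P_WEXIT",           # process is in the middle of exiting
    0x00008000: "P_WKILLED",         # process was killed
    0x00020000: "P_STOPPED_SIG",     # stopped by a signal
    0x00080000: "P_STOPPED_SINGLE",  # stopped for single-threading
}


def decode_p_flag(text):
    """Split a hex flags value into known flag names and leftover bits."""
    value = int(text, 16)  # accepts "8000" as well as "0x8000"
    names = [name for bit, name in sorted(P_FLAG_BITS.items()) if value & bit]
    known_mask = 0
    for bit in P_FLAG_BITS:
        known_mask |= bit
    # Bits not in the table above are returned undecoded.
    return names, value & ~known_mask


if __name__ == "__main__":
    names, rest = decode_p_flag("10008002")  # made-up sample value
    print(names, hex(rest))
```

Bits that are not in the table are returned as a remainder rather than
guessed at, so the full hex value should still be included in any reply.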