On Wed, Mar 11, 2015 at 09:34:07PM -0700, Mark Johnston wrote:
> On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
> > On 12/03/15 00:38, John Baldwin wrote:
> > >>> > >It sounds like this issue might be the one fixed in r272566: if the
> > >>> > >KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > >>> > >sbuf error return value could bubble up and be treated as ERESTART,
> > >>> > >resulting in a loop.
> > >>> > >
> > >>> > >This can be confirmed with something like
> > >>> > >
> > >>> > > dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
> > >>> > >
> > >>> > >If the output consists solely of __sysctl, this bug is likely the
> > >>> > >culprit.
> > >> >
> > >> >Unfortunately, I accidentally killed fstat this morning before I
> > >> >could do any further debug.
> > >> >
> > >> >I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> > >> >
> > >> >I'll try compiling with debug symbols in case it happens again. I
> > >> >haven't been able to reproduce the problem in a reasonable time frame
> > >> >so it could be days or weeks before we see it happen again.
> > > The truss output is consistent with Mark's suggestion, so I would try
> > > his suggested fix of 272566.
> >
> > I patched the 10.1 kernel with r272566 and it appears to have fixed the
> > issue. Is this patch likely to be MFCed back to 10-stable?
>
> I can't see any reason it shouldn't be, and there was an MFC reminder in
> the commit log entry for that revision. I've cc'ed kib@, who might have a
> reason.

The mentioned commit depends on r271976; in fact, it depends on a series of
commits, including r271486 and r271489.
I did not merge r271976 with manual resolution of the conflicts, since that
would mean the work done for HEAD has to be redone for stable/10 to
ensure that all cases are covered. Later, when the mentioned series is
merged, the work would have to be redone once more.
Also note that r271489 is not trivially mergeable either; I just checked.
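
For anyone who hits the same hang before a fix reaches stable/10: the check quoted earlier in the thread ("if the output consists solely of __sysctl") can be automated with a small script. This is only a sketch. It assumes the dtrace aggregation is printed as one "name count" row per syscall, and the function names syscall_counts and looks_like_sysctl_loop are made up for illustration, not part of any FreeBSD tool.

```python
def syscall_counts(dtrace_output):
    """Collect 'name count' aggregation rows from captured dtrace output.

    Assumption: each aggregation row has exactly two whitespace-separated
    fields, the syscall name and a decimal count. Headers, probe-firing
    lines and blank lines do not match that shape and are skipped.
    """
    counts = {}
    for line in dtrace_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            name, count = fields[0], int(fields[1])
            counts[name] = counts.get(name, 0) + count
    return counts


def looks_like_sysctl_loop(dtrace_output):
    """Return True when __sysctl is the only syscall that was observed."""
    counts = syscall_counts(dtrace_output)
    return bool(counts) and set(counts) == {"__sysctl"}
```

Pass it the captured stdout of the dtrace one-liner as a string. A True result only means the symptom matches; it does not prove that r272566 is the fix needed.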