thr3ads.net - freebsd stable - Non-responsive 8.0-RC1 (now 8.0-STABLE) [Dec 2009]

Peter Jeremy

2009-Dec-05 22:48 UTC

Non-responsive 8.0-RC1 (now 8.0-STABLE)

On 2009-Nov-30 19:13:30 +1100, Peter Jeremy <peter@server.vk2pj.dyndns.org> wrote:
>On 2009-Nov-29 08:56:55 +0100, Thomas Backman <serenity@exscape.org> wrote:
>>
>>On Nov 28, 2009, at 10:22 PM, Peter Jeremy wrote:
>>
>>> My main server is running 8.0/amd64 from between RC1 and RC2 and I've
>>> recently had a couple of long-duration hangs on it during which time
>>> processes doing I/O will stop responding.
...
>It actually "hung" again just after I sent the original mail.  This
>time I managed to get console access and could check the kernel state.
>This showed that a number of processes were blocked on ZFS locks.
>The most commonly reported state was 'tx->tx_quiesce_done_cv)'.

I've upgraded to 8-STABLE from 30-Nov and the problem is still present,
even after disabling the boinc processes.

This seems to leave race conditions inside ZFS as the only option.

Has anyone else seen anything like this?

-- 
Peter Jeremy

Elliot Finley

2009-Dec-06 00:31 UTC

Non-responsive 8.0-RC1 (now 8.0-STABLE)

On Sat, Dec 5, 2009 at 3:48 PM, Peter Jeremy <peterjeremy@acm.org> wrote:
> On 2009-Nov-30 19:13:30 +1100, Peter Jeremy <peter@server.vk2pj.dyndns.org> wrote:
> >On 2009-Nov-29 08:56:55 +0100, Thomas Backman <serenity@exscape.org> wrote:
> >>
> >>On Nov 28, 2009, at 10:22 PM, Peter Jeremy wrote:
> >>
> >>> My main server is running 8.0/amd64 from between RC1 and RC2 and I've
> >>> recently had a couple of long-duration hangs on it during which time
> >>> processes doing I/O will stop responding.
> ...
> >It actually "hung" again just after I sent the original mail.  This
> >time I managed to get console access and could check the kernel state.
> >This showed that a number of processes were blocked on ZFS locks.
> >The most commonly reported state was 'tx->tx_quiesce_done_cv)'.
>
> I've upgraded to 8-STABLE from 30-Nov and the problem is still present,
> even after disabling the boinc processes.
>
> This seems to leave race conditions inside ZFS as the only option.
>
> Has anyone else seen anything like this?
>

I have a machine running 7.2 that does the same thing if I don't disable ZIL
and prefetch (probably just one of them triggers the hang, just haven't had
time to see which one).  I'll be upgrading it to 8-Stable in the next week
or so and I'll see if the problem persists.  One data point that may or may
not be relevant is that the process that always triggers the hangs is istgt
(iSCSI target from ports).

Elliot
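
For readers who want to try the same workaround: on FreeBSD 7.x and 8.0,
ZIL and prefetch are normally switched off through boot-time tunables in
/boot/loader.conf. The fragment below is only a sketch, not Elliot's actual
configuration; the tunable names are assumed from the ZFS version shipped in
those releases and may not exist on later ones. Note that disabling the ZIL
gives up the durability of synchronous writes across a crash.

```
# /boot/loader.conf (assumed tunable names for FreeBSD 7.x/8.0 ZFS)
vfs.zfs.zil_disable="1"        # disable the ZFS intent log
vfs.zfs.prefetch_disable="1"   # disable file-level prefetch
```

Enabling only one of the two lines at a time and rebooting would show which
setting actually avoids the hang.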

Arnaud Houdelette

2009-Dec-06 09:54 UTC

Non-responsive 8.0-RC1 (now 8.0-STABLE)

Peter Jeremy wrote:
> On 2009-Nov-30 19:13:30 +1100, Peter Jeremy <peter@server.vk2pj.dyndns.org> wrote:
>> On 2009-Nov-29 08:56:55 +0100, Thomas Backman <serenity@exscape.org> wrote:
>>> On Nov 28, 2009, at 10:22 PM, Peter Jeremy wrote:
>>>
>>>> My main server is running 8.0/amd64 from between RC1 and RC2 and I've
>>>> recently had a couple of long-duration hangs on it during which time
>>>> processes doing I/O will stop responding.
> ...
>> It actually "hung" again just after I sent the original mail.  This
>> time I managed to get console access and could check the kernel state.
>> This showed that a number of processes were blocked on ZFS locks.
>> The most commonly reported state was 'tx->tx_quiesce_done_cv)'.
>
> I've upgraded to 8-STABLE from 30-Nov and the problem is still present,
> even after disabling the boinc processes.
>
> This seems to leave race conditions inside ZFS as the only option.
>
> Has anyone else seen anything like this?
>

I have had the same issue since I upgraded to 8.0-RELEASE. It happens during
high-I/O operations such as a buildworld. Since I run top in an ssh session,
I can say that before the hang the [zfskern] process shows high CPU usage and
global system usage is 99%. Sometimes I can get back to normal by breaking
the build with Ctrl-C; sometimes I can't. If enabled, the watchdog kicks
in and the machine reboots (otherwise, I just lose ssh control over it).
The machine is low on memory (512MB), with the same tuning as I used in 7.2
(ARC reduced to 60M, device cache to 5M, which gave me a stable machine).
I have enabled crashdumps. I can investigate if somebody gives me pointers on
where to look.

Arnaud
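
As an illustration of the tuning Arnaud describes, a low-memory setup of
this kind is usually expressed in /boot/loader.conf. This is a sketch under
assumptions: the post does not give the exact lines, and "device cache" is
taken here to mean the per-vdev cache tunable.

```
# /boot/loader.conf (assumed equivalents of the tuning described above)
vfs.zfs.arc_max="60M"          # cap the ARC at 60 MB
vfs.zfs.vdev.cache.size="5M"   # per-vdev read cache, assumed "device cache"
```

Both values are read at boot, so a reboot is needed after changing them.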
