This is probably unreproducible, but I just got a panic whilst scrubbing
a simple mirrored pool on SXCE snv_124. Evidently one of the disks went
offline for some reason and shortly thereafter the panic happened. I
have the dump and the /var/adm/messages containing the trace.

Is there any point in submitting a bug report?

The panic starts with:

Jan 19 13:27:13 host6 panic[cpu1]/thread=2a1009f5c80:
Jan 19 13:27:13 host6 unix: [ID 403854 kern.notice] assertion failed: 0
== zap_update(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
DMU_POOL_SCRUB_BOOKMARK, sizeof (uint64_t), 4, &dp->dp_scrub_bookmark,
tx), file: ../../common/fs/zfs/dsl_scrub.c, line: 853

FWIW when the system came back up, it resilvered with no problem and
now I'm rerunning the scrub.
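For what it's worth, the trace above was easy to pull out of the saved
dump with mdb; a minimal sketch, assuming savecore(1M) wrote the files
to the default /var/crash/host6 directory:

    # Open the saved crash dump (not the live kernel).
    cd /var/crash/host6
    mdb unix.0 vmcore.0

    # At the mdb prompt:
    #   ::status    - dump summary, including the panic string
    #   ::msgbuf    - console messages leading up to the panic
    #   ::stack     - stack trace of the panicking thread

Since the dump is from a panic, the initial target thread mdb presents
is the panicking one, so ::stack shows the trace that matters.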
On Tue, 19 Jan 2010, Frank Middleton wrote:

> This is probably unreproducible, but I just got a panic whilst
> scrubbing a simple mirrored pool on SXCE snv_124. Evidently
> one of the disks went offline for some reason and shortly
> thereafter the panic happened. I have the dump and the
> /var/adm/messages containing the trace.
>
> Is there any point in submitting a bug report?

I seem to recall that you are not using ECC memory. If so, maybe the
panic is a good thing.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 01/19/10 02:37 PM, Bob Friesenhahn wrote:

> I seem to recall that you are not using ECC memory. If so, maybe the
> panic is a good thing.

This is on SPARC sun4u. Has ECC etc. I agree that without ECC all bets
are off :-)

Cheers -- Frank
On Jan 19, 2010, at 14:30, Frank Middleton wrote:

> This is probably unreproducible, but I just got a panic whilst
> scrubbing a simple mirrored pool on SXCE snv_124. Evidently
> one of the disks went offline for some reason and shortly
> thereafter the panic happened. I have the dump and the
> /var/adm/messages containing the trace.
>
> Is there any point in submitting a bug report?

Was a crash dump generated? If so, then there's a chance that it can be
tracked down, I would guess.
Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk
in a mirrored pool and then immediately running a scrub on the pool. It
works as expected.

Any other symptoms (like a power failure?) before the disk went
offline? Is it possible that both disks went offline?

We would like to review the crash dump if you still have it, just let
me know when it's uploaded.

Thanks,

Cindy

On 01/19/10 12:30, Frank Middleton wrote:

> This is probably unreproducible, but I just got a panic whilst
> scrubbing a simple mirrored pool on SXCE snv_124. Evidently
> one of the disks went offline for some reason and shortly
> thereafter the panic happened. I have the dump and the
> /var/adm/messages containing the trace.
>
> Is there any point in submitting a bug report?
>
> The panic starts with:
>
> Jan 19 13:27:13 host6 panic[cpu1]/thread=2a1009f5c80:
> Jan 19 13:27:13 host6 unix: [ID 403854 kern.notice] assertion failed: 0
> == zap_update(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
> DMU_POOL_SCRUB_BOOKMARK, sizeof (uint64_t), 4, &dp->dp_scrub_bookmark,
> tx), file: ../../common/fs/zfs/dsl_scrub.c, line: 853
>
> FWIW when the system came back up, it resilvered with no
> problem and now I'm rerunning the scrub.
On 01/20/10 04:27 PM, Cindy Swearingen wrote:

> Hi Frank,
>
> I couldn't reproduce this problem on SXCE build 130 by failing a disk
> in a mirrored pool and then immediately running a scrub on the pool. It
> works as expected.

The disk has to fail whilst the scrub is running. It has happened twice
now, once with the bottom half of the mirror, and again with the top
half.

> Any other symptoms (like a power failure?) before the disk went
> offline? Is it possible that both disks went offline?

Neither. The system is on a pretty beefy UPS, and one half of the
mirror was definitely online (zpool status just before the panic showed
one disk offline and the pool as degraded).

> We would like to review the crash dump if you still have it, just let
> me know when it's uploaded.

Do you need the unix.0, vmcore.0, or both? I'll add either or both as
attachments to newly created bug 14012, "Panic running a scrub", when
you let me know which one(s) you want.

Thanks -- Frank
Hi Frank,

We need both files.

Thanks,

Cindy

On 01/20/10 15:43, Frank Middleton wrote:

> On 01/20/10 04:27 PM, Cindy Swearingen wrote:
>> Hi Frank,
>>
>> I couldn't reproduce this problem on SXCE build 130 by failing a disk
>> in a mirrored pool and then immediately running a scrub on the pool. It
>> works as expected.
>
> The disk has to fail whilst the scrub is running. It has happened twice
> now, once with the bottom half of the mirror, and again with the top
> half.
>
>> Any other symptoms (like a power failure?) before the disk went
>> offline? Is it possible that both disks went offline?
>
> Neither. The system is on a pretty beefy UPS, and one half of the
> mirror was definitely online (zpool status just before the panic showed
> one disk offline and the pool as degraded).
>
>> We would like to review the crash dump if you still have it, just let
>> me know when it's uploaded.
>
> Do you need the unix.0, vmcore.0, or both? I'll add either or both as
> attachments to newly created bug 14012, "Panic running a scrub", when
> you let me know which one(s) you want.
>
> Thanks -- Frank
On 01/20/10 05:55 PM, Cindy Swearingen wrote:

> Hi Frank,
>
> We need both files.

The vmcore is 1.4GB. An http upload is never going to complete. Is
there an ftp-able place to send it, or can you download it if I post it
somewhere?

Cheers -- Frank
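In the meantime I'll shrink it for transfer; a rough sketch, assuming
plain gzip output is acceptable on the receiving end (mdb can't read
the compressed file, so it would need gunzipping before analysis):

    # Compress the 1.4GB core for upload; unix.0 is small as-is.
    gzip -9 vmcore.0                  # produces vmcore.0.gz

    # If one transfer is still too big, split into 512MB pieces
    # that can be reassembled on the other end with cat(1).
    split -b 512m vmcore.0.gz vmcore.0.gz.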
On 01/20/10 04:27 PM, Cindy Swearingen wrote:

> Hi Frank,
>
> I couldn't reproduce this problem on SXCE build 130 by failing a disk
> in a mirrored pool and then immediately running a scrub on the pool. It
> works as expected.

As noted, the disk mustn't go offline until well after the scrub has
started. There's another wrinkle. There are some COMSTAR iscsi targets
on this pool. If there are no initiators accessing any of them, the
scrub completes with no errors after 6 hours. If one specific target is
active, the panic ensues reproducibly at about 5h30m or so.

The precise configuration has 2 disks on one LSI controller as a
mirrored pool (whole disks - no slices). Around 750GB of 1.3TB was in
use when the most recent iscsi target was created. The pool is
read-mostly, so it probably isn't fragmented. The zvol has copies=1 and
compression off (no dedup with snv_124). The initiator is VirtualBox
running on Fedora C10 on AMD64, and the target disk has 32-bit Fedora
C12 installed as "whole disk", which I believe is EFI.

To reproduce this might require setting up a COMSTAR iscsi target on a
mirrored pool, formatting it with an EFI label, and then running a
scrub (see the sketch below). Another, similar, target has OpenSolaris
installed on it, and it doesn't seem to cause a panic on a scrub if it
is running; AFAIK it doesn't use EFI, but I have not run a scrub with
it active since converting to COMSTAR either.

This wouldn't explain why one or the other disk randomly goes offline,
and it may be a red herring. But the scrub now runs to completion just
as it always has. Since I can't get FC12 to boot from the EFI disk in
VirtualBox, I may reinstall FC12 without EFI and see if that makes a
difference, but it is an extremely slow process, since it takes almost
6 hours for the panic to occur each time and there's no practical way
to "relocate" the zvol to the start of the pool.

HTH -- Frank
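For anyone who wants to try, here is a rough sketch of the setup
described above. The pool, disk, and volume names are all made up, the
COMSTAR steps assume the stmf and iscsi/target services are enabled,
and the guest install onto the LUN is done from the VirtualBox
initiator:

    # Placeholder names throughout (tank, c0t0d0, c0t1d0, vol1).
    # Prerequisites:
    #   svcadm enable stmf
    #   svcadm enable -r svc:/network/iscsi/target:default

    # Two whole disks as a mirror, plus a zvol to export.
    zpool create tank mirror c0t0d0 c0t1d0
    zfs create -V 100g tank/vol1

    # Export the zvol over iSCSI via COMSTAR.
    sbdadm create-lu /dev/zvol/rdsk/tank/vol1
    stmfadm add-view <GUID printed by sbdadm>
    itadm create-target

    # From the initiator, install a guest onto the LUN with an EFI
    # label and keep it busy, then start the scrub. Here a disk
    # dropped out on its own roughly 5.5 hours in; offlining one
    # mid-scrub may stand in for that:
    zpool scrub tank
    zpool offline tank c0t1d0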