I have a strange problem with a faulted zpool (two-way mirror):

[root@einstein;0]~# zpool status poolm
  pool: poolm
 state: FAULTED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        poolm          UNAVAIL      0     0     0  insufficient replicas
          mirror       UNAVAIL      0     0     0  corrupted data
            c2t0d0s0   ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0

So both devices are ONLINE and have no errors. Why is the whole pool marked unavailable? I suspect a timing problem; maybe the FC disks were not available when the pool was constructed at boot time. Could it be repaired somehow?

I tried "zpool export poolm". The result was a kernel panic:

Jun 16 14:44:42 einstein cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
Jun 16 14:44:42 einstein unix: [ID 836849 kern.notice]
Jun 16 14:44:42 einstein ^Mpanic[cpu3]/thread=2a101553cc0:
Jun 16 14:44:42 einstein unix: [ID 530496 kern.notice] data after EOF: off=61833216
Jun 16 14:44:42 einstein unix: [ID 100000 kern.notice]
Jun 16 14:44:42 einstein genunix: [ID 723222 kern.notice] 000002a101553560 zfs:dnode_sync+388 (600033cb990, 7, 60002d5c580, 60009dd4dc0, 0, 7)
Jun 16 14:44:42 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 00000600033cb9e8 0000000000000003 00000600033cba48
Jun 16 14:44:42 einstein %l4-7: 0000060002c66400 00000600033cb9eb 000000000000000c 00000600033cba40
Jun 16 14:44:42 einstein genunix: [ID 723222 kern.notice] 000002a101553620 zfs:dmu_objset_sync_dnodes+6c (60002e02800, 60002e02940, 60009dd4dc0, 600033cb990, 0, 0)
Jun 16 14:44:42 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000060000fb8500 00000000000a4653 0000060002ee00f8 0000000000000001
Jun 16 14:44:42 einstein %l4-7: 0000000000000000 0000000070190000 0000000000000007 0000060002d5c580
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a1015536d0 zfs:dmu_objset_sync+7c (60002e02800, 60009dd4dc0, 3, 3, 6000ca42ad8, a4653)
Jun 16 14:44:43 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000003 000000000000000f 0000000000000001 000000000000719a
Jun 16 14:44:43 einstein %l4-7: 0000060002e02940 0000000000000060 0000060002e028e0 0000060002e02960
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a1015537e0 zfs:dsl_dataset_sync+c (60007884f40, 60009dd4dc0, 60007884fd0, 60000f505f8, 60000f505f8, 60007884f40)
Jun 16 14:44:43 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000007 0000060000f50678 0000000000000003
Jun 16 14:44:43 einstein %l4-7: 0000060007884fc8 0000000000000000 0000000000000000 0000000000000000
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a101553890 zfs:dsl_pool_sync+64 (60000f50540, a4653, 60007884f40, 60009dd8500, 60002de4500, 60002de4528)
Jun 16 14:44:43 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000060000fb88c0 0000060009dd4dc0 0000060000f506d8
Jun 16 14:44:43 einstein %l4-7: 0000060000f506a8 0000060000f50678 0000060000f505e8 0000060002d5c580
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a101553940 zfs:spa_sync+1b0 (60000fb8500, a4653, 0, 0, 2a101553cc4, 1)
Jun 16 14:44:44 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000060000fb86c0 0000060000fb86d0 0000060000fb85e8 0000060009dd8500
Jun 16 14:44:44 einstein %l4-7: 0000000000000000 000006000360f040 0000060000f50540 0000060000fb8680
Jun 16 14:44:44 einstein genunix: [ID 723222 kern.notice] 000002a101553a00 zfs:txg_sync_thread+134 (60000f50540, a4653, 0, 2a101553ab0, 60000f50650, 60000f50652)
Jun 16 14:44:44 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000060000f50660 0000060000f50610 0000000000000000 0000060000f50618
Jun 16 14:44:44 einstein %l4-7: 0000060000f50656 0000060000f50654 0000060000f50608 00000000000a4654
Jun 16 14:44:44 einstein unix: [ID 100000 kern.notice]
Jun 16 14:44:44 einstein genunix: [ID 672855 kern.notice] syncing file systems...

Hardware: E420, A5000 split backplane, 2 JNI HBAs
OS: Solaris 10u3 with Sun Cluster 3.2 (second node: Ultra 60)

luxadm says that the disks are ok, and read access via dd shows no problems.

Any ideas?

Regards,
Michael
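One thing that might help narrow this down is to dump and compare the ZFS labels on the two halves of the mirror. A minimal sketch, assuming the device names above and the standard /dev/rdsk paths; this is not output from the actual system:

    # Each vdev carries four copies of its label; comparing the pool GUID,
    # vdev GUIDs and txg values on both slices shows whether one half holds
    # a stale or damaged configuration.
    zdb -l /dev/rdsk/c2t0d0s0
    zdb -l /dev/rdsk/c3t17d0s0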
Hi Michael,

A search on bugs.opensolaris.org for "data after EOF" shows that this looks pretty much like bug 6424466:

http://bugs.opensolaris.org/view_bug.do?bug_id=6424466

It is fixed in Nevada build 53. The fix for Solaris 10 is going to be available with Solaris 10 Update 4, as the second link returned by the search suggests:

http://bugs.opensolaris.org/view_bug.do?bug_id=2145379

Wbr,
Victor

Michael Hase wrote:
> I have a strange problem with a faulted zpool (two-way mirror):
> [...]
> Jun 16 14:44:42 einstein unix: [ID 530496 kern.notice] data after EOF: off=61833216
> [...]
> luxadm says that the disks are ok, and read access via dd shows no problems.
>
> Any ideas?
Michael Hase <michael.hase at six.de> wrote:
> I have a strange problem with a faulted zpool (two-way mirror):
> [...]
>             c2t0d0s0   ONLINE       0     0     0
>             c3t17d0s0  ONLINE       0     0     0

I ran into the same problem a while ago with a RAIDZ1 on ZFS-fuse, but found no workaround - sorry.

Greetings
Cyron
Hi Victor,

The kernel panic in bug 6424466 resulted from overwriting some areas of the disks; in that case I would expect at least strange things to happen, though not exactly a panic. In my case there was no messing around with the underlying disks. The fix only seems to avoid the panic and mentions no repair methods.

Just discovered that the two devices have different sizes: c2t0d0s0 is just 27 GB whereas c3t17d0s0 has 34 GB; on the first disk I reserved a partition for other testing purposes. Could this be a problem?

Cheers,
Michael
Michael Hase wrote:
> Hi Victor,
>
> the kernel panic in bug 6424466 resulted from overwriting some areas
> of the disks; in that case I would expect at least strange things to
> happen, though not exactly a panic. In my case there was no messing
> around with the underlying disks. The fix only seems to avoid the
> panic and mentions no repair methods.

As far as I understand this now, the scenario described in bug 6424466 may not be the only scenario leading to such a panic. I'm not sure if a repair method exists (other than recreating the pool from scratch).

> Just discovered that the two devices have different sizes: c2t0d0s0
> is just 27 GB whereas c3t17d0s0 has 34 GB; on the first disk I
> reserved a partition for other testing purposes. Could this be a
> problem?

I think it makes sense to check the slice boundaries for overlaps, and whether any of the slices is used by some other entity, like another file system, swap, or the dump device.

Cheers,
Victor
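The checks Victor suggests could look roughly like the following sketch, assuming the device names from this thread and that slice 2 is the usual backup slice covering the whole disk:

    # Print the VTOC of both disks and look for slices whose sector ranges
    # overlap slice 0, the one handed to ZFS.
    prtvtoc /dev/rdsk/c2t0d0s2
    prtvtoc /dev/rdsk/c3t17d0s2

    # Check whether any slice on these disks is also configured as swap,
    # as the dump device, or as a mounted file system.
    swap -l
    dumpadm
    egrep "c2t0d0|c3t17d0" /etc/vfstab /etc/mnttab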
So I ended up recreating the zpool from scratch; there seemed to be no way to repair anything. All data lost - luckily nothing really important. I have never had such an experience with mirrored volumes on SVM/ODS since Solaris 2.4.

Just to clarify things: there was no messing with the underlying disk devices, such as overwriting them with dd or anything like that. The two slices just had different sizes, and that has never been a problem for SVM.

Cheers,
Michael
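For anyone ending up in the same situation, the rebuild boils down to something like this sketch (device names as in this thread; the -f flags are an assumption, only needed if the slices still carry labels from the old pool, and ZFS will size the mirror to the smaller of the two slices):

    # Remove what is left of the faulted pool, then create the two-way
    # mirror again and verify that it comes up healthy.
    zpool destroy -f poolm
    zpool create -f poolm mirror c2t0d0s0 c3t17d0s0
    zpool status poolm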