I have a strange problem with a faulted zpool (two-way mirror):

[root@einstein;0]~# zpool status poolm
  pool: poolm
 state: FAULTED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        poolm          UNAVAIL      0     0     0  insufficient replicas
          mirror       UNAVAIL      0     0     0  corrupted data
            c2t0d0s0   ONLINE       0     0     0
            c3t17d0s0  ONLINE       0     0     0

So both devices are ONLINE and have no errors. Why is the whole pool marked unavailable? I suspect a timing problem; maybe the FC disks were not available when the pool was constructed at boot time. Could it be repaired somehow?

I tried "zpool export poolm". The result was a kernel panic:

Jun 16 14:44:42 einstein cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
Jun 16 14:44:42 einstein unix: [ID 836849 kern.notice]
Jun 16 14:44:42 einstein ^Mpanic[cpu3]/thread=2a101553cc0:
Jun 16 14:44:42 einstein unix: [ID 530496 kern.notice] data after EOF: off=61833216
Jun 16 14:44:42 einstein unix: [ID 100000 kern.notice]
Jun 16 14:44:42 einstein genunix: [ID 723222 kern.notice] 000002a101553560 zfs:dnode_sync+388 (600033cb990, 7, 60002d5c580, 60009dd4dc0, 0, 7)
Jun 16 14:44:42 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 00000600033cb9e8 0000000000000003 00000600033cba48
Jun 16 14:44:42 einstein %l4-7: 0000060002c66400 00000600033cb9eb 000000000000000c 00000600033cba40
Jun 16 14:44:42 einstein genunix: [ID 723222 kern.notice] 000002a101553620 zfs:dmu_objset_sync_dnodes+6c (60002e02800, 60002e02940, 60009dd4dc0, 600033cb990, 0, 0)
Jun 16 14:44:42 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000060000fb8500 00000000000a4653 0000060002ee00f8 0000000000000001
Jun 16 14:44:42 einstein %l4-7: 0000000000000000 0000000070190000 0000000000000007 0000060002d5c580
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a1015536d0 zfs:dmu_objset_sync+7c (60002e02800, 60009dd4dc0, 3, 3, 6000ca42ad8, a4653)
Jun 16 14:44:43 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000003 000000000000000f 0000000000000001 000000000000719a
Jun 16 14:44:43 einstein %l4-7: 0000060002e02940 0000000000000060 0000060002e028e0 0000060002e02960
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a1015537e0 zfs:dsl_dataset_sync+c (60007884f40, 60009dd4dc0, 60007884fd0, 60000f505f8, 60000f505f8, 60007884f40)
Jun 16 14:44:43 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000007 0000060000f50678 0000000000000003
Jun 16 14:44:43 einstein %l4-7: 0000060007884fc8 0000000000000000 0000000000000000 0000000000000000
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a101553890 zfs:dsl_pool_sync+64 (60000f50540, a4653, 60007884f40, 60009dd8500, 60002de4500, 60002de4528)
Jun 16 14:44:43 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000060000fb88c0 0000060009dd4dc0 0000060000f506d8
Jun 16 14:44:43 einstein %l4-7: 0000060000f506a8 0000060000f50678 0000060000f505e8 0000060002d5c580
Jun 16 14:44:43 einstein genunix: [ID 723222 kern.notice] 000002a101553940 zfs:spa_sync+1b0 (60000fb8500, a4653, 0, 0, 2a101553cc4, 1)
Jun 16 14:44:44 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000060000fb86c0 0000060000fb86d0 0000060000fb85e8 0000060009dd8500
Jun 16 14:44:44 einstein %l4-7: 0000000000000000 000006000360f040 0000060000f50540 0000060000fb8680
Jun 16 14:44:44 einstein genunix: [ID 723222 kern.notice] 000002a101553a00 zfs:txg_sync_thread+134 (60000f50540, a4653, 0, 2a101553ab0, 60000f50650, 60000f50652)
Jun 16 14:44:44 einstein genunix: [ID 179002 kern.notice] %l0-3: 0000060000f50660 0000060000f50610 0000000000000000 0000060000f50618
Jun 16 14:44:44 einstein %l4-7: 0000060000f50656 0000060000f50654 0000060000f50608 00000000000a4654
Jun 16 14:44:44 einstein unix: [ID 100000 kern.notice]
Jun 16 14:44:44 einstein genunix: [ID 672855 kern.notice] syncing file systems...

Hardware: E420, A5000 split backplane, 2 JNI HBAs
OS: Solaris 10u3 with Sun Cluster 3.2 (second node: Ultra 60)

luxadm says that the disks are ok, and read access via dd shows no problems.

Any ideas?

Regards,
Michael
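One thing that might help narrow this down is to dump and compare the ZFS labels on the two halves of the mirror. A minimal sketch, assuming the device names above and the standard /dev/rdsk paths; this is not output from the actual system:

    # Each vdev carries four copies of its label; comparing the pool GUID,
    # vdev GUIDs and txg values on both slices shows whether one half holds
    # a stale or damaged configuration.
    zdb -l /dev/rdsk/c2t0d0s0
    zdb -l /dev/rdsk/c3t17d0s0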
Hi Michael,

A search on bugs.opensolaris.org for "data after EOF" shows that this looks pretty much like bug 6424466:

http://bugs.opensolaris.org/view_bug.do?bug_id=6424466

It is fixed in Nevada build 53. The fix for Solaris 10 is going to be available with Solaris 10 Update 4, as the second link returned by the search suggests:

http://bugs.opensolaris.org/view_bug.do?bug_id=2145379

Wbr,
Victor

Michael Hase wrote:
> I have a strange problem with a faulted zpool (two-way mirror):
> [...]
> Jun 16 14:44:42 einstein unix: [ID 530496 kern.notice] data after EOF: off=61833216
> [...]
> luxadm says that the disks are ok, and read access via dd shows no problems.
>
> Any ideas?
Michael Hase <michael.hase at six.de> wrote:
> I have a strange problem with a faulted zpool (two-way mirror):
> [...]
>             c2t0d0s0   ONLINE       0     0     0
>             c3t17d0s0  ONLINE       0     0     0

I ran into the same problem a while ago with a RAIDZ1 on ZFS-fuse, but found no workaround - sorry.

Greetings
Cyron
Hi Victor,

The kernel panic in bug 6424466 resulted from overwriting some areas of the disks; in that case I would expect at least strange things to happen, though not exactly a panic. In my case there was no messing around with the underlying disks. The fix only seems to avoid the panic and mentions no repair methods.

Just discovered that the two devices have different sizes: c2t0d0s0 is just 27 GB whereas c3t17d0s0 has 34 GB; on the first disk I reserved a partition for other testing purposes. Could this be a problem?

Cheers,
Michael
Michael Hase wrote:
> Hi Victor,
>
> the kernel panic in bug 6424466 resulted from overwriting some areas
> of the disks; in that case I would expect at least strange things to
> happen, though not exactly a panic. In my case there was no messing
> around with the underlying disks. The fix only seems to avoid the
> panic and mentions no repair methods.

As far as I understand this now, the scenario described in bug 6424466 may not be the only scenario leading to such a panic. I'm not sure if a repair method exists (other than recreating the pool from scratch).

> Just discovered that the two devices have different sizes: c2t0d0s0
> is just 27 GB whereas c3t17d0s0 has 34 GB; on the first disk I
> reserved a partition for other testing purposes. Could this be a
> problem?

I think it makes sense to check the slice boundaries for overlaps, and whether any of the slices is used by some other entity, like another file system, swap, or the dump device.

Cheers,
Victor
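The checks Victor suggests could look roughly like the following sketch, assuming the device names from this thread and that slice 2 is the usual backup slice covering the whole disk:

    # Print the VTOC of both disks and look for slices whose sector ranges
    # overlap slice 0, the one handed to ZFS.
    prtvtoc /dev/rdsk/c2t0d0s2
    prtvtoc /dev/rdsk/c3t17d0s2

    # Check whether any slice on these disks is also configured as swap,
    # as the dump device, or as a mounted file system.
    swap -l
    dumpadm
    egrep "c2t0d0|c3t17d0" /etc/vfstab /etc/mnttab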
So I ended up recreating the zpool from scratch; there seemed to be no way to repair anything. All data lost - luckily nothing really important. I have never had such an experience with mirrored volumes on SVM/ODS since Solaris 2.4.

Just to clarify things: there was no messing with the underlying disk devices, such as overwriting them with dd or anything like that. The two slices just had different sizes, and that has never been a problem for SVM.

Cheers,
Michael
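For anyone ending up in the same situation, the rebuild boils down to something like this sketch (device names as in this thread; the -f flags are an assumption, only needed if the slices still carry labels from the old pool, and ZFS will size the mirror to the smaller of the two slices):

    # Remove what is left of the faulted pool, then create the two-way
    # mirror again and verify that it comes up healthy.
    zpool destroy -f poolm
    zpool create -f poolm mirror c2t0d0s0 c3t17d0s0
    zpool status poolm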