Hi,

System: Netra 1405, 4x450MHz, 4GB RAM, 2x146GB (root pool) and 2x146GB (space pool). snv_98.

After a panic the system hangs on boot, and manual attempts to mount (at least) one dataset in single-user mode hang as well.

The panic:

Dec 27 04:42:11 base panic[cpu0]/thread=300021c1a20:
Dec 27 04:42:11 base unix: [ID 521688 kern.notice] [AFT1] errID 0x00167f73.1c737868 UE Error(s)
Dec 27 04:42:11 base 	See previous message(s) for details
Dec 27 04:42:11 base unix: [ID 100000 kern.notice]
Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433efc0 SUNW,UltraSPARC-II:cpu_aflt_log+5b4 (3, 2a10433f208, 2a10433f2e0, 10, 2a10433f207, 2a10433f208)
Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3: 000002a10433f0cb 00000000000f0000 00000000012ccc00 00000000012cd000
Dec 27 04:42:11 base   %l4-7: 000002a10433f208 0000000000000170 00000000012ccc00 0000000000000001
Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433f210 SUNW,UltraSPARC-II:cpu_async_error+cdc (7fe00000, 0, 180200000, 40, 0, a0b7ff60)
Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 000000000180c000 0000000000000000 000002a10433f3d4
Dec 27 04:42:11 base   %l4-7: 00000000012cc400 000000007e600000 00000000012cc400 0000000000000001
Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 000002a10433f410 unix:ktl0+48 (2a10433fec0, 2a10433ff80, 180e580, 6, 180c000, 1800000)
Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000002 0000000000001400 0000000080001601 00000000012c1578
Dec 27 04:42:11 base   %l4-7: 0000000ae394c629 0000060017a32260 000000000000000b 000002a10433f4c0
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f560 unix:resume+240 (300021c1a20, 180c000, 1835c40, 6001c1f20c8, 16, 30001e4cc40)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000000 00000180048279c0 000002a1035dbca0
Dec 27 04:42:12 base   %l4-7: 0000000000000001 0000000001867800 0000000025be86dc 00000000018bbc00
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f610 genunix:cv_wait+3c (3001365ba10, 3001365ba10, 1, 18d0c00, c44000, 0)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3: 0000000000c44002 00000000018d0e58 0000000000000001 0000000000c44002
Dec 27 04:42:12 base   %l4-7: 0000000000000000 0000000000000001 0000000000000002 0000000001326e5c
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f6c0 zfs:zio_wait+30 (3001365b778, 6001cdcf7e8, 3001365ba18, 3001365ba10, 30034dc1f48, 1)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3: 000006001cdcf7f0 000000000000ffff 0000000000000100 000000000000fc00
Dec 27 04:42:12 base   %l4-7: 00000000018d7000 000000000c6eefd9 000000000c6eefd8 000000000c6eefd8
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 000002a10433f770 zfs:zil_commit_writer+2d0 (6001583be00, 4b0, 1b1a4d54, 42a03, cfc67, 0)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3: 0000060018b5d068 ffffffffffffffff 0000060010ce1040 000006001583be88
Dec 27 04:42:12 base   %l4-7: 0000060013760380 00000000000000c0 000003002bf81138 000003001365b778
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f820 zfs:zil_commit+68 (6001583be00, 1b1a5ae5, 38bc5, 6001583be7c, 1b1a5ae5, 0)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 0000000000000001 00000600177fe080 000006001c1f2ad8
Dec 27 04:42:13 base   %l4-7: 00000000000001c0 0000000000000001 0000060010c78000 0000000000000000
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f8d0 zfs:zfs_fsync+f8 (18e5800, 0, 134fc00, 3001c2c4860, 134fc00, 134fc00)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3: 000003001e94d948 0000000000010000 000000000180c008 0000000000000008
Dec 27 04:42:13 base   %l4-7: 0000060013760458 0000000000000000 000000000134fc00 00000000018d2000
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433f980 genunix:fop_fsync+40 (300131ed600, 10, 60011c08b68, 0, 60010c77200, 30028320b40)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3: 0000060011f6c828 0000000000000007 000006001c1f20c8 00000000013409d8
Dec 27 04:42:13 base   %l4-7: 0000000000000000 0000000000000001 0000000000000000 00000000018bcc00
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 000002a10433fa30 genunix:fdsync+40 (7, 10, 0, 184, 10, 30007adda40)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 000000000000f071 00000000f0710000 000000000000f071
Dec 27 04:42:13 base   %l4-7: 0000000000000001 000000000180c000 0000000000000000 0000000000000000
Dec 27 04:42:14 base unix: [ID 100000 kern.notice]
Dec 27 04:42:14 base genunix: [ID 672855 kern.notice] syncing file systems...
Dec 27 04:42:14 base genunix: [ID 904073 kern.notice] done
Dec 27 04:42:15 base SUNW,UltraSPARC-II: [ID 201454 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x00167f73.1c737868
Dec 27 04:42:15 base 	AFSR 0x00000001<ME>.80200000<PRIV,UE> AFAR 0x00000000.a0b7ff60
Dec 27 04:42:15 base 	AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x101b708
Dec 27 04:42:15 base 	UDBH 0x0203<UE> UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Dec 27 04:42:15 base 	UDBH Syndrome 0x3 Memory Module U1402 U0402 U1401 U0401
Dec 27 04:42:15 base SUNW,UltraSPARC-II: [ID 325743 kern.warning] WARNING: [AFT1] errID 0x00167f73.1c737868 Syndrome 0x3 indicates that this may not be a memory module problem
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 151010 kern.info] [AFT2] errID 0x00167f73.1c737868 PA=0x00000000.a0b7ff60
Dec 27 04:42:16 base 	E$tag 0x00000000.1cc01416 E$State: Exclusive E$parity 0x0e
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x0070ba48.00000000
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00ec48b9.495349e1
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x4955a237.00000000
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x20): 0x00000800.00000000 *Bad* PSYND=0xff00
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x0070ba28.00000000
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Dec 27 04:42:16 base SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x027a7ea6.494f4aeb
Dec 27 04:47:56 base genunix: [ID 540533 kern.notice] SunOS Release 5.11 Version snv_98 64-bit
Dec 27 04:47:56 base genunix: [ID 172908 kern.notice] Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
Dec 27 04:47:56 base Use is subject to license terms.

My guess would be a broken CPU, maybe the old E-cache problem...

Anyway, "zfs mount space" works fine, but "zfs mount space/postfix" hangs.
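Before digging further into ZFS, it may be worth confirming what the platform itself recorded about the UE. A minimal sketch using standard Solaris diagnostics (a fault at this level does not always surface in all of them, and output format varies by platform):

  prtdiag -v      # platform hardware status; look for flagged CPU modules or DIMMs
  fmdump -eV      # raw FMA error telemetry, including CPU/memory ereports
  psrinfo -v      # confirm all four CPUs are still online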
A look at the zfs process shows:

# echo "0t236::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread 30001cecc00: 2a100fa2181
[ 000002a100fa2181 cv_wait+0x3c() ]
  000002a100fa2231 txg_wait_open+0x58(60014aa1158, d000b, 0, 60014aa119c, 60014aa119e, 60014aa1150)
  000002a100fa22e1 dmu_tx_assign+0x3c(60022dd3780, 1, 7, 60013cd5918, 5b, 1)
  000002a100fa2391 dmu_free_long_range_impl+0xc4(600245fbdb0, 60025f69750, 0, 400, 0, 1)
  000002a100fa2451 dmu_free_long_range+0x44(600245fbdb0, 43b12, 0, ffffffffffffffff, 1348800, 0)
  000002a100fa2511 zfs_rmnode+0x68(60025bb6f20, 12, 600243af9e0, 1, 600243af880, 600245fbdb0)
  000002a100fa25d1 zfs_inactive+0x134(600243af988, 0, 60025f6fef8, 4000, 420, 60025bb6f20)
  000002a100fa2681 zfs_rename+0x73c(6002401e400, 40800000004, 6002401e400, 60021860041, 60022dd3780, 60025bb6fe8)
  000002a100fa27c1 fop_rename+0xac(6002401e400, 60021860030, 6002401e400, 60021860041, 60010c03e08, 0)
  000002a100fa2881 zfs_replay_rename+0xb4(18bbc00, 6002400e8b0, 0, 60014a94000, 0, 0)
  000002a100fa2951 zil_replay_log_record+0x244(18d1ed0, 60017108000, 2a100fa3450, 0, 6002347fc80, 60014a94000)
  000002a100fa2a41 zil_parse+0x160(58, 132573c, 13253a4, 2a100fa3450, cff2c, 1978d7)
  000002a100fa2ba1 zil_replay+0xa4(9050200ff00ff, 600243af880, 600243af8b0, 40000, 60022ad91d8, 6002347fc80)
  000002a100fa2c81 zfsvfs_setup+0x94(600243af880, 1, 18d1c00, 600151e8400, 18d0c00, 0)
  000002a100fa2d31 zfs_domount+0x2dc(60011f08d08, 60022afe480, 60011f08d08, 600243af890, 0, 400)
  000002a100fa2e11 zfs_mount+0x1ec(60011f08d08, 6002401e200, 2a100fa39d8, 100, 0, 2)
  000002a100fa2f71 domount+0xaf0(100, 1, 6002401e200, 8077, 60011f08d08, 0)
  000002a100fa3121 mount+0xec(60023dd7388, 2a100fa3ad8, 0, ff104ed8, 100, 45bd0)
  000002a100fa3221 syscall_ap+0x44(2a0, ffbfe8a8, 115b9e8, 60023dd72d0, 15, 0)
  000002a100fa32e1 syscall_trap32+0xcc(45bd0, ffbfe8a8, 100, ff104ed8, 0, 0)

zpool status and fmdump don't indicate any problems.

Any possibility to recover the dataset? I do have backups of all the data, but I would really like to recover it to save some time.

Anything special to look for in zdb output? Any other diagnostics that would be useful?

Thanks in advance!

Best Regards //Magnus
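Given that the stack shows the mount stuck replaying the intent log (zil_replay -> zil_parse -> ... -> txg_wait_open), the dataset's ZIL is the first thing worth dumping with zdb. A minimal sketch for the snv_98 era; the zil_replay_disable tunable is an assumption here (verify the symbol exists in your build, e.g. with mdb, before relying on it), and skipping replay discards the last few seconds of synchronous writes:

  # dump the intent-log records zdb can see for the affected dataset
  zdb -ivv space/postfix

  # if the replay itself is what hangs, disabling ZIL replay at mount time
  # may let the dataset mount at the cost of the unreplayed records:
  echo "set zfs:zil_replay_disable = 1" >> /etc/system
  reboot
  zfs mount space/postfix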
Hi again,

No ideas? I have spent quite some time trying to recover, but no luck yet. Any ideas or hints on recovery would be great! I'm soon running out of time and will have to rebuild the zones and restore the data, but I'd much rather recover it from the datasets.

Since posting the initial question I have identified one more dataset with the same problem that can't be mounted; basically these are the two datasets with the most disk activity. (A sketch for walking the pool to find any others follows after the quoted message.)

Regards //Magnus

Begin forwarded message:

> From: Magnus Bergman <mb at citynetwork.se>
> Date: December 28, 2008 18:11:44 GMT+01:00
> To: zfs-discuss at opensolaris.org
> Subject: [zfs-discuss] zfs mount hangs
>
> [original message quoted in full; trimmed]
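To find every affected dataset (rather than stumbling over them one at a time), a crude walk over the pool can help. A minimal sketch, assuming single-user mode; note that the loop will simply hang at each broken dataset, which is itself the answer, so kill the hung mount (or reboot) and resume from the next name:

  # try to mount each currently unmounted dataset in the pool, in order
  for ds in $(zfs list -H -o name -r space); do
          [ "$(zfs get -H -o value mounted $ds)" = "yes" ] && continue
          echo "trying: $ds"
          zfs mount "$ds" && echo "ok: $ds"
  done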
On Tue, Dec 30, 2008 at 10:46 AM, Magnus Bergman <mb at citynetwork.se> wrote:

> [quoted thread trimmed; see the messages above]
I had a similar problem, but did not run truss to find the cause, as it was not a live filesystem yet. Recreating the filesystem with the same name resulted in it not mounting and just hanging, but if I created it with a different name it would mount and run perfectly fine. I settled on the new name and continued on, and have not noticed the problem again. But seeing this post, I'll capture as much data as I can if it happens again.

--
Brent Jones
brent at servuhome.net
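If recreating under the original name keeps hanging, the same workaround can be expressed as a rename, so the broken dataset never has to mount at all. A hedged sketch with illustrative names (assumption: zfs rename succeeds on an unmounted dataset even when its ZIL replay is what hangs):

  # park the unmountable dataset under a new name, recreate the original, restore
  zfs rename space/postfix space/postfix-broken
  zfs create space/postfix
  # ...restore the data from backup into /space/postfix...
  # once satisfied, reclaim the space: zfs destroy -r space/postfix-broken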