I have a raid-z zfs filesystem with 3 disks. The disks were starting to have read and write errors. It got so bad that I started to get trans_err; the server locked up and was reset. Now, when trying to import the pool, the system panics.

I installed the latest Recommended patch cluster on my Solaris U3 system and also installed the latest kernel patch (120011-14), but it still panics when I run zpool import <pool>. I also dd'd the disks and tested them on another server running OpenSolaris B72, with the same result. Here is the panic backtrace:

Stack Backtrace
-----------------
vpanic()
assfail3+0xb9(fffffffff7dde5f0, 6, fffffffff7dde840, 0, fffffffff7dde820, 153)
space_map_load+0x2ef(ffffff008f1290b8, ffffffffc00fc5b0, 1, ffffff008f128d88, ffffff008dd58ab0)
metaslab_activate+0x66(ffffff008f128d80, 8000000000000000)
metaslab_group_alloc+0x24e(ffffff008f46bcc0, 400, 3fd0f1, 32dc18000, ffffff008fbeaa80, 0)
metaslab_alloc_dva+0x192(ffffff008f2d1a80, ffffff008f235730, 200, ffffff008fbeaa80, 0, 0)
metaslab_alloc+0x82(ffffff008f2d1a80, ffffff008f235730, 200, ffffff008fbeaa80, 2, 3fd0f1)
zio_dva_allocate+0x68(ffffff008f722790)
zio_next_stage+0xb3(ffffff008f722790)
zio_checksum_generate+0x6e(ffffff008f722790)
zio_next_stage+0xb3(ffffff008f722790)
zio_write_compress+0x239(ffffff008f722790)
zio_next_stage+0xb3(ffffff008f722790)
zio_wait_for_children+0x5d(ffffff008f722790, 1, ffffff008f7229e0)
zio_wait_children_ready+0x20(ffffff008f722790)
zio_next_stage_async+0xbb(ffffff008f722790)
zio_nowait+0x11(ffffff008f722790)
dmu_objset_sync+0x196(ffffff008e4e5000, ffffff008f722a10, ffffff008f260a80)
dsl_dataset_sync+0x5d(ffffff008df47e00, ffffff008f722a10, ffffff008f260a80)
dsl_pool_sync+0xb5(ffffff00882fb800, 3fd0f1)
spa_sync+0x1c5(ffffff008f2d1a80, 3fd0f1)
txg_sync_thread+0x19a(ffffff00882fb800)
thread_start+8()

And here is the panic message buffer:

panic[cpu0]/thread=ffffff0001ba2c80:
assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x6 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339

ffffff0001ba24f0 genunix:assfail3+b9 ()
ffffff0001ba2590 zfs:space_map_load+2ef ()
ffffff0001ba25d0 zfs:metaslab_activate+66 ()
ffffff0001ba2690 zfs:metaslab_group_alloc+24e ()
ffffff0001ba2760 zfs:metaslab_alloc_dva+192 ()
ffffff0001ba2800 zfs:metaslab_alloc+82 ()
ffffff0001ba2850 zfs:zio_dva_allocate+68 ()
ffffff0001ba2870 zfs:zio_next_stage+b3 ()
ffffff0001ba28a0 zfs:zio_checksum_generate+6e ()
ffffff0001ba28c0 zfs:zio_next_stage+b3 ()
ffffff0001ba2930 zfs:zio_write_compress+239 ()
ffffff0001ba2950 zfs:zio_next_stage+b3 ()
ffffff0001ba29a0 zfs:zio_wait_for_children+5d ()
ffffff0001ba29c0 zfs:zio_wait_children_ready+20 ()
ffffff0001ba29e0 zfs:zio_next_stage_async+bb ()
ffffff0001ba2a00 zfs:zio_nowait+11 ()
ffffff0001ba2a80 zfs:dmu_objset_sync+196 ()
ffffff0001ba2ad0 zfs:dsl_dataset_sync+5d ()
ffffff0001ba2b40 zfs:dsl_pool_sync+b5 ()
ffffff0001ba2bd0 zfs:spa_sync+1c5 ()
ffffff0001ba2c60 zfs:txg_sync_thread+19a ()
ffffff0001ba2c70 unix:thread_start+8 ()

syncing file systems...

Is there a way to restore the data? Is there a way to "fsck" the zpool and correct the errors manually?

This message posted from opensolaris.org
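(For readers following along: the assertion in that panic message looks like a VERIFY3U()-style check -- space_map_load() expects dmu_read() to return 0, and any non-zero errno goes to assfail3() and panics the box. Below is a minimal userland sketch of that behaviour; the simplified VERIFY3U macro and the fake_dmu_read() stand-in are mine, for illustration only, not the actual kernel code.)

#include <stdio.h>
#include <stdlib.h>

/*
 * Simplified stand-in for the kernel's VERIFY3U(); on failure the kernel
 * goes through assfail3(), which is what produces the
 * "assertion failed: ... (0x6 == 0x0)" string in the panic buffer.
 */
#define VERIFY3U(left, op, right)                                           \
        do {                                                                \
                unsigned long long _l = (left), _r = (right);               \
                if (!(_l op _r)) {                                          \
                        (void) fprintf(stderr,                              \
                            "assertion failed: %s %s %s (0x%llx %s 0x%llx)\n", \
                            #left, #op, #right, _l, #op, _r);               \
                        abort();        /* the kernel panics instead */     \
                }                                                           \
        } while (0)

/* Stand-in for dmu_read(); pretend the read fails because a disk is gone. */
static int
fake_dmu_read(void)
{
        return (6);     /* errno 6 == ENXIO, "no such device or address" */
}

int
main(void)
{
        /* Mirrors the failing check in space_map_load(): the read must succeed. */
        VERIFY3U(fake_dmu_read(), ==, 0);
        return (0);
}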
Basically, it is complaining that there aren't enough disks to read the pool metadata. This would suggest that in your 3-disk RAID-Z config, either two disks are missing, or one disk is missing *and* another disk is damaged -- due to prior failed writes, perhaps. (I know there's at least one disk missing because the failure mode is errno 6, which is ENXIO.) Can you tell from /var/adm/messages or fmdump whether there were write errors to multiple disks, or to just one?

Jeff

On Tue, Sep 18, 2007 at 05:26:16PM -0700, Geoffroy Doucet wrote:
> I have a raid-z zfs filesystem with 3 disks. The disks were starting to have read and write errors.
>
> It got so bad that I started to get trans_err; the server locked up and was reset. Now, when trying to import the pool, the system panics.
>
> I installed the latest Recommended patch cluster on my Solaris U3 system and also installed the latest kernel patch (120011-14), but it still panics when I run zpool import <pool>.
>
> I also dd'd the disks and tested them on another server running OpenSolaris B72, with the same result. Here is the panic backtrace:
>
> [...]
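(For reference, errno 5 and 6 in these assertion messages are the standard values from <sys/errno.h>; a trivial check you can compile anywhere:)

#include <stdio.h>
#include <string.h>
#include <errno.h>      /* EIO == 5, ENXIO == 6 */

int
main(void)
{
        /* Print the two errno values that show up in the ZFS assertions. */
        (void) printf("%d: %s\n", EIO, strerror(EIO));     /* I/O error */
        (void) printf("%d: %s\n", ENXIO, strerror(ENXIO)); /* No such device or address */
        return (0);
}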
Actually, here are the first panic messages:

Sep 13 23:33:22 netra2 unix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 307
Sep 13 23:33:22 netra2 unix: [ID 100000 kern.notice]
Sep 13 23:33:22 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b000 genunix:assfail3+94 (7b7706d0, 5, 7b770710, 0, 7b770718, 133)
Sep 13 23:33:22 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000000000002000 0000000000000133 0000000000000000 000000000186f800
Sep 13 23:33:22 netra2 %l4-7: 0000000000000000 000000000183d400 00000000011eb400 0000000000000000
Sep 13 23:33:22 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b0c0 zfs:space_map_load+1a4 (30007cc2c38, 70450058, 1000, 30007cc2908, 380000000, 1)
Sep 13 23:33:22 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000000000001a60 000003000ce3b000 0000000000000000 000000007b73ead0
Sep 13 23:33:22 netra2 %l4-7: 000000007b73e86c 00007fffffffffff 0000000000007fff 0000000000001000
Sep 13 23:33:22 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b190 zfs:metaslab_activate+3c (30007cc2900, 8000000000000000, c000000000000000, e75efe6c, 30007cc2900, c0000000)
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 000002a103e6b308 0000000000000003 0000000000000002 00000000006dd004
Sep 13 23:33:23 netra2 %l4-7: 0000000070450000 0000030010834940 00000300080eba40 00000300106c9748
Sep 13 23:33:23 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b240 zfs:metaslab_group_alloc+1bc (3fffffffffffffff, 400, 8000000000000000, 32dc18000, 30003387d88, ffffffffffffffff)
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 00000300106c9750 0000000000000001 0000030007cc2900
Sep 13 23:33:23 netra2 %l4-7: 8000000000000000 0000000000000000 0000000196e0c000 4000000000000000
Sep 13 23:33:23 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b320 zfs:metaslab_alloc_dva+114 (0, 32dc18000, 30003387d88, 400, 300080eba40, 3fd0f1)
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000000 0000000000000003 0000030011c068e0
Sep 13 23:33:23 netra2 %l4-7: 0000000000000000 00000300106c9748 0000000000000000 00000300106c9748
Sep 13 23:33:23 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b3f0 zfs:metaslab_alloc+2c (30010834940, 200, 30003387d88, 3, 3fd0f1, 0)
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000030003387de8 00000300139e1800 00000000704506a0 0000000000000000
Sep 13 23:33:23 netra2 %l4-7: 0000030013fca7be 0000000000000000 0000030010834940 0000000000000001
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b4a0 zfs:zio_dva_allocate+4c (30010eafcc0, 7b7515a8, 30003387d88, 70450508, 70450400, 20001)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000000070450400 0000070300000001 0000070300000001 0000000000000000
Sep 13 23:33:24 netra2 %l4-7: 0000000000000000 00000000018a5c00 0000000000000003 0000000000000007
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b550 zfs:zio_write_compress+1ec (30010eafcc0, 23e20b, 23e000, 10001, 3, 30003387d88)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: 000000000000ffff 0000000000000000 0000000000000001 0000000000000200
Sep 13 23:33:24 netra2 %l4-7: 0000000000000000 0000000000010000 000000000000fc00 0000000000000001
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b620 zfs:zio_wait+c (30010eafcc0, 30010834940, 7, 30010eaff20, 3, 3fd0f1)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: ffffffffffffffff 000000007b7297d0 0000030003387d40 000003000be9edf8
Sep 13 23:33:24 netra2 %l4-7: 000002a103e6b7c0 0000000000000002 0000000000000002 000003000a799920
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b6d0 zfs:dmu_objset_sync+12c (30003387d40, 3000a762c80, 1, 1, 3000be9edf8, 0)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000030003387d88 ffffffffffffffff 0000000000000002 00000000003be93a
Sep 13 23:33:24 netra2 %l4-7: 0000030003387e40 0000000000000020 0000030003387e20 0000030003387ea0
Sep 13 23:33:25 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b7e0 zfs:dsl_dataset_sync+c (30007609480, 3000a762c80, 30007609510, 30005c475b8, 30005c475b8, 30007609480)
Sep 13 23:33:25 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000007 0000030005c47638 0000000000000001
Sep 13 23:33:25 netra2 %l4-7: 0000030007609508 0000000000000000 0000030005c4caa8 0000000000000000
Sep 13 23:33:25 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b890 zfs:dsl_pool_sync+64 (30005c47500, 3fd0f1, 30007609480, 3000f904380, 300032bb7c0, 300032bb7e8)
Sep 13 23:33:25 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000030010834d00 000003000a762c80 0000030005c47698
Sep 13 23:33:25 netra2 %l4-7: 0000030005c47668 0000030005c47638 0000030005c475a8 0000030010eafcc0
Sep 13 23:33:25 netra2 genunix: [ID 723222 kern.notice] 000002a103e6b940 zfs:spa_sync+1b0 (30010834940, 3fd0f1, 0, 0, 2a103e6bcc4, 1)
Sep 13 23:33:25 netra2 genunix: [ID 179002 kern.notice] %l0-3: ffffffffffffffff 000000000180c000 0000030010834a28 000003000f904380
Sep 13 23:33:25 netra2 %l4-7: 0000000000000000 00000300080eb500 0000030005c47500 0000030010834ac0
Sep 13 23:33:25 netra2 genunix: [ID 723222 kern.notice] 000002a103e6ba00 zfs:txg_sync_thread+134 (30005c47500, 3fd0f1, 0, 2a103e6bab0, 30005c47610, 30005c47612)
Sep 13 23:33:25 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0000030005c47620 0000030005c475d0 0000000000000000 0000030005c475d8
Sep 13 23:33:25 netra2 %l4-7: 0000030005c47616 0000030005c47614 0000030005c475c8 00000000003fd0f2
Sep 13 23:33:26 netra2 unix: [ID 100000 kern.notice]

So 0x05 is EIO, an I/O error, but now I am getting 0x06. I dd'd the disks to another server (actually a VMware VM) to reproduce the problem and free up the production server, and that is where I got the 0x06. It looks like the pool is corrupt. I will read the ondiskformat0822.pdf document to get a better understanding of the on-disk format.

This message posted from opensolaris.org
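(From my first pass through ondiskformat0822.pdf: each vdev carries four 256K labels, two at the front of the device and two in the last 512K, and inside each label the packed nvlist config sits after an 8K pad and an 8K boot block header, followed by a 128K uberblock ring. Here is a small sketch that prints where those pieces land for a given device size; the sizes and macro names are my reading of the spec, so double-check them against the document.)

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Sketch only: offsets/sizes as I read ondiskformat0822.pdf -- verify them. */
#define SIZE_K(x)               ((uint64_t)(x) << 10)
#define VDEV_LABEL_SIZE         SIZE_K(256)
#define VDEV_PAD_SIZE           SIZE_K(8)       /* leading blank space          */
#define VDEV_BOOT_HEADER_SIZE   SIZE_K(8)       /* boot block header            */
#define VDEV_PHYS_SIZE          SIZE_K(112)     /* packed nvlist ("vdev_phys")  */
#define VDEV_UBERBLOCK_RING     SIZE_K(128)     /* array of uberblocks          */

int
main(void)
{
        /*
         * Hypothetical device size for illustration; the real code also
         * aligns the end of the device down to a 256K boundary before
         * placing the last two labels.
         */
        uint64_t devsize = SIZE_K(35328000);
        uint64_t label[4];
        int i;

        label[0] = 0;
        label[1] = VDEV_LABEL_SIZE;
        label[2] = devsize - 2 * VDEV_LABEL_SIZE;
        label[3] = devsize - VDEV_LABEL_SIZE;

        for (i = 0; i < 4; i++) {
                (void) printf("L%d at byte offset 0x%" PRIx64 "\n", i, label[i]);
                (void) printf("    nvlist config at L%d+0x%" PRIx64
                    " (up to %" PRIu64 " bytes)\n", i,
                    VDEV_PAD_SIZE + VDEV_BOOT_HEADER_SIZE, VDEV_PHYS_SIZE);
                (void) printf("    uberblock ring at L%d+0x%" PRIx64
                    " (%" PRIu64 " bytes)\n", i,
                    VDEV_PAD_SIZE + VDEV_BOOT_HEADER_SIZE + VDEV_PHYS_SIZE,
                    VDEV_UBERBLOCK_RING);
        }
        return (0);
}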
OK, I found the cause of the 0x06: one disk was missing. But now I have all the disks present and I get 0x05:

Sep 21 10:25:53 unknown panic[cpu0]/thread=ffffff0001e12c80:
Sep 21 10:25:53 unknown genunix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339
Sep 21 10:25:53 unknown unix: [ID 100000 kern.notice]
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e124f0 genunix:assfail3+b9 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12590 zfs:space_map_load+2ef ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e125d0 zfs:metaslab_activate+66 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12690 zfs:metaslab_group_alloc+24e ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12760 zfs:metaslab_alloc_dva+192 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12800 zfs:metaslab_alloc+82 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12850 zfs:zio_dva_allocate+68 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12870 zfs:zio_next_stage+b3 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e128a0 zfs:zio_checksum_generate+6e ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e128c0 zfs:zio_next_stage+b3 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12930 zfs:zio_write_compress+239 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12950 zfs:zio_next_stage+b3 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e129a0 zfs:zio_wait_for_children+5d ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e129c0 zfs:zio_wait_children_ready+20 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e129e0 zfs:zio_next_stage_async+bb ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12a00 zfs:zio_nowait+11 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12a80 zfs:dmu_objset_sync+196 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12ad0 zfs:dsl_dataset_sync+5d ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12b40 zfs:dsl_pool_sync+b5 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12bd0 zfs:spa_sync+1c5 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12c60 zfs:txg_sync_thread+19a ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ffffff0001e12c70 unix:thread_start+8 ()

There are no SCSI errors for the disks, because those are virtual disks. Also, for anyone who is interested, I wrote a little program to show the properties of the vdev: http://www.projectvolcano.org/zfs/list_vdev.c.
Here is a sample of the output:

bash-3.00# ./list_vdev -d /dev/dsk/c1t12d0s0
Vdev properties for /dev/dsk/c1t12d0s0:
    version: 0x0000000000000003
    name: share02
    state: 0x0000000000000001
    txg: 0x00000000003fd0e4
    pool_guid: 0x88f93fc54c215cfa
    top_guid: 0x65400f2e7db0c2a5
    guid: 0xfc3b9af2d3b6fd46
    vdev_tree:
        type: raidz
        id: 0x0000000000000000
        guid: 0x65400f2e7db0c2a5
        nparity: 0x0000000000000001
        metaslab_array: 0x000000000000000d
        metaslab_shift: 0x000000000000001e
        ashift: 0x0000000000000009
        asize: 0x000000196e0c0000
        children: [
            [0]
                type: disk
                id: 0x0000000000000000
                guid: 0xfc3b9af2d3b6fd46
                path: /dev/dsk/c1t12d0s0
                devid: id1,sd@SFUJITSU_MAF3364L_SUN36G_00665620____/a
                whole_disk: 0x0000000000000001
                DTL: 0x000000000000004e
            [1]
                type: disk
                id: 0x0000000000000001
                guid: 0x377cc1a2beb3c985
                path: /dev/dsk/c1t13d0s0
                devid: id1,sd@SFUJITSU_MAF3364L_SUN36G_00666490____/a
                whole_disk: 0x0000000000000001
                DTL: 0x000000000000004d
            [2]
                type: disk
                id: 0x0000000000000002
                guid: 0xe97db62ad7fe325d
                path: /dev/dsk/c1t14d0s0
                devid: id1,sd@SFUJITSU_MAF3364L_SUN36G_00666674____/a
                whole_disk: 0x0000000000000001
                DTL: 0x0000000000000091
        ]

So my question is: is there a way to really know why I am getting the EIO (0x05)? Is there a way to find that out in the debugger, and how do I get at it?

This message posted from opensolaris.org
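(One more thing that may help while digging, along the same lines as list_vdev.c: walking the uberblock ring in a label shows which txg each label last synced and whether the labels agree with each other. Below is a rough sketch; the 128K ring offset within the label, the 1K slot size for ashift=9, and the leading uberblock fields are my reading of ondiskformat0822.pdf, so treat them as assumptions. Since the disks were dd'd from the SPARC box to an x86 VM, the sketch also checks for a byte-swapped magic number.)

/*
 * Sketch: print the txg/timestamp of every valid-looking uberblock in
 * label 0 of a vdev.  Label 0 starts at byte 0; its uberblock ring is the
 * last 128K of the 256K label (byte offset 128K); slots are 1K for ashift=9.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <inttypes.h>

#define UB_MAGIC        0x00bab10cULL   /* uberblock magic */
#define UB_RING_OFFSET  (128ULL << 10)
#define UB_RING_SIZE    (128ULL << 10)
#define UB_SLOT_SIZE    (1ULL << 10)

typedef struct {                /* leading fields of the on-disk uberblock */
        uint64_t ub_magic;
        uint64_t ub_version;
        uint64_t ub_txg;
        uint64_t ub_guid_sum;
        uint64_t ub_timestamp;
        /* the root block pointer follows; not needed here */
} ub_head_t;

static uint64_t
bswap64(uint64_t x)
{
        uint64_t r = 0;
        int i;

        for (i = 0; i < 8; i++)
                r = (r << 8) | ((x >> (i * 8)) & 0xff);
        return (r);
}

int
main(int argc, char **argv)
{
        char *buf;
        int fd;
        uint64_t off;

        if (argc != 2) {
                (void) fprintf(stderr, "usage: %s /dev/dsk/cXtYdZs0\n", argv[0]);
                return (1);
        }
        if ((buf = malloc(UB_RING_SIZE)) == NULL ||
            (fd = open(argv[1], O_RDONLY)) == -1 ||
            pread(fd, buf, UB_RING_SIZE, UB_RING_OFFSET) != (ssize_t)UB_RING_SIZE) {
                perror(argv[1]);
                return (1);
        }
        for (off = 0; off < UB_RING_SIZE; off += UB_SLOT_SIZE) {
                /* slots are 1K apart, so this cast stays aligned */
                ub_head_t *ub = (ub_head_t *)(void *)(buf + off);
                int swap = (ub->ub_magic == bswap64(UB_MAGIC));

                if (ub->ub_magic != UB_MAGIC && !swap)
                        continue;       /* empty or damaged slot */
                (void) printf("slot %3" PRIu64 ": txg %" PRIu64
                    "  timestamp %" PRIu64 "%s\n", off / UB_SLOT_SIZE,
                    swap ? bswap64(ub->ub_txg) : ub->ub_txg,
                    swap ? bswap64(ub->ub_timestamp) : ub->ub_timestamp,
                    swap ? "  (byte-swapped: written on the other endianness)" : "");
        }
        (void) close(fd);
        free(buf);
        return (0);
}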