Hi,

Trying to understand the ZFS code, I was playing with libzpool - like
ztest does.

What I am trying to do is create an object in a ZFS filesystem. But I am
seeing failures when I try to sync changes.

bash-3.00# ./udmu test
Object 4
error: ZFS: I/O failure (write on <unknown> off 0: zio 81f5d00
    [L0 unallocated] 200L/200P DVA[0]=<0:0:200> fletcher2 uncompressed
    LE contiguous birth=7 fill=0
    cksum=4141414141414141:4141:2828282828282820:82820): error 28
Abort (core dumped)

I can see that space does not get allocated and the sync fails with
ENOSPC (error 28). There is plenty of space in the pool, so that is not
the issue. I guess I am missing the step that allocates space. Could
someone help me figure out what it is?

I have tried to follow what ztest does. This is what I do:

<code snippet>
	kernel_init(FREAD | FWRITE);

	buf = malloc(BUF_SIZE);
	memset(buf, 'A', BUF_SIZE);

	error = dmu_objset_open(osname, DMU_OST_ZFS, DS_MODE_PRIMARY, &os);
	if (error) {
		fprintf(stderr, "dmu_objset_open() = %d", error);
		return (error);
	}

	tx = dmu_tx_create(os);
	dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, BUF_SIZE);
	/* dmu_tx_hold_bonus(tx, DMU_NEW_OBJECT); */

	error = dmu_tx_assign(tx, TXG_WAIT);
	if (error) {
		dmu_tx_abort(tx);
		return (error);
	}

	object = dmu_object_alloc(os, DMU_OT_UINT64_OTHER, 0,
	    DMU_OT_NONE, 0, tx);

	printf("Object %lld\n", object);

	dmu_write(os, object, 0, BUF_SIZE, buf, tx);
	dmu_tx_commit(tx);

	txg_wait_synced(dmu_objset_pool(os), 0);
</code snippet>

It is interesting that the checksum that is reported
(cksum=4141414141414141:...) is the 0x41 ('A') pattern that I am trying
to write.

This is the panic stack:

d11d8e65 _lwp_kill (98, 6) + 15
d1192102 raise (6) + 22
d1170dad abort (81f5d00, d1354000, ce3fdcc8, ce3fdcbc, d13c0568, ce3fdcbc) + cd
d131ed79 vpanic (d1341dbc, ce3fdcc8) + 51
d131ed9f panic (d1341dbc, d135a384, d135a724, d133a630, 0, 0) + 1f
d131921d zio_done (81f5d00) + 455
d131c15d zio_next_stage (81f5d00) + 161
d1318b92 zio_wait_for_children (81f5d00, 11, 81f5ef0) + 6a
d1318c88 zio_wait_children_done (81f5d00) + 18
d131c15d zio_next_stage (81f5d00) + 161
d131ba83 zio_vdev_io_assess (81f5d00) + 183
d131c15d zio_next_stage (81f5d00) + 161
d1307011 vdev_mirror_io_done (81f5d00) + 421
d131b8a2 zio_vdev_io_done (81f5d00, 0, d0e0ac00, d1210000, d11ba2df, 3) + 36
d131f585 taskq_thread (81809c0) + 89
d11d7604 _thr_setup (d0e0ac00) + 52
d11d7860 _lwp_start (d0e0ac00, 0, 0, 0, 0, 0)

Thanks in advance!

Regards,
Manoj
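PS: For context, the snippet above lives inside scaffolding roughly like
this - a sketch, not the exact file; the declarations are mine, and I
believe dmu_objset_close() and kernel_fini() are the matching release
calls for dmu_objset_open() and kernel_init():

<code snippet>
	objset_t *os = NULL;
	dmu_tx_t *tx;
	uint64_t object;
	char *buf;
	int error;

	/* ... setup, transaction, write and sync as shown above ... */

	/* Teardown, mirroring the setup calls. */
	dmu_objset_close(os);	/* drop the DS_MODE_PRIMARY hold */
	kernel_fini();		/* tear down the userland SPA emulation */
	free(buf);
</code snippet>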
Hi,

I tried adding an spa_export()/spa_import() pair to the code snippet. I
get a similar crash while importing.

I/O failure (write on <unknown> off 0: zio 822ed40 [L0 unallocated]
    4000L/400P DVA[0]=<0:1000:400> DVA[1]=<0:18001000:400> fletcher4 lzjb
    LE contiguous birth=4116 fill=0
    cksum=69c3a4acfc:2c42fdcaced5:c5231ffcb2285:2b8c1a5f2cb2bfd): error 28
Abort (core dumped)

I thought ztest could use an existing pool. Is that assumption wrong?

These are the stacks of interest.

d11d78b9 __lwp_park (81c3e0c, 81c3d70, 0) + 19
d11d1ad2 cond_wait_queue (81c3e0c, 81c3d70, 0, 0) + 3e
d11d1fbd _cond_wait (81c3e0c, 81c3d70) + 69
d11d1ffb cond_wait (81c3e0c, 81c3d70) + 24
d131e4d2 cv_wait (81c3e0c, 81c3d6c) + 5e
d12fe2dd txg_wait_synced (81c3cc0, 1014, 0) + 179
d12f9080 spa_config_update (819dac0, 0) + c4
d12f467a spa_import (8047657, 8181f88, 0) + 256
080510c6 main (2, 804749c, 80474a8) + b2
08050f22 _start (2, 8047650, 8047657, 0, 804765c, 8047678) + 7a

d131ed79 vpanic (d1341dbc, ca5cd248) + 51
d131ed9f panic (d1341dbc, d135a384, d135a724, d133a630, 0, 0) + 1f
d131921d zio_done (822ed40) + 455
d131c15d zio_next_stage (822ed40) + 161
d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
d1318c88 zio_wait_children_done (822ed40) + 18
d131c15d zio_next_stage (822ed40) + 161
d131ba83 zio_vdev_io_assess (822ed40) + 183
d131c15d zio_next_stage (822ed40) + 161
d1307011 vdev_mirror_io_done (822ed40) + 421
d131b8a2 zio_vdev_io_done (822ed40) + 36
d131c15d zio_next_stage (822ed40) + 161
d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
d1318c88 zio_wait_children_done (822ed40) + 18
d1306be6 vdev_mirror_io_start (822ed40) + 1d2
d131b862 zio_vdev_io_start (822ed40) + 34e
d131c313 zio_next_stage_async (822ed40) + 1ab
d131bb47 zio_vdev_io_assess (822ed40) + 247
d131c15d zio_next_stage (822ed40) + 161
d1307011 vdev_mirror_io_done (822ed40) + 421
d131b8a2 zio_vdev_io_done (822ed40) + 36
d131c15d zio_next_stage (822ed40) + 161
d1318b92 zio_wait_for_children (822ed40, 11, 822ef30) + 6a
d1318c88 zio_wait_children_done (822ed40) + 18
d1306be6 vdev_mirror_io_start (822ed40) + 1d2
d131b862 zio_vdev_io_start (822ed40) + 34e
d131c15d zio_next_stage (822ed40) + 161
d1318dc1 zio_ready (822ed40) + 131
d131c15d zio_next_stage (822ed40) + 161
d131b41b zio_dva_allocate (822ed40) + 343
d131c15d zio_next_stage (822ed40) + 161
d131bdcb zio_checksum_generate (822ed40) + 123
d131c15d zio_next_stage (822ed40) + 161
d1319873 zio_write_compress (822ed40) + 4af
d131c15d zio_next_stage (822ed40) + 161
d1318b92 zio_wait_for_children (822ed40, 1, 822ef28) + 6a
d1318c68 zio_wait_children_ready (822ed40) + 18
d131c313 zio_next_stage_async (822ed40) + 1ab
d1318b1f zio_nowait (822ed40) + 1b
d12c6941 arc_write (82490c0, 819dac0, 7, 3, 2, 1014) + 1ed
d12ce7ae dbuf_sync (82bd008, 82490c0, 82beb40) + e6e
d12e2ecb dnode_sync (82ea090, 0, 82490c0, 82beb40) + 517
d12d663a dmu_objset_sync_dnodes (82a6e00, 82a6ee4, 82beb40) + 14e
d12d6983 dmu_objset_sync (82a6e00, 82beb40) + 137
d12ec20e dsl_pool_sync (81c3cc0, 1014, 0) + 182
d12f7db6 spa_sync (819dac0, 1014, 0) + 26e
d12fdf14 txg_sync_thread (81c3cc0) + 2a8
d11d7604 _thr_setup (ccdda400) + 52
d11d7860 _lwp_start (ccdda400, 0, 0, 0, 0, 0)

Regards,
Manoj

Manoj Joseph wrote:
> What I am trying to do is create an object in a ZFS filesystem. But I
> am seeing failures when I try to sync changes.
[snip]
Hi,

Replying to myself again. :)

I see this problem only if I attempt to use a zpool that already exists.
If I create one like ztest does (using files instead of devices - I
don't know if that matters), it works like a charm.

Any clue as to why this is so would be appreciated.

Cheers
Manoj

Manoj Joseph wrote:
> I tried adding an spa_export()/spa_import() pair to the code snippet.
> I get a similar crash while importing.
[snip]
Manoj Joseph wrote:
> Replying to myself again. :)
>
> I see this problem only if I attempt to use a zpool that already
> exists. If I create one like ztest does (using files instead of
> devices - I don't know if that matters), it works like a charm.

You should probably be posting on zfs-discuss.

The pool you're trying to access is damaged. It would appear that one of
the devices cannot be written to.

--matt
Hi,

In brief, what I am trying to do is to use libzpool to access a zpool -
like ztest does.

Matthew Ahrens wrote:
> You should probably be posting on zfs-discuss.

Switching from zfs-code to zfs-discuss.

> The pool you're trying to access is damaged. It would appear that one
> of the devices cannot be written to.

No, AFAIK, the pool is not damaged. But yes, it looks like the device
can't be written to by the userland ZFS.

bash-3.00# zpool import test
bash-3.00# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test    85K  1.95G  24.5K  /test
bash-3.00# ./udmu test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t0d0    ONLINE       0     0     0

errors: No known data errors
Export the pool.
cannot open 'test': no such pool
Import the pool.
error: ZFS: I/O failure (write on <unknown> off 0: zio 8265d80
    [L0 unallocated] 4000L/400P DVA[0]=<0:1000:400>
    DVA[1]=<0:18001000:400> fletcher4 lzjb LE contiguous birth=245 fill=0
    cksum=6bba8d3a44:2cfa96558ac7:c732e55bea858:2b86470f6a83373): error 28
Abort (core dumped)
bash-3.00# zpool import test
bash-3.00# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t0d0    ONLINE       0     0     0

errors: No known data errors
bash-3.00# touch /test/z
bash-3.00# sync
bash-3.00# ls -l /test/z
-rw-r--r--   1 root     root           0 Jun 28 04:18 /test/z
bash-3.00#

The userland ZFS's export succeeds. But doing a system("zpool status
test") right after the spa_export() succeeds shows that the kernel ZFS
still thinks the pool is imported. I guess that makes sense - nothing
has been told to the kernel ZFS about the export. But I still do not
understand why the userland ZFS can't write to the pool.

Regards,
Manoj

PS: The code I have been tinkering with is attached.

[Attachment: udmu.c, 6517 bytes -
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070628/dc13077c/attachment.bin>]
Manoj Joseph wrote:
> In brief, what I am trying to do is to use libzpool to access a zpool -
> like ztest does.

[snip]

> No, AFAIK, the pool is not damaged. But yes, it looks like the device
> can't be written to by the userland ZFS.

Well, I might have figured out something.

Trussing the process shows this:

/1:	open64("/dev/rdsk/c2t0d0s0", O_RDWR|O_LARGEFILE) = 3
/108:	pwrite64(3, " X0101\0140104\n $\0\r ".., 638, 4198400) Err#22 EINVAL
/108:	pwrite64(3, "FC BFC BFC BFC BFC BFC B".., 386, 4199038) Err#22 EINVAL
[more failures...]

The writes are not aligned to a block boundary. And, apparently, unlike
writes to files, misaligned writes to raw devices do not work.

Question: were ztest and libzpool not meant to be run on real devices?
Or could there be an issue in how I set things up?

Regards,
Manoj
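PS: The alignment requirement is easy to reproduce in isolation. A
minimal sketch, assuming a 512-byte sector size - the device path is
just my example, and the sketch only reads, so it is non-destructive:

<code snippet>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int
main(void)
{
	char buf[1024];
	int fd = open("/dev/rdsk/c2t0d0s0", O_RDONLY | O_LARGEFILE);

	if (fd == -1) {
		perror("open");
		return (1);
	}

	/* Aligned: offset and length are multiples of 512 - succeeds. */
	if (pread(fd, buf, 512, 0) == -1)
		perror("aligned pread");

	/* Misaligned: same length/offset pair as the truss output above. */
	if (pread(fd, buf, 386, 638) == -1)
		perror("misaligned pread");	/* expect EINVAL */

	(void) close(fd);
	return (0);
}
</code snippet>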
Manoj Joseph wrote:
> The writes are not aligned to a block boundary. And, apparently, unlike
> writes to files, misaligned writes to raw devices do not work.
>
> Question: were ztest and libzpool not meant to be run on real devices?
> Or could there be an issue in how I set things up?

The failing write has this call stack:

pwrite64:return
	libc.so.1`_pwrite64+0x15
	libzpool.so.1`vn_rdwr+0x5b
	libzpool.so.1`vdev_file_io_start+0x17e
	libzpool.so.1`vdev_io_start+0x18
	libzpool.so.1`zio_vdev_io_start+0x33d
	[snip]

usr/src/uts/common/fs/zfs/vdev_file.c has this:

<code snippet>
/*
 * From userland we access disks just like files.
 */
#ifndef _KERNEL

vdev_ops_t vdev_disk_ops = {
	vdev_file_open,
	vdev_file_close,
	vdev_default_asize,
	vdev_file_io_start,
	vdev_file_io_done,
	NULL,
	VDEV_TYPE_DISK,		/* name of this vdev type */
	B_TRUE			/* leaf vdev */
};
</code snippet>

Guess vdev_file_io_start() does not work very well for devices.

Regards,
Manoj
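PS: For reference, the data path in vdev_file_io_start() boils down to
roughly this (paraphrased from memory, not the exact source) - the point
being that the offset and size handed to vn_rdwr() come straight from
the zio:

<code snippet>
	/* Inside vdev_file_io_start(zio_t *zio); vf is the vdev's
	 * per-file state. Paraphrase, not the exact source. */
	error = vn_rdwr(zio->io_type == ZIO_TYPE_READ ? UIO_READ : UIO_WRITE,
	    vf->vf_vnode, zio->io_data, zio->io_size, zio->io_offset,
	    UIO_SYSSPACE, 0, RLIM64_INFINITY, kcred, &resid);
</code snippet>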
Manoj Joseph wrote:
> Guess vdev_file_io_start() does not work very well for devices.

Unlike what I had assumed earlier, the zio_t that is passed to
vdev_file_io_start() has an aligned offset and size.

The libzpool library, when writing data to the devices below a zpool,
splits each write into two. This is done for the sake of testing; the
comment in the routine vn_rdwr() says this:

	/*
	 * To simulate partial disk writes, we split writes into two
	 * system calls so that the process can be killed in between.
	 */

This has the effect of creating misaligned writes to raw devices, which
fail with errno = EINVAL.

Patching that solves the problem for me. :)

End of this thread! ;)

Cheers
Manoj
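PS: To make the failure mode concrete, the write path in the userland
vn_rdwr() does roughly the following (a paraphrase, not the exact
source; the variable names are illustrative):

<code snippet>
	ssize_t split, iolen;

	/*
	 * A write of len bytes at offset is issued as two pwrite64()
	 * calls split at an interior point, so a test can kill the
	 * process in between and thereby simulate a partial disk write.
	 */
	split = len / 2;	/* any interior split point will do */
	iolen = pwrite64(fd, addr, split, offset);
	iolen += pwrite64(fd, (char *)addr + split,
	    len - split, offset + split);

	/*
	 * On a plain file both calls succeed. On a raw device,
	 * offset + split and the two lengths are in general not
	 * multiples of the sector size, so both calls fail with
	 * EINVAL even though the original (offset, len) was aligned.
	 */
</code snippet>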
Hi Matt, ZFS team,

Problem
-------
libzpool.so, when calling pwrite(2), splits each write into two. This is
done to simulate partial disk writes. It has the side effect that the
resulting writes are not block aligned; hence, when the underlying
device is a raw device, the writes fail.

Note: ztest always runs on top of files and hence does not see this
failure.

Solution
--------
Introduce a flag, split_io, that when set causes writes to be split (the
current behavior). It is not set by default and is turned on by ztest.

A patch built on top of build 55 is attached. Could this patch be
accepted into OpenSolaris?

Regards,
Manoj

Matthew Ahrens wrote:
> Manoj Joseph wrote:
>> This has the effect of creating misaligned writes to raw devices,
>> which fail with errno = EINVAL.
>
> Cool, glad you were able to figure it out!
>
> --matt

[Attachment: 55.diff, 2244 bytes -
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070820/02a65471/attachment.bin>]
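For anyone who cannot fetch the attachment, the idea is along these
lines - a rough sketch of the proposed change, not the actual diff; the
exact placement inside vn_rdwr() is my reconstruction:

<code snippet>
/* New libzpool global: off by default; ztest sets split_io = 1. */
int split_io = 0;

	/* In vn_rdwr(), on the write path: */
	if (split_io) {
		/*
		 * Simulate partial disk writes by splitting into two
		 * system calls, so the process can be killed in
		 * between. Only safe on files, where unaligned writes
		 * are allowed.
		 */
		split = len / 2;
		iolen = pwrite64(fd, addr, split, offset);
		iolen += pwrite64(fd, (char *)addr + split,
		    len - split, offset + split);
	} else {
		/* One write, preserving alignment - raw devices work. */
		iolen = pwrite64(fd, addr, len, offset);
	}
</code snippet>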