Hello ZFS,

The system was rebooted, and after the reboot the server is again stuck for
a very long time mounting its ZFS file systems.

System is snv_39, SPARC, T2000

bash-3.00# ptree
7     /lib/svc/bin/svc.startd -s
  163   /sbin/sh /lib/svc/method/fs-local
    254   /usr/sbin/zfs mount -a
[...]

bash-3.00# zfs list | wc -l
      46

Using df I can see most file systems are already mounted.

> ::ps ! grep zfs
R    254    163      7      7      0 0x4a004000 00000600219e1800 zfs
> 00000600219e1800::walk thread | ::findstack -v
stack pointer for thread 300013026a0: 2a10069ebb1
[ 000002a10069ebb1 cv_wait+0x40() ]
  000002a10069ec61 txg_wait_synced+0x54(3000052f0d0, 2f2216d, 3000107fb90, 3000052f110, 3000052f112, 3000052f0c8)
  000002a10069ed11 zil_replay+0xbc(60022609c08, 600226a1840, 600226a1870, 700d13d8, 7ba2d000, 60006821200)
  000002a10069ee01 zfs_domount+0x1f0(7ba35000, 700d1000, 60021a8bae0, 1, 0, 0)
  000002a10069eed1 zfs_mount+0x160(600225f6080, 1868800, 2a10069f9d8, 60021a8bae0, 2f, 0)
  000002a10069efb1 domount+0x9b0(2a10069f8b0, 1, 600225f6080, 0, 0, 0)
  000002a10069f121 mount+0x110(600013fbcf8, 2a10069fad8, 0, 0, ff38e40c, 100)
  000002a10069f221 syscall_ap+0x44(2a0, ffbfeca0, 1118bf4, 600013fbc40, 15, 0)
  000002a10069f2e1 syscall_trap32+0xcc(52b3c8, ffbfeca0, 100, ff38e40c, 0, 0)

# iostat -xnz 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    3.0  513.0  192.0 2822.1  0.0  2.1    0.0    4.0   0  95 c4t600C0FF00000000009258F3E4C4C5601d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    7.0  598.1  388.6 1832.9  0.0  2.0    0.0    3.4   0  93 c4t600C0FF00000000009258F3E4C4C5601d0
^C
bash-3.00#

However, for many seconds no IOs at all are issued.

# mpstat 1
[...]
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   74   209  109    0    0    0    0    0     0    0   1   0  99
  1    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  2    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  3    0   0    0     1    0    0    0    0    0    0     0    0 100   0   0
  4    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  5    0   0    0     6    1    8    0    0    0    0     0    0   0   0 100
  6    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  7    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  8    3   0    1     7    0   12    0    0    2    0   245    1   1   0  98
  9    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 10    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 11    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 12    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 13    0   0    0     6    0   10    0    0    0    0     0    0   0   0 100
 14    0   0    0     3    0    2    1    0    0    0   188    0   1   0  99
 15    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 16    0   0    0     8    1   12    0    0    0    0     0    0   0   0 100
 17    0   0    0     7    0   12    0    0    0    0     0    0   0   0 100
 18    0   0    0    23    0   43    0    0    1    0     0    0   0   0 100
 19    0   0    0     9    0   16    0    0    0    0     0    0   0   0 100
 20    0   0    0     8    0   14    0    0    0    0     0    0   0   0 100
 21    0   0    0     7    2    8    0    1    0    0     1    0   1   0  99
 22    0   0   17    38   34    7    0    1    1    0     5    0   0   0 100
 23    0   0    0     7    0   12    0    0    0    0     0    0   0   0 100
 24    0   0    0     4    0    6    0    1    0    0     0    0   0   0 100
 25    0   0    1     5    0    8    0    1    0    0     0    0   0   0 100
 26    0   0    0     2    0    2    0    0    0    0     0    0   0   0 100
 27    0   0    0     2    0    2    0    0    0    0     0    0   0   0 100
 28    0   0    0     2    0    2    0    0    1    0     0    0   0   0 100
 29    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 30    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 31    0   0    0     6    0    8    0    0    0    0     0    0   0   0 100
^C
bash-3.00#

# dtrace -n fbt:::entry'{self->t=timestamp;}' -n fbt:::return'/self->t/{@[probefunc]=sum(timestamp-self->t);self->t=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  syscall_mstate                                        12636328
  callout_schedule_1                                    14428108
  hwblkclr                                              16656308
  avl_rotation                                          17196252
  page_pptonum                                          19457456
  sfmmu_mlist_enter                                     20078508
  sfmmu_mlist_exit                                      20804176
  page_numtomemseg_nolock                               23377972
  avl_insert                                            23727512
  avl_insert_here                                       25281752
  sfmmu_mlspl_enter                                     32277524
  kmem_cache_alloc                                      38497240
  xc_serv                                               94880184
  disp_getwork                                         118366960
  disp_anywork                                         191004096
  space_map_seg_compare                                573117700
  resume                                              7059674296
#

bash-3.00# dtrace -n fbt::space_map_seg_compare:entry'{@[stack()]=count();}'
dtrace: description 'fbt::space_map_seg_compare:entry' matched 1 probe
^C

              genunix`avl_find+0x38
              zfs`space_map_add+0x12c
              zfs`space_map_load+0x214
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1b0
              zfs`metaslab_alloc_dva+0x10c
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x50
              zfs`zio_write_compress+0x1e4
              zfs`arc_write+0xbc
              zfs`dbuf_sync+0x6b0
              zfs`dnode_sync+0x300
              zfs`dmu_objset_sync_dnodes+0x68
              zfs`dmu_objset_sync+0x50
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x60
              zfs`spa_sync+0xe0
              zfs`txg_sync_thread+0x130
              unix`thread_start+0x4
           316769

              genunix`avl_find+0x38
              zfs`space_map_remove+0x98
              zfs`space_map_load+0x214
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1b0
              zfs`metaslab_alloc_dva+0x10c
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x50
              zfs`zio_write_compress+0x1e4
              zfs`arc_write+0xbc
              zfs`dbuf_sync+0x6b0
              zfs`dnode_sync+0x300
              zfs`dmu_objset_sync_dnodes+0x68
              zfs`dmu_objset_sync+0x50
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x60
              zfs`spa_sync+0xe0
              zfs`txg_sync_thread+0x130
              unix`thread_start+0x4
           404210
bash-3.00#

The above was running for 3-4 seconds.

I can see some progress - I mean another file system gets mounted every
3-10 minutes. However, overall it can take over an hour to boot the server
before zfs mount -a has actually mounted all the file systems. Also, with
default Solaris configs you can't log in during this stage, either from the
console or over ssh, and no messages are issued. That is bad too, as you
don't know what is going on. We had to reboot again, boot to single user,
run sshd, log in via ssh and let the boot process proceed in order to see
why the server wouldn't boot.

Using zpool status, all pools are ok and no errors at all are reported.

Summary:

1. zfs mount -a shouldn't "sleep" for an hour!
2. One virtual CPU core of a 1GHz T2000 is consumed for most of the time
   during zfs mount -a - why is it so CPU intensive? On this server the CPU
   became the limiting factor, to the point that usually no IOs at all were
   issued to the disks during the mount. When the same thing happened on a
   faster CPU, the disks were doing 100MB/s, but it still took 10-20
   minutes for zfs to mount all the file systems.
3. I thought the problem of zfs mount taking so long before it exits was
   already addressed, so either this is something completely new or
   something is still wrong. I'm thinking of
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6377670
4. It would be better if zfs mount -a actually mounted file systems in
   parallel, as it looks like it mounts them serially right now (see the
   sketch below).

The above behavior is really BAD, and it makes ZFS look like it has an
fsck built in - I know it doesn't.
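For point 4, a rough illustration of the idea - an untested shell sketch,
not a supported mode of zfs(1M); it ignores legacy/none mountpoints,
already-mounted datasets and mount-order dependencies between nested file
systems:

    # Fire off one mount per dataset concurrently, so each mount's
    # ZIL replay can overlap with the others instead of queueing.
    for fs in $(zfs list -H -t filesystem -o name); do
        zfs mount "$fs" 2>/dev/null &   # ignore "already mounted" etc.
    done
    wait    # block until all background mounts have finished

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com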
Hello Robert,

Monday, July 31, 2006, 12:48:30 AM, you wrote:

RM> [full quote of the original message trimmed]

Any ideas?

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Robert Milkowski wrote On 08/01/06 11:41,:

> [full quote of Robert's original message trimmed]
> Any ideas?

It looks to me like the mount code, before replaying the log, flushes all
transactions in the group (txg_wait_synced()) to get some consistency.
This then takes a very long time - searching the space map, perhaps
because it's nearly full. Is it?

If this is true I'm not sure what can be done - I'm not a SPA expert.
However, a bug should be raised, as this isn't acceptable behaviour.
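To make the theory testable - a sketch of a dtrace script that attributes
time blocked in txg_wait_synced() to zil_replay(); the probe functions are
taken from the stack in your post, but this is illustrative, I haven't run
it here:

    # While "zfs mount -a" is stuck, quantize how long each zil_replay()
    # call spends blocked in txg_wait_synced().
    dtrace -n '
    fbt::zil_replay:entry       { self->replaying = 1; }
    fbt::txg_wait_synced:entry  /self->replaying/ { self->t = timestamp; }
    fbt::txg_wait_synced:return /self->t/
    {
            @["ns in txg_wait_synced under zil_replay"] =
                quantize(timestamp - self->t);
            self->t = 0;
    }
    fbt::zil_replay:return      { self->replaying = 0; }'

Neil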
Hello Neil,

Tuesday, August 1, 2006, 8:45:02 PM, you wrote:

NP> Robert Milkowski wrote On 08/01/06 11:41,:
>> [full quote of the original message trimmed]
NP> It looks to me like the mount code, before replaying the log, flushes all
NP> transactions in the group (txg_wait_synced()) to get some consistency.
NP> This then takes a very long time - searching the space map, perhaps
NP> because it's nearly full. Is it?

More than 200GB free in a pool of about 4.5TB - i.e. the pool is over 90%
full.

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Hello Robert,

Wednesday, August 2, 2006, 12:22:11 AM, you wrote:

RM> [full quote of the earlier thread trimmed]
NP>> It looks to me like the mount code, before replaying the log, flushes all
NP>> transactions in the group (txg_wait_synced()) to get some consistency.
NP>> This then takes a very long time - searching the space map, perhaps
NP>> because it's nearly full. Is it?

RM> More than 200GB free in a pool of about 4.5TB.

I ran 'zfs set quota=400g nfs-s5-s8/d5201 nfs-s5-s8/d5202 nfs-s5-s8/d5203
nfs-s5-s8/d5' and it's still running after 30 minutes.
# zpool iostat nfs-s5-s8 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
nfs-s5-s8   4.24T   302G    243     25  13.1M   223K
nfs-s5-s8   4.24T   302G    409      0  15.0M      0
nfs-s5-s8   4.24T   302G    162      0  8.95M      0
nfs-s5-s8   4.24T   302G    317      0  18.9M      0
nfs-s5-s8   4.24T   302G    298     30  24.0M  90.0K
nfs-s5-s8   4.24T   302G    374      0  26.9M      0

# mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch md ip
sctp usba lofs logindmux ptm qlc fctl cpc fcp fcip random crypto zfs ssd
nfs ]
> ::ps ! grep zfs
R  23426   2460  23426   2454      0 0x4a004000 0000060002a66790 zfs
> 0000060002a66790::walk thread | ::findstack -v
stack pointer for thread 3006703b320: 2a10af5ccb1
[ 000002a10af5ccb1 cv_wait+0x40() ]
  000002a10af5cd61 txg_wait_synced+0x54(30055528990, 2f23aa5, 2f23aa3, 300555289d0, 300555289d2, 30055528988)
  000002a10af5ce11 dsl_sync_task_group_wait+0x120(300c0860360, 0, 2f23aa5, 71, 7b64ecf8, 300c0860388)
  000002a10af5cec1 dsl_sync_task_do+0x28(300c0860360, 7b611dcc, 7b611e78, 60009e5e380, 2a10af5d828, 0)
  000002a10af5cf71 dsl_dir_set_quota+0x64(7b648400, 6400000000, 7b611c00, 7b611c00, 60009e5e380, 0)
  000002a10af5d031 zfsdev_ioctl+0x158(701d8c00, 3f, ffbfec18, 15, 54, 600027aa000)
  000002a10af5d0e1 fop_ioctl+0x20(600026dd280, 5a15, ffbfec18, 100003, 6001cacde58, 120e894)
  000002a10af5d191 ioctl+0x184(3, 60011c64ab8, ffbfec18, b614c, 1010101, 5a15)
  000002a10af5d2e1 syscall_trap32+0xcc(3, 5a15, ffbfec18, b614c, 1010101, fffffd74)

Setting a quota on file systems in other pools takes something like 10s,
which is still quite long but not 30 minutes (or more)! The only
difference I can see is that nfs-s5-s8 has less free space than the other
pools.

This is snv_43, as I reimported the pool on another server.

Something is terribly wrong here.
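Again blocked in txg_wait_synced(), so whatever the sync thread is doing
dominates. For what it's worth, here is how I'd try to confirm where the
time goes - an untested sketch that times each spa_sync() pass while the
slow 'zfs set quota' is in flight (spa_sync is the function at the bottom
of the sync-thread stacks earlier in this thread):

    # Quantize how long each spa_sync() pass takes while the ioctl hangs.
    dtrace -n '
    fbt::spa_sync:entry  { self->t = timestamp; }
    fbt::spa_sync:return /self->t/
    {
            @["spa_sync duration (ns)"] = quantize(timestamp - self->t);
            self->t = 0;
    }'

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com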
Hello Neil,

I can see that
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6478388
has been integrated. I guess it could be related to the problem I
described here, right?

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Hi Robert,

Yes, it could be related, or even be that very bug. Certainly the replay
was (prior to this bug fix) extremely slow. I don't really have enough
information to determine if it's the exact problem, though after
re-reading your original post I strongly suspect it is.

I also putback a companion fix which should be helpful in determining
whether zil_replay is making progress or is hung:

6486496 zil_replay() useful debug
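Until you have bits with that fix, a rough substitute (a sketch of mine,
not the output of the new debug code) is to watch zil_replay() entry and
return directly with dtrace, to tell a slow-but-progressing replay from a
hung one:

    # Print a timestamped line as each dataset's log replay starts
    # and finishes during "zfs mount -a".
    dtrace -qn '
    fbt::zil_replay:entry  { printf("%Y  zil_replay start\n", walltimestamp); }
    fbt::zil_replay:return { printf("%Y  zil_replay done\n",  walltimestamp); }'

Neil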