Hello ZFS,

The system was rebooted, and after the reboot the server is again stuck for
a very long time mounting its ZFS file systems.

System is snv_39, SPARC, T2000

bash-3.00# ptree
7     /lib/svc/bin/svc.startd -s
  163   /sbin/sh /lib/svc/method/fs-local
    254   /usr/sbin/zfs mount -a
[...]

bash-3.00# zfs list | wc -l
      46

Using df I can see most file systems are already mounted.

> ::ps ! grep zfs
R    254    163      7      7      0 0x4a004000 00000600219e1800 zfs
> 00000600219e1800::walk thread | ::findstack -v
stack pointer for thread 300013026a0: 2a10069ebb1
[ 000002a10069ebb1 cv_wait+0x40() ]
  000002a10069ec61 txg_wait_synced+0x54(3000052f0d0, 2f2216d, 3000107fb90, 3000052f110, 3000052f112, 3000052f0c8)
  000002a10069ed11 zil_replay+0xbc(60022609c08, 600226a1840, 600226a1870, 700d13d8, 7ba2d000, 60006821200)
  000002a10069ee01 zfs_domount+0x1f0(7ba35000, 700d1000, 60021a8bae0, 1, 0, 0)
  000002a10069eed1 zfs_mount+0x160(600225f6080, 1868800, 2a10069f9d8, 60021a8bae0, 2f, 0)
  000002a10069efb1 domount+0x9b0(2a10069f8b0, 1, 600225f6080, 0, 0, 0)
  000002a10069f121 mount+0x110(600013fbcf8, 2a10069fad8, 0, 0, ff38e40c, 100)
  000002a10069f221 syscall_ap+0x44(2a0, ffbfeca0, 1118bf4, 600013fbc40, 15, 0)
  000002a10069f2e1 syscall_trap32+0xcc(52b3c8, ffbfeca0, 100, ff38e40c, 0, 0)

# iostat -xnz 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    3.0  513.0  192.0 2822.1  0.0  2.1    0.0    4.0   0  95 c4t600C0FF00000000009258F3E4C4C5601d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    7.0  598.1  388.6 1832.9  0.0  2.0    0.0    3.4   0  93 c4t600C0FF00000000009258F3E4C4C5601d0
^C
bash-3.00#

However, for many seconds no IOs at all are issued.

# mpstat 1
[...]
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   74   209  109    0    0    0    0    0     0    0   1   0  99
  1    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  2    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  3    0   0    0     1    0    0    0    0    0    0     0    0 100   0   0
  4    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  5    0   0    0     6    1    8    0    0    0    0     0    0   0   0 100
  6    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  7    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  8    3   0    1     7    0   12    0    0    2    0   245    1   1   0  98
  9    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 10    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 11    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 12    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 13    0   0    0     6    0   10    0    0    0    0     0    0   0   0 100
 14    0   0    0     3    0    2    1    0    0    0   188    0   1   0  99
 15    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 16    0   0    0     8    1   12    0    0    0    0     0    0   0   0 100
 17    0   0    0     7    0   12    0    0    0    0     0    0   0   0 100
 18    0   0    0    23    0   43    0    0    1    0     0    0   0   0 100
 19    0   0    0     9    0   16    0    0    0    0     0    0   0   0 100
 20    0   0    0     8    0   14    0    0    0    0     0    0   0   0 100
 21    0   0    0     7    2    8    0    1    0    0     1    0   1   0  99
 22    0   0   17    38   34    7    0    1    1    0     5    0   0   0 100
 23    0   0    0     7    0   12    0    0    0    0     0    0   0   0 100
 24    0   0    0     4    0    6    0    1    0    0     0    0   0   0 100
 25    0   0    1     5    0    8    0    1    0    0     0    0   0   0 100
 26    0   0    0     2    0    2    0    0    0    0     0    0   0   0 100
 27    0   0    0     2    0    2    0    0    0    0     0    0   0   0 100
 28    0   0    0     2    0    2    0    0    1    0     0    0   0   0 100
 29    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 30    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 31    0   0    0     6    0    8    0    0    0    0     0    0   0   0 100
^C
bash-3.00#

# dtrace -n fbt:::entry'{self->t=timestamp;}' -n fbt:::return'/self->t/{@[probefunc]=sum(timestamp-self->t);self->t=0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
  syscall_mstate                                        12636328
  callout_schedule_1                                    14428108
  hwblkclr                                              16656308
  avl_rotation                                          17196252
  page_pptonum                                          19457456
  sfmmu_mlist_enter                                     20078508
  sfmmu_mlist_exit                                      20804176
  page_numtomemseg_nolock                               23377972
  avl_insert                                            23727512
  avl_insert_here                                       25281752
  sfmmu_mlspl_enter                                     32277524
  kmem_cache_alloc                                      38497240
  xc_serv                                               94880184
  disp_getwork                                         118366960
  disp_anywork                                         191004096
  space_map_seg_compare                                573117700
  resume                                              7059674296
#

bash-3.00# dtrace -n fbt::space_map_seg_compare:entry'{@[stack()]=count();}'
dtrace: description 'fbt::space_map_seg_compare:entry' matched 1 probe
^C

              genunix`avl_find+0x38
              zfs`space_map_add+0x12c
              zfs`space_map_load+0x214
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1b0
              zfs`metaslab_alloc_dva+0x10c
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x50
              zfs`zio_write_compress+0x1e4
              zfs`arc_write+0xbc
              zfs`dbuf_sync+0x6b0
              zfs`dnode_sync+0x300
              zfs`dmu_objset_sync_dnodes+0x68
              zfs`dmu_objset_sync+0x50
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x60
              zfs`spa_sync+0xe0
              zfs`txg_sync_thread+0x130
              unix`thread_start+0x4
           316769

              genunix`avl_find+0x38
              zfs`space_map_remove+0x98
              zfs`space_map_load+0x214
              zfs`metaslab_activate+0x3c
              zfs`metaslab_group_alloc+0x1b0
              zfs`metaslab_alloc_dva+0x10c
              zfs`metaslab_alloc+0x2c
              zfs`zio_dva_allocate+0x50
              zfs`zio_write_compress+0x1e4
              zfs`arc_write+0xbc
              zfs`dbuf_sync+0x6b0
              zfs`dnode_sync+0x300
              zfs`dmu_objset_sync_dnodes+0x68
              zfs`dmu_objset_sync+0x50
              zfs`dsl_dataset_sync+0xc
              zfs`dsl_pool_sync+0x60
              zfs`spa_sync+0xe0
              zfs`txg_sync_thread+0x130
              unix`thread_start+0x4
           404210
bash-3.00#

The above was running for 3-4 seconds.

I can see some progress - I mean another file system gets mounted every
3-10 minutes. However, overall it can take over an hour to boot the server
before zfs mount -a has actually mounted all the file systems. Also, with
default Solaris configs you can't log in during this stage, either from the
console or over ssh, and no messages are issued. That is bad too, as you
don't know what is going on. We had to reboot again, boot to single user,
run sshd, log in via ssh and let the boot process proceed in order to see
why the server wouldn't boot.

Using zpool status, all pools are ok and no errors at all are reported.

Summary:

1. zfs mount -a shouldn't "sleep" for an hour!
2. One virtual CPU core of a 1GHz T2000 is consumed for most of the time
   during zfs mount -a - why is it so CPU intensive? On this server the CPU
   became the limiting factor, to the point that usually no IOs at all were
   issued to the disks during the mount. When the same thing happened on a
   faster CPU, the disks were doing 100MB/s, but it still took 10-20
   minutes for zfs to mount all the file systems.
3. I thought the problem of zfs mount taking so long before it exits was
   already addressed, so either this is something completely new or
   something is still wrong. I'm thinking of
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6377670
4. It would be better if zfs mount -a actually mounted file systems in
   parallel, as it looks like it mounts them serially right now (see the
   sketch below).

The above behavior is really BAD, and it makes ZFS look like it has an
fsck built in - I know it doesn't.
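For point 4, a rough illustration of the idea - an untested shell sketch,
not a supported mode of zfs(1M); it ignores legacy/none mountpoints,
already-mounted datasets and mount-order dependencies between nested file
systems:

    # Fire off one mount per dataset concurrently, so each mount's
    # ZIL replay can overlap with the others instead of queueing.
    for fs in $(zfs list -H -t filesystem -o name); do
        zfs mount "$fs" 2>/dev/null &   # ignore "already mounted" etc.
    done
    wait    # block until all background mounts have finished

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com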
Hello Robert,

Monday, July 31, 2006, 12:48:30 AM, you wrote:

RM> [full quote of the original message trimmed]

Any ideas?

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Robert Milkowski wrote On 08/01/06 11:41,:

> [full quote of Robert's original message trimmed]
> Any ideas?

It looks to me like the mount code, before replaying the log, flushes all
transactions in the group (txg_wait_synced()) to get some consistency.
This then takes a very long time - searching the space map, perhaps
because it's nearly full. Is it?

If this is true I'm not sure what can be done - I'm not a SPA expert.
However, a bug should be raised, as this isn't acceptable behaviour.
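To make the theory testable - a sketch of a dtrace script that attributes
time blocked in txg_wait_synced() to zil_replay(); the probe functions are
taken from the stack in your post, but this is illustrative, I haven't run
it here:

    # While "zfs mount -a" is stuck, quantize how long each zil_replay()
    # call spends blocked in txg_wait_synced().
    dtrace -n '
    fbt::zil_replay:entry       { self->replaying = 1; }
    fbt::txg_wait_synced:entry  /self->replaying/ { self->t = timestamp; }
    fbt::txg_wait_synced:return /self->t/
    {
            @["ns in txg_wait_synced under zil_replay"] =
                quantize(timestamp - self->t);
            self->t = 0;
    }
    fbt::zil_replay:return      { self->replaying = 0; }'

Neil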
Hello Neil,

Tuesday, August 1, 2006, 8:45:02 PM, you wrote:

NP> Robert Milkowski wrote On 08/01/06 11:41,:
>> [full quote of the original message trimmed]
NP> It looks to me like the mount code, before replaying the log, flushes all
NP> transactions in the group (txg_wait_synced()) to get some consistency.
NP> This then takes a very long time - searching the space map, perhaps
NP> because it's nearly full. Is it?

More than 200GB free in a pool of about 4.5TB - i.e. the pool is over 90%
full.

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Hello Robert,

Wednesday, August 2, 2006, 12:22:11 AM, you wrote:

RM> [full quote of the earlier thread trimmed]
NP>> It looks to me like the mount code, before replaying the log, flushes all
NP>> transactions in the group (txg_wait_synced()) to get some consistency.
NP>> This then takes a very long time - searching the space map, perhaps
NP>> because it's nearly full. Is it?

RM> More than 200GB free in a pool of about 4.5TB.

I ran 'zfs set quota=400g nfs-s5-s8/d5201 nfs-s5-s8/d5202 nfs-s5-s8/d5203
nfs-s5-s8/d5' and it's still running after 30 minutes.
# zpool iostat nfs-s5-s8 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
nfs-s5-s8   4.24T   302G    243     25  13.1M   223K
nfs-s5-s8   4.24T   302G    409      0  15.0M      0
nfs-s5-s8   4.24T   302G    162      0  8.95M      0
nfs-s5-s8   4.24T   302G    317      0  18.9M      0
nfs-s5-s8   4.24T   302G    298     30  24.0M  90.0K
nfs-s5-s8   4.24T   302G    374      0  26.9M      0

# mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch md ip
sctp usba lofs logindmux ptm qlc fctl cpc fcp fcip random crypto zfs ssd
nfs ]
> ::ps ! grep zfs
R  23426   2460  23426   2454      0 0x4a004000 0000060002a66790 zfs
> 0000060002a66790::walk thread | ::findstack -v
stack pointer for thread 3006703b320: 2a10af5ccb1
[ 000002a10af5ccb1 cv_wait+0x40() ]
  000002a10af5cd61 txg_wait_synced+0x54(30055528990, 2f23aa5, 2f23aa3, 300555289d0, 300555289d2, 30055528988)
  000002a10af5ce11 dsl_sync_task_group_wait+0x120(300c0860360, 0, 2f23aa5, 71, 7b64ecf8, 300c0860388)
  000002a10af5cec1 dsl_sync_task_do+0x28(300c0860360, 7b611dcc, 7b611e78, 60009e5e380, 2a10af5d828, 0)
  000002a10af5cf71 dsl_dir_set_quota+0x64(7b648400, 6400000000, 7b611c00, 7b611c00, 60009e5e380, 0)
  000002a10af5d031 zfsdev_ioctl+0x158(701d8c00, 3f, ffbfec18, 15, 54, 600027aa000)
  000002a10af5d0e1 fop_ioctl+0x20(600026dd280, 5a15, ffbfec18, 100003, 6001cacde58, 120e894)
  000002a10af5d191 ioctl+0x184(3, 60011c64ab8, ffbfec18, b614c, 1010101, 5a15)
  000002a10af5d2e1 syscall_trap32+0xcc(3, 5a15, ffbfec18, b614c, 1010101, fffffd74)

Setting a quota on file systems in other pools takes something like 10s,
which is still quite long but not 30 minutes (or more)! The only
difference I can see is that nfs-s5-s8 has less free space than the other
pools.

This is snv_43, as I reimported the pool on another server.

Something is terribly wrong here.
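Again blocked in txg_wait_synced(), so whatever the sync thread is doing
dominates. For what it's worth, here is how I'd try to confirm where the
time goes - an untested sketch that times each spa_sync() pass while the
slow 'zfs set quota' is in flight (spa_sync is the function at the bottom
of the sync-thread stacks earlier in this thread):

    # Quantize how long each spa_sync() pass takes while the ioctl hangs.
    dtrace -n '
    fbt::spa_sync:entry  { self->t = timestamp; }
    fbt::spa_sync:return /self->t/
    {
            @["spa_sync duration (ns)"] = quantize(timestamp - self->t);
            self->t = 0;
    }'

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com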
Hello Neil,

I can see that
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6478388
has been integrated. I guess it could be related to the problem I
described here, right?

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Hi Robert,

Yes, it could be related, or even be that very bug. Certainly the replay
was (prior to this bug fix) extremely slow. I don't really have enough
information to determine if it's the exact problem, though after
re-reading your original post I strongly suspect it is.

I also putback a companion fix which should be helpful in determining
whether zil_replay is making progress or is hung:

6486496 zil_replay() useful debug
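Until you have bits with that fix, a rough substitute (a sketch of mine,
not the output of the new debug code) is to watch zil_replay() entry and
return directly with dtrace, to tell a slow-but-progressing replay from a
hung one:

    # Print a timestamped line as each dataset's log replay starts
    # and finishes during "zfs mount -a".
    dtrace -qn '
    fbt::zil_replay:entry  { printf("%Y  zil_replay start\n", walltimestamp); }
    fbt::zil_replay:return { printf("%Y  zil_replay done\n",  walltimestamp); }'

Neil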