Hi.
snv_39, SPARC - an NFS server with local ZFS filesystems.
Under heavy load, traffic to all filesystems in one pool ceased - the
other pools were fine.
By ceased I mean that 'zpool iostat 1' showed no traffic to
that pool (nfs-s5-p0).
Commands like 'df' or 'zfs list' hang.
I issued 'reboot -k' but it didn't work, and neither did the 'halt'
command.
So I issued a sync from OBP - after the restart the server came up OK
and has been working properly so far.
I have a crash dump (with the zfs list and df commands hung in it).
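For reference, this is roughly how the hang looks from the live system -
the pool name is the real one, the rest is a generic sketch, not an exact
transcript of what I typed:

  # no traffic to the affected pool, while other pools keep moving
  zpool iostat nfs-s5-p0 1

  # 'df' / 'zfs list' never return; their kernel stacks can also be
  # pulled from the running kernel with the same dcmds used on the
  # dump below
  echo "::pgrep df | ::walk thread | ::findstack -v" | mdb -k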
From the crash dump:
> ::ps
S PID PPID PGID SID UID FLAGS ADDR NAME
R 0 0 0 0 0 0x00000001 0000000001836cc0 sched
R 3 0 0 0 0 0x00020001 0000060000dedb90 fsflush
R 2 0 0 0 0 0x00020001 0000060000dee778 pageout
R 1 0 0 0 0 0x4a004000 0000060000def360 init
R 3054 1 3054 3048 0 0x4a014000 00000600127f7008 bash
R 3070 3054 3070 3048 0 0x4a004000 000006000284dba8 reboot
R 3013 1 3013 3007 0 0x4a014000 0000060002b5fbb8 bash
R 3038 3013 3038 3007 0 0x4a004000 00000600127f4c50 sync
R 3015 3013 3015 3007 0 0x4a004000 0000060002b5cc18 sync
R 2995 1 2995 2989 0 0x4a014000 0000060002a32798 bash
R 2997 2995 2997 2989 0 0x4a004000 0000060002b5c030 zfs
R 367 1 367 361 0 0x4a014000 0000060002a2ec10 bash
R 2970 367 2970 361 0 0x4a004000 00000600127f5838 df
R 2143 1 2143 2143 1 0x42300002 00000600127f93c0 nfsd
R 357 1 356 356 0 0x42000000 0000060000f0abf8 snmpd
R 296 1 296 296 0 0x42000000 00000600025eac00 mdmonitord
R 228 1 228 228 0 0x42000000 0000060000f0cfb0 inetd
Z 311 228 228 228 0 0x4a004002 0000060002a30fc8 rpc.metad
R 7 1 7 7 0 0x42000000 0000060000dec3c0 svc.startd
R 237 7 237 237 0 0x4a004000 000006000284e790 sh
R 3077 237 3077 237 0 0x4a014000 000006000284b7f0 bash
R 3084 3077 3084 237 0 0x4a004000 0000060002b5e3e8 halt
R 3083 3077 3083 237 0 0x4a004000 0000060009355000 sync
Z 221 7 221 221 0 0x4a014002 00000600025eb7e8 sac
> 00000600127f5838::walk thread|::findstack -v
stack pointer for thread 300a12b5020: 2a104880841
[ 000002a104880841 cv_wait+0x40() ]
000002a1048808f1 zio_wait+0x30(300bbe45900, 300bbe45900, 300bbe45b68,
300bbe45b60, 0, 11)
000002a1048809a1 dmu_buf_hold+0x84(0, 0, 5, 0, 2a104881318, 0)
000002a104880a61 zap_lockdir+0x18(60003127468, 3, 0, 1, 1, 2a104881638)
000002a104880b21 zap_cursor_retrieve+0x44(2a104881630, 2a104881518, 3, 0,
2a104881630, 2)
000002a104880c41 dsl_prop_get_all+0xf4(3002bc6ef70, 2a104881820, 1,
60002a3f8c0, 6001bb77540, 7b244c2c)
000002a104880f61 zfs_ioc_objset_stats+0x84(60003def000, 0, 0, 60003defb60,
198, 7007ef08)
000002a104881031 zfsdev_ioctl+0x158(7007ec00, 33, ffbfdc00, 11, 44,
60003def000)
000002a1048810e1 fop_ioctl+0x20(300adb49ec0, 5a11, ffbfdc00, 100003,
60000c02798, 120c888)
000002a104881191 ioctl+0x184(3, 6001e2b5118, ffbfdc00, ffffffff, 40490, 5a11)
000002a1048812e1 syscall_trap32+0xcc(3, 5a11, ffbfdc00, ffffffff, 40490,
80808080)
> 0000060002b5c030::walk thread|::findstack -v
stack pointer for thread 30045016380: 2a102e9a841
[ 000002a102e9a841 cv_wait+0x40() ]
000002a102e9a8f1 dbuf_read+0x1ac(3000f8e1dc0, 2, 3000f8e1e38, 3000f8e1dc0, 0,
2)
000002a102e9a9a1 dmu_buf_hold+0x84(0, 0, 5, 0, 2a102e9b318, 0)
000002a102e9aa61 zap_lockdir+0x18(60003127468, 3, 0, 1, 1, 2a102e9b638)
000002a102e9ab21 zap_cursor_retrieve+0x44(2a102e9b630, 2a102e9b518, 3, 0,
2a102e9b630, 2)
000002a102e9ac41 dsl_prop_get_all+0xf4(3002bc6fdc0, 2a102e9b820, 1,
60002a3f8c0, 6001bb77540, 7b244c2c)
000002a102e9af61 zfs_ioc_objset_stats+0x84(30052b4e000, 0, 0, 30052b4eb60,
198, 7007ef08)
000002a102e9b031 zfsdev_ioctl+0x158(7007ec00, 33, ffbfde20, 11, 44,
30052b4e000)
000002a102e9b0e1 fop_ioctl+0x20(300adb49ec0, 5a11, ffbfde20, 100003,
300a7c3ca60, 120c888)
000002a102e9b191 ioctl+0x184(4, 60000db84a0, ffbfde20, 4, 40490, 5a11)
000002a102e9b2e1 syscall_trap32+0xcc(4, 5a11, ffbfde20, 4, 40490,
80808080)
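Both threads are blocked in cv_wait() under dsl_prop_get_all() ->
zap_cursor_retrieve() -> zap_lockdir() -> dmu_buf_hold(), one in
zio_wait() and one in dbuf_read(), i.e. waiting for I/O from the pool
that went quiet. If it helps, something like this could be poked at
next in the dump - I haven't run these exact dcmds here, and they
assume CTF data and the zfs mdb module are available, so treat it as
a sketch:

  > 300bbe45900::print zio_t
  (the zio the df thread is waiting on - first argument to zio_wait above)
  > ::spa -v
  (pool and vdev state, to see how nfs-s5-p0 looks)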
Looks like some kind of deadlock???
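The other stuck commands from ::ps above (sync, reboot, halt) could be
checked the same way to see whether they are blocked in the same place -
untested here, just the obvious next step:

  > ::pgrep sync | ::walk thread | ::findstack -v
  > ::pgrep halt | ::walk thread | ::findstack -v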
If the crash dump is needed I can provide it - but off-list and not for
public eyes.