Hi.
snv_39, SPARC - an NFS server with local ZFS filesystems.
Under heavy load, traffic to all filesystems in one pool ceased - the
other pools were fine.
By ceased I mean that 'zpool iostat 1' showed no traffic to
that pool (nfs-s5-p0).
Commands like 'df' or 'zfs list' hang.
I issued 'reboot -k' but it didn't work, and neither did the 'halt'
command.
So I issued a sync from OBP - after the restart the server came up OK
and has been working properly so far.
I have a crash dump (with the zfs list and df commands hung in it).
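For reference, this is roughly how the hang looks from the live system -
the pool name is the real one, the rest is a generic sketch, not an exact
transcript of what I typed:

  # no traffic to the affected pool, while other pools keep moving
  zpool iostat nfs-s5-p0 1

  # 'df' / 'zfs list' never return; their kernel stacks can also be
  # pulled from the running kernel with the same dcmds used on the
  # dump below
  echo "::pgrep df | ::walk thread | ::findstack -v" | mdb -k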
From the crash dump:
> ::ps
S PID PPID PGID SID UID FLAGS ADDR NAME
R 0 0 0 0 0 0x00000001 0000000001836cc0 sched
R 3 0 0 0 0 0x00020001 0000060000dedb90 fsflush
R 2 0 0 0 0 0x00020001 0000060000dee778 pageout
R 1 0 0 0 0 0x4a004000 0000060000def360 init
R 3054 1 3054 3048 0 0x4a014000 00000600127f7008 bash
R 3070 3054 3070 3048 0 0x4a004000 000006000284dba8 reboot
R 3013 1 3013 3007 0 0x4a014000 0000060002b5fbb8 bash
R 3038 3013 3038 3007 0 0x4a004000 00000600127f4c50 sync
R 3015 3013 3015 3007 0 0x4a004000 0000060002b5cc18 sync
R 2995 1 2995 2989 0 0x4a014000 0000060002a32798 bash
R 2997 2995 2997 2989 0 0x4a004000 0000060002b5c030 zfs
R 367 1 367 361 0 0x4a014000 0000060002a2ec10 bash
R 2970 367 2970 361 0 0x4a004000 00000600127f5838 df
R 2143 1 2143 2143 1 0x42300002 00000600127f93c0 nfsd
R 357 1 356 356 0 0x42000000 0000060000f0abf8 snmpd
R 296 1 296 296 0 0x42000000 00000600025eac00 mdmonitord
R 228 1 228 228 0 0x42000000 0000060000f0cfb0 inetd
Z 311 228 228 228 0 0x4a004002 0000060002a30fc8 rpc.metad
R 7 1 7 7 0 0x42000000 0000060000dec3c0 svc.startd
R 237 7 237 237 0 0x4a004000 000006000284e790 sh
R 3077 237 3077 237 0 0x4a014000 000006000284b7f0 bash
R 3084 3077 3084 237 0 0x4a004000 0000060002b5e3e8 halt
R 3083 3077 3083 237 0 0x4a004000 0000060009355000 sync
Z 221 7 221 221 0 0x4a014002 00000600025eb7e8 sac
> 00000600127f5838::walk thread|::findstack -v
stack pointer for thread 300a12b5020: 2a104880841
[ 000002a104880841 cv_wait+0x40() ]
000002a1048808f1 zio_wait+0x30(300bbe45900, 300bbe45900, 300bbe45b68,
300bbe45b60, 0, 11)
000002a1048809a1 dmu_buf_hold+0x84(0, 0, 5, 0, 2a104881318, 0)
000002a104880a61 zap_lockdir+0x18(60003127468, 3, 0, 1, 1, 2a104881638)
000002a104880b21 zap_cursor_retrieve+0x44(2a104881630, 2a104881518, 3, 0,
2a104881630, 2)
000002a104880c41 dsl_prop_get_all+0xf4(3002bc6ef70, 2a104881820, 1,
60002a3f8c0, 6001bb77540, 7b244c2c)
000002a104880f61 zfs_ioc_objset_stats+0x84(60003def000, 0, 0, 60003defb60,
198, 7007ef08)
000002a104881031 zfsdev_ioctl+0x158(7007ec00, 33, ffbfdc00, 11, 44,
60003def000)
000002a1048810e1 fop_ioctl+0x20(300adb49ec0, 5a11, ffbfdc00, 100003,
60000c02798, 120c888)
000002a104881191 ioctl+0x184(3, 6001e2b5118, ffbfdc00, ffffffff, 40490, 5a11)
000002a1048812e1 syscall_trap32+0xcc(3, 5a11, ffbfdc00, ffffffff, 40490,
80808080)
> 0000060002b5c030::walk thread|::findstack -v
stack pointer for thread 30045016380: 2a102e9a841
[ 000002a102e9a841 cv_wait+0x40() ]
000002a102e9a8f1 dbuf_read+0x1ac(3000f8e1dc0, 2, 3000f8e1e38, 3000f8e1dc0, 0,
2)
000002a102e9a9a1 dmu_buf_hold+0x84(0, 0, 5, 0, 2a102e9b318, 0)
000002a102e9aa61 zap_lockdir+0x18(60003127468, 3, 0, 1, 1, 2a102e9b638)
000002a102e9ab21 zap_cursor_retrieve+0x44(2a102e9b630, 2a102e9b518, 3, 0,
2a102e9b630, 2)
000002a102e9ac41 dsl_prop_get_all+0xf4(3002bc6fdc0, 2a102e9b820, 1,
60002a3f8c0, 6001bb77540, 7b244c2c)
000002a102e9af61 zfs_ioc_objset_stats+0x84(30052b4e000, 0, 0, 30052b4eb60,
198, 7007ef08)
000002a102e9b031 zfsdev_ioctl+0x158(7007ec00, 33, ffbfde20, 11, 44,
30052b4e000)
000002a102e9b0e1 fop_ioctl+0x20(300adb49ec0, 5a11, ffbfde20, 100003,
300a7c3ca60, 120c888)
000002a102e9b191 ioctl+0x184(4, 60000db84a0, ffbfde20, 4, 40490, 5a11)
000002a102e9b2e1 syscall_trap32+0xcc(4, 5a11, ffbfde20, 4, 40490,
80808080)
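Both threads are blocked in cv_wait() under dsl_prop_get_all() ->
zap_cursor_retrieve() -> zap_lockdir() -> dmu_buf_hold(), one in
zio_wait() and one in dbuf_read(), i.e. waiting for I/O from the pool
that went quiet. If it helps, something like this could be poked at
next in the dump - I haven't run these exact dcmds here, and they
assume CTF data and the zfs mdb module are available, so treat it as
a sketch:

  > 300bbe45900::print zio_t
  (the zio the df thread is waiting on - first argument to zio_wait above)
  > ::spa -v
  (pool and vdev state, to see how nfs-s5-p0 looks)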
Looks like some kind of deadlock???
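The other stuck commands from ::ps above (sync, reboot, halt) could be
checked the same way to see whether they are blocked in the same place -
untested here, just the obvious next step:

  > ::pgrep sync | ::walk thread | ::findstack -v
  > ::pgrep halt | ::walk thread | ::findstack -v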
If the crash dump is needed I can provide it - but off-list and not for
public eyes.