Hello all,

I have a situation where zpool status shows no known data errors, but all processes on a specific filesystem are hung. This has happened twice before since we installed OpenSolaris 2009.06 (snv_111b). For instance, there are two filesystems in this pool: 'zfs get all' on one filesystem returns without issue, but when run on the other filesystem it hangs. A 'df -h' hangs as well, etc.

This filesystem has many different operations running on it:

1) It receives an incremental snapshot every 30 minutes, continuously.
2) Every night a clone is made from one of the received snapshot streams, then a filesystem backup is taken on that clone (the backup is a directory traversal); once the backup completes, the clone is destroyed.

We tried to upgrade to the latest build but ran into the current 'checksum' issue in build snv_122, so we rolled back.

# uname -a
SunOS lahar2 5.11 snv_111b i86pc i386 i86pc

# zpool status zdisk1
  pool: zdisk1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zdisk1      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
        spares
          c7t6d0    AVAIL

errors: No known data errors

The filesystem is currently in this 'hung' state. Are there any commands I can run to help debug the issue?

TIA
--
This message posted from opensolaris.org
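[Editor's note: one way to check for this hang from a script, without wedging the shell itself, is to run a probe command under a timeout. This is a minimal sketch, not from the thread: timeout(1) is GNU coreutils and may not exist on snv_111b, where backgrounding the command and killing it after a delay would serve the same purpose; the 5-second limit and the /tmp example path are also assumptions.]

```shell
#!/bin/sh
# Probe a mount point for hung I/O: run 'df -h' on it under a timeout so the
# probe itself cannot get stuck. Reports "responsive" only if df returns in time.
probe_fs() {
    mp=$1
    # timeout(1) is GNU coreutils -- an assumption; not guaranteed on snv_111b.
    if timeout 5 df -h "$mp" >/dev/null 2>&1; then
        echo "$mp: responsive"
    else
        echo "$mp: hung or error"
    fi
}

probe_fs /tmp
```

Run against the hung filesystem's mount point, this would report "hung or error" instead of blocking forever the way a bare 'df -h' does.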
Marc Emmerson
2009-Sep-14 14:59 UTC
[zfs-discuss] zpool status OK but zfs filesystem seems hung
Hi there,

I wonder if your issue is related to mine; see the thread here: http://opensolaris.org/jive/thread.jspa?threadID=112777&tstart=0

It only manifested after I upgraded to snv_121, although booting back to snv_118 did not fix it.
Thanks for the reply, but this seems to be a bit different. A couple of things I failed to mention:

1) This is a secondary pool, not the root pool.
2) The snapshots are trimmed to keep only 80 or so.

The system boots and runs fine; it's just an issue for this secondary pool and filesystem. It seems to be directly related to I/O-intensive operations, as the (full) backup seems to trigger it; we've never seen it happen with incremental backups...

Thanks.
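[Editor's note: the "trimmed to keep only 80 or so" step above could look something like the following sketch. The dataset name zdisk1/data and KEEP=80 are assumptions (the thread never names the dataset), and the destroy commands are echoed as a dry run rather than executed.]

```shell
#!/bin/sh
# Keep only the newest KEEP snapshots of a dataset; print (dry-run) destroy
# commands for the older ones. Input file lists snapshot names oldest-first,
# e.g. from: zfs list -H -t snapshot -o name -s creation -r zdisk1/data
KEEP=80

trim_snapshots() {
    # $1: file of snapshot names, oldest first
    excess=$(( $(wc -l < "$1") - KEEP ))
    [ "$excess" -gt 0 ] || return 0
    head -n "$excess" "$1" | while read -r snap; do
        echo "zfs destroy $snap"   # dry run; drop the echo to actually destroy
    done
}
```

With 85 snapshots listed, this would print destroy commands for the 5 oldest and leave the newest 80 alone.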