I have an interesting issue with one single ZFS filesystem in a pool.
All the other filesystems are fine, and can be mounted, snapshotted,
destroyed, etc. But this one filesystem, if I try to do any operation
on it (zfs mount, zfs snapshot, zfs destroy, zfs set <anything>), it
spins the system until all RAM is used up (wired), and then hangs the
box. The zfs process sits in the tx->tx_sync_done_cv state until the
box locks up. CTRL+T of the process only ever shows this:

    load: 0.46  cmd: zfs 3115 [tx->tx_sync_done_cv)] 36.63r 0.00u 0.00s 0% 2440k

Anyone come across anything similar? And found a way to fix it, or to
destroy the filesystem? Any suggestions on how to go about debugging
this? Any magical zdb commands to use?

The filesystem only has 5 MB of data in it (log files), compressed via
LZJB for a compressratio of ~6x. There are no snapshots for this
filesystem.

Dedupe is enabled on the pool and all filesystems.

System is running 64-bit FreeBSD 9.0:

    FreeBSD alphadrive.sd73.bc.ca 9.0-RELEASE FreeBSD 9.0-RELEASE #0 r229803: Sun Jan 8 00:43:00 PST 2012 root@alphadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST90 amd64

Hardware is fairly generic:
- SuperMicro H8DGi-F motherboard
- AMD Opteron 6128 CPU (8 cores)
- 24 GB of DDR3 RAM
- 3x SuperMicro AOC-USAS-L8i SATA controllers
- 24x hard drives ranging from 500 GB to 2.0 TB (6 of each kind in
  raidz2 vdevs)
- 64 GB SSD partitioned for OS, swap, with 32 GB for L2ARC

Filesystem properties:

# zfs get all storage/logs/rsync
NAME                PROPERTY              VALUE                 SOURCE
storage/logs/rsync  type                  filesystem            -
storage/logs/rsync  creation              Tue May 10 9:55 2011  -
storage/logs/rsync  used                  5.48M                 -
storage/logs/rsync  available             4.61T                 -
storage/logs/rsync  referenced            5.48M                 -
storage/logs/rsync  compressratio         5.93x                 -
storage/logs/rsync  mounted               no                    -
storage/logs/rsync  quota                 none                  default
storage/logs/rsync  reservation           none                  default
storage/logs/rsync  recordsize            128K                  default
storage/logs/rsync  mountpoint            /var/log/rsync        local
storage/logs/rsync  sharenfs              off                   default
storage/logs/rsync  checksum              sha256                inherited from storage
storage/logs/rsync  compression           lzjb                  inherited from storage
storage/logs/rsync  atime                 off                   inherited from storage
storage/logs/rsync  devices               on                    default
storage/logs/rsync  exec                  on                    default
storage/logs/rsync  setuid                on                    default
storage/logs/rsync  readonly              off                   default
storage/logs/rsync  jailed                off                   default
storage/logs/rsync  snapdir               visible               inherited from storage
storage/logs/rsync  aclmode               discard               default
storage/logs/rsync  aclinherit            restricted            default
storage/logs/rsync  canmount              on                    default
storage/logs/rsync  xattr                 on                    default
storage/logs/rsync  copies                1                     default
storage/logs/rsync  version               5                     -
storage/logs/rsync  utf8only              off                   -
storage/logs/rsync  normalization         none                  -
storage/logs/rsync  casesensitivity       sensitive             -
storage/logs/rsync  vscan                 off                   default
storage/logs/rsync  nbmand                off                   default
storage/logs/rsync  sharesmb              off                   default
storage/logs/rsync  refquota              none                  default
storage/logs/rsync  refreservation        none                  default
storage/logs/rsync  primarycache          all                   inherited from storage
storage/logs/rsync  secondarycache        metadata              inherited from storage
storage/logs/rsync  usedbysnapshots       0                     -
storage/logs/rsync  usedbydataset         5.48M                 -
storage/logs/rsync  usedbychildren        0                     -
storage/logs/rsync  usedbyrefreservation  0                     -
storage/logs/rsync  logbias               latency               default
storage/logs/rsync  dedup                 sha256                inherited from storage
storage/logs/rsync  mlslabel                                    -
storage/logs/rsync  sync                  standard              default
storage/logs/rsync  refcompressratio      5.93x                 -

-- 
Freddie Cash
fjwcash at gmail.com
On Tue, May 8, 2012 at 10:24 AM, Freddie Cash <fjwcash at gmail.com> wrote:
> I have an interesting issue with one single ZFS filesystem in a pool.
> All the other filesystems are fine, and can be mounted, snapshotted,
> destroyed, etc. But this one filesystem, if I try to do any operation
> on it (zfs mount, zfs snapshot, zfs destroy, zfs set <anything>), it
> spins the system until all RAM is used up (wired), and then hangs the
> box. The zfs process sits in the tx->tx_sync_done_cv state until the
> box locks up. CTRL+T of the process only ever shows this:
>
>     load: 0.46  cmd: zfs 3115 [tx->tx_sync_done_cv)] 36.63r 0.00u 0.00s 0% 2440k
>
> Anyone come across anything similar? And found a way to fix it, or to
> destroy the filesystem? Any suggestions on how to go about debugging
> this? Any magical zdb commands to use?
>
> The filesystem only has 5 MB of data in it (log files), compressed via
> LZJB for a compressratio of ~6x. There are no snapshots for this
> filesystem.
>
> Dedupe is enabled on the pool and all filesystems.

After more fiddling, testing, and experimenting, it all came down to
not having enough RAM in the box to mount the 5 MB filesystem.

After installing an extra 8 GB of RAM (32 GB total), everything
mounted correctly. It took 27 GB of wired kernel memory (guessing ARC
space) to do it. Unmount, mount, export, import, and property changes
all completed successfully. And the box is running correctly with
24 GB of RAM again.

We'll be ordering more RAM for our ZFS boxes, now. :)

-- 
Freddie Cash
fjwcash at gmail.com
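
The thread does not establish what actually consumed the 27 GB of wired memory. Since dedup is enabled on the whole pool, one candidate worth ruling out is the in-core dedup table (DDT). The sketch below is only a back-of-the-envelope estimate of DDT size, not a diagnosis. It assumes the commonly cited rule of thumb of roughly 320 bytes of RAM per DDT entry and one entry per unique block; the function names, the 128K default (taken from the recordsize shown above), and the example data size are illustrative assumptions, not figures from this system.

```python
# Rough estimate of in-core dedup table (DDT) size.
# ASSUMPTIONS (not from the thread):
#   - about 320 bytes of RAM per DDT entry (a commonly cited rule of thumb)
#   - one DDT entry per unique block of the given average size

BYTES_PER_DDT_ENTRY = 320  # assumed rule-of-thumb value


def estimate_ddt_ram_bytes(unique_data_bytes, avg_block_bytes=128 * 1024):
    """Return the estimated RAM in bytes needed to hold the DDT in core."""
    if unique_data_bytes < 0:
        raise ValueError("unique_data_bytes must be non-negative")
    if avg_block_bytes <= 0:
        raise ValueError("avg_block_bytes must be positive")
    # Ceiling division: a partial block still needs its own entry.
    entries = -(-unique_data_bytes // avg_block_bytes)
    return entries * BYTES_PER_DDT_ENTRY


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical input: 1 TiB of unique data at two average block sizes.
    one_tib = 2**40
    for block_kib in (128, 16):
        ram = estimate_ddt_ram_bytes(one_tib, block_kib * 1024)
        print(f"1 TiB unique data, {block_kib}K blocks: ~{to_gib(ram):.1f} GiB of DDT")
```

Under these assumptions, 1 TiB of unique data in 128K blocks works out to about 2.5 GiB of DDT, and smaller average block sizes raise that figure proportionally. Real numbers for a given pool would have to come from the pool itself rather than from this estimate.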