I have an interesting issue with a single ZFS filesystem in a pool.
All the other filesystems are fine and can be mounted, snapshotted,
destroyed, etc. But any operation on this one filesystem (zfs mount,
zfs snapshot, zfs destroy, zfs set <anything>) spins the system until
all RAM is used up (wired), and then hangs the box. The zfs process
sits in the tx->tx_sync_done_cv state until the box locks up. CTRL+T
on the process only ever shows this:
    load: 0.46  cmd: zfs 3115 [tx->tx_sync_done_cv)] 36.63r 0.00u 0.00s 0% 2440k
Has anyone come across anything similar, and found a way to fix it or
to destroy the filesystem? Any suggestions on how to go about
debugging this? Any magical zdb commands to use?
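
One way to gather more detail while the command hangs is to log wired
memory and ARC size from a second terminal, and to grab the kernel
stack of the stuck process. What follows is only a sketch, not a
known fix. It assumes the FreeBSD sysctl names vm.stats.vm.v_wire_count,
hw.pagesize and kstat.zfs.misc.arcstats.size are present on this
system. The helper names (pages_to_mib, bytes_to_mib, sample_memory)
are made up for illustration.

```shell
#!/bin/sh
# Sketch: sample wired memory and ARC size while a zfs command is stuck.
# All function names here are hypothetical helpers, not part of any tool.

# Convert a page count and a page size (in bytes) to whole MiB.
pages_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# Convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Print one sample line: time, wired memory, ARC size.
# Assumes the sysctl names below exist (FreeBSD with ZFS loaded).
sample_memory() {
    wired_pages=$(sysctl -n vm.stats.vm.v_wire_count)
    page_size=$(sysctl -n hw.pagesize)
    arc_bytes=$(sysctl -n kstat.zfs.misc.arcstats.size)
    echo "$(date +%T) wired=$(pages_to_mib "$wired_pages" "$page_size")M arc=$(bytes_to_mib "$arc_bytes")M"
}

# Usage (as root, started before the zfs command so the log survives
# up to the point where the box locks up):
#   while :; do sample_memory; sleep 5; done >> /var/tmp/zfs-mem.log
#   procstat -kk 3115    # kernel stack of the stuck zfs process (PID from CTRL+T)
```

Comparing the wired and ARC numbers over time should at least show
whether the memory growth is inside the ARC or elsewhere in the
kernel.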
The filesystem only has 5 MB of data in it (log files), compressed via
LZJB for a compressratio of ~6x. There are no snapshots for this
filesystem.
Dedupe is enabled on the pool and all filesystems.
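
Nothing here establishes that the dedup table (DDT) is what eats the
memory, but with dedup on it is a cheap thing to check. A rough sketch
of the arithmetic follows. It assumes the entry count and the per-entry
"in core" size are read off the summary lines printed by zdb -DD. The
figure of 320 bytes per entry is only the commonly quoted rule of
thumb, the entry count in the usage line is invented, and ddt_core_mib
is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: rough in-core size of the dedup table.
# ddt_core_mib is a hypothetical helper, not a ZFS command.

# ddt_core_mib ENTRIES BYTES_PER_ENTRY
# Prints the whole number of MiB needed to hold that many entries in RAM.
ddt_core_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# Usage:
#   zdb -DD storage             # DDT summary lines plus histogram
#   ddt_core_mib 50000000 320   # made-up entry count, rule-of-thumb size
```

If the result is anywhere near the amount of RAM in the box, the DDT
is at least a plausible suspect.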
System is running 64-bit FreeBSD 9.0:
    FreeBSD alphadrive.sd73.bc.ca 9.0-RELEASE FreeBSD 9.0-RELEASE #0 r229803: Sun Jan 8 00:43:00 PST 2012 root at alphadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST90 amd64
Hardware is fairly generic:
- SuperMicro H8DGi-F motherboard
- AMD Opteron 6128 CPU (8 cores)
- 24 GB of DDR3 RAM
- 3x SuperMicro AOC-USAS-L8i SATA controllers
- 24x hard drives ranging from 500 GB to 2.0 TB (6 of each kind in
  raidz2 vdevs)
- 64 GB SSD partitioned for OS, swap, with 32 GB for L2ARC
Filesystem properties:
# zfs get all storage/logs/rsync
NAME                PROPERTY              VALUE                 SOURCE
storage/logs/rsync  type                  filesystem            -
storage/logs/rsync  creation              Tue May 10 9:55 2011  -
storage/logs/rsync  used                  5.48M                 -
storage/logs/rsync  available             4.61T                 -
storage/logs/rsync  referenced            5.48M                 -
storage/logs/rsync  compressratio         5.93x                 -
storage/logs/rsync  mounted               no                    -
storage/logs/rsync  quota                 none                  default
storage/logs/rsync  reservation           none                  default
storage/logs/rsync  recordsize            128K                  default
storage/logs/rsync  mountpoint            /var/log/rsync        local
storage/logs/rsync  sharenfs              off                   default
storage/logs/rsync  checksum              sha256                inherited from storage
storage/logs/rsync  compression           lzjb                  inherited from storage
storage/logs/rsync  atime                 off                   inherited from storage
storage/logs/rsync  devices               on                    default
storage/logs/rsync  exec                  on                    default
storage/logs/rsync  setuid                on                    default
storage/logs/rsync  readonly              off                   default
storage/logs/rsync  jailed                off                   default
storage/logs/rsync  snapdir               visible               inherited from storage
storage/logs/rsync  aclmode               discard               default
storage/logs/rsync  aclinherit            restricted            default
storage/logs/rsync  canmount              on                    default
storage/logs/rsync  xattr                 on                    default
storage/logs/rsync  copies                1                     default
storage/logs/rsync  version               5                     -
storage/logs/rsync  utf8only              off                   -
storage/logs/rsync  normalization         none                  -
storage/logs/rsync  casesensitivity       sensitive             -
storage/logs/rsync  vscan                 off                   default
storage/logs/rsync  nbmand                off                   default
storage/logs/rsync  sharesmb              off                   default
storage/logs/rsync  refquota              none                  default
storage/logs/rsync  refreservation        none                  default
storage/logs/rsync  primarycache          all                   inherited from storage
storage/logs/rsync  secondarycache        metadata              inherited from storage
storage/logs/rsync  usedbysnapshots       0                     -
storage/logs/rsync  usedbydataset         5.48M                 -
storage/logs/rsync  usedbychildren        0                     -
storage/logs/rsync  usedbyrefreservation  0                     -
storage/logs/rsync  logbias               latency               default
storage/logs/rsync  dedup                 sha256                inherited from storage
storage/logs/rsync  mlslabel                                    -
storage/logs/rsync  sync                  standard              default
storage/logs/rsync  refcompressratio      5.93x
--
Freddie Cash
fjwcash at gmail.com

On Tue, May 8, 2012 at 10:24 AM, Freddie Cash <fjwcash at gmail.com> wrote:
> I have an interesting issue with one single ZFS filesystem in a pool.
> [...]
> Dedupe is enabled on the pool and all filesystems.

After more fiddling, testing, and experimenting, it all came down to
not having enough RAM in the box to mount the 5 MB filesystem.

After installing an extra 8 GB of RAM (32 GB total), everything
mounted correctly. It took 27 GB of wired kernel memory (guessing ARC
space) to do it. Unmount, mount, export, import, and property changes
all completed successfully. And the box is running correctly with
24 GB of RAM again.

We'll be ordering more RAM for our ZFS boxes, now. :)

--
Freddie Cash
fjwcash at gmail.com