Hello,
I have observed a deadlock when using ZFS. We make heavy use of
zfs send/zfs receive to keep a replica of a dataset on a remote
machine. The replication can run at one-minute intervals. Maybe this
is a somewhat atypical use of ZFS, but it looks like a great way to
keep filesystem replicas once this is sorted out.
How to reproduce:
Set up two systems. A dataset with heavy I/O activity is replicated
from the first to the second one. I used a dataset containing
/usr/obj while running make buildworld.
Replicate the dataset from the first machine to the second one using
an incremental send:
    zfs send -i pool/dataset@Nminus1 pool/dataset@N | ssh destination zfs receive -d pool
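For anyone who wants to script the step above, here is a rough sketch of one replication cycle. It is not the exact script we run: the dataset, host and pool names are the same placeholders as in the command above, the zfs snapshot line is just one assumed way of creating snapshot N, and DRY_RUN is a hypothetical switch that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of one incremental replication step. All names are placeholders,
# and DRY_RUN is a hypothetical switch: when it is 1 (the default here)
# the commands are printed instead of executed.

DATASET="${DATASET:-pool/dataset}"     # source dataset
DEST_HOST="${DEST_HOST:-destination}"  # receiving machine
DEST_POOL="${DEST_POOL:-pool}"         # pool passed to "zfs receive -d"
DRY_RUN="${DRY_RUN:-1}"

# replicate_step PREV NEXT
# Takes snapshot NEXT, then sends the PREV -> NEXT delta to the remote side.
replicate_step() {
    prev="$1"
    next="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "zfs snapshot ${DATASET}@${next}"
        echo "zfs send -i ${DATASET}@${prev} ${DATASET}@${next} | ssh ${DEST_HOST} zfs receive -d ${DEST_POOL}"
    else
        zfs snapshot "${DATASET}@${next}" &&
            zfs send -i "${DATASET}@${prev}" "${DATASET}@${next}" |
                ssh "${DEST_HOST}" zfs receive -d "${DEST_POOL}"
    fi
}
```

With the default DRY_RUN=1, calling replicate_step Nminus1 N only prints the snapshot and send/receive commands. Running it with DRY_RUN=0 from cron or a loop would give the one-minute cycle.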
If the replicated filesystem on the second system is being read
while zfs receive is updating it, a deadlock can occur. We discovered
this while testing a server with 8 GB of RAM that will hopefully go
into production soon: a Bacula backup agent was running and ZFS
deadlocked.
I have set up a couple of VMware Fusion virtual machines in order to
test this, and they deadlocked as well. The virtual machines have
little memory, 512 MB, but I don't believe this is the actual problem.
There is no complaint about lack of memory.
A running top shows processes stuck on "zfsvfs":
last pid:  2051;  load averages:  0.00,  0.07,  0.55    up 0+01:18:25  12:05:48
37 processes:  1 running, 36 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 18M Active, 20M Inact, 114M Wired, 40K Cache, 59M Buf, 327M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1914 root        1  62    0 11932K  2564K zfsvfs 0   0:51  0.00% bsdtar
 1093 borjam      1  44    0  8304K  2464K CPU1   1   0:32  0.00% top
 1913 root        1  54    0 11932K  2600K rrl->r 0   0:19  0.00% bsdtar
 1019 root        1  44    0 25108K  4812K select 0   0:05  0.00% sshd
 2008 root        1  76    0 13600K  1904K tx->tx 0   0:04  0.00% zfs
 1089 borjam      1  44    0 37040K  5216K select 1   0:04  0.00% sshd
  995 root        1  76    0  8252K  2652K pause  0   0:02  0.00% csh
  840 root        1  44    0 11044K  3828K select 1   0:02  0.00% sendmail
 1086 root        1  76    0 37040K  5156K sbwait 1   0:01  0.00% sshd
  850 root        1  44    0  6920K  1612K nanslp 0   0:01  0.00% cron
  607 root        1  44    0  5992K  1540K select 1   0:01  0.00% syslogd
 1090 borjam      1  76    0  8252K  2636K pause  1   0:01  0.00% csh
  990 borjam      1  44    0 37040K  5220K select 0   0:00  0.00% sshd
  985 root        1  48    0 37040K  5160K sbwait 1   0:00  0.00% sshd
  911 root        1  44    0  8252K  2608K ttyin  0   0:00  0.00% csh
  991 borjam      1  56    0  8252K  2636K pause  0   0:00  0.00% csh
  844 smmsp       1  46    0 11044K  3852K pause  0   0:00  0.00% sendmail
Interestingly, this has blocked access to all the filesystems. I
cannot, for instance, ssh into the machine anymore, even though all
the system-critical filesystems are on UFS; I was only using ZFS for
a test.
Any ideas on what information might be useful to collect? I have the
VMware machine available right now. I've made a couple of VMware
snapshots of it: the first taken just after the deadlock started,
before breaking into DDB, and the second taken while in DDB (I broke
into DDB with sysctl).
Also, a copy of the VMware virtual machine with snapshots is available
on request. Your choice ;)
Borja.