thr3ads.net - freebsd stable - 8.0RC1, ZFS: deadlock [Sep 2009]


Borja Marcos

2009-Sep-29 08:29 UTC

8.0RC1, ZFS: deadlock

Hello,

I have observed a deadlock condition when using ZFS. We make heavy use
of zfs send/zfs receive to keep a replica of a dataset on a remote
machine, replicating as often as once a minute. This may be a somewhat
atypical use of ZFS, but it looks like a great way to keep filesystem
replicas once this is sorted out.


How to reproduce:

Set up two systems. A dataset with heavy I/O activity is replicated
from the first to the second. I used a dataset containing /usr/obj
while running a make buildworld.

Replicate the dataset from the first machine to the second one using
an incremental send:

zfs send -i pool/dataset@Nminus1 pool/dataset@N | ssh destination zfs receive -d pool
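
For anyone trying to reproduce this, one replication step could be wrapped roughly as follows. This is only a sketch built around the command above: the function name replicate_step and the DRY_RUN switch are made up for illustration, the dataset, snapshot, and host names are placeholders, and it assumes an initial full send has already been received on the destination.

```sh
#!/bin/sh
# Sketch of one replication step, modeled on the command in the report.
# replicate_step and DRY_RUN are hypothetical names, not part of ZFS.

# replicate_step DATASET PREV CUR HOST
# Takes snapshot DATASET@CUR, then sends the increment PREV -> CUR to HOST.
# With DRY_RUN=1 the commands are printed instead of executed.
replicate_step() (
    dataset=$1
    prev=$2
    cur=$3
    host=$4
    pool=${dataset%%/*}    # "pool/dataset" -> "pool"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "zfs snapshot ${dataset}@${cur}"
        echo "zfs send -i ${dataset}@${prev} ${dataset}@${cur} | ssh ${host} zfs receive -d ${pool}"
        exit 0
    fi

    zfs snapshot "${dataset}@${cur}" || exit 1
    zfs send -i "${dataset}@${prev}" "${dataset}@${cur}" |
        ssh "${host}" zfs receive -d "${pool}"
)

# Possible driver loop at one-minute intervals (assumes pool/dataset@1
# already exists on both machines):
#   n=1
#   while :; do
#       replicate_step pool/dataset "$n" "$((n + 1))" destination
#       n=$((n + 1))
#       sleep 60
#   done
```

Running it first with DRY_RUN=1 prints the two commands, so the snapshot names can be checked before anything touches the pool.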

The deadlock can occur when there is read activity on the second
system, that is, when the replicated dataset is being read while zfs
receive is updating it. We discovered this while testing a server with
8 GB of RAM that should go into production soon: a Bacula backup agent
was running and ZFS deadlocked.

I have set up a couple of VMware Fusion virtual machines to test this,
and they deadlocked as well. The virtual machines have little memory,
512 MB, but I don't believe that is the actual problem: there are no
complaints about lack of memory.

A running top shows processes stuck on "zfsvfs"

last pid:  2051;  load averages:  0.00,  0.07,  0.55    up 0+01:18:25  12:05:48
37 processes:  1 running, 36 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 18M Active, 20M Inact, 114M Wired, 40K Cache, 59M Buf, 327M Free
Swap: 1024M Total, 1024M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
  1914 root        1  62    0 11932K  2564K zfsvfs  0   0:51  0.00% bsdtar
  1093 borjam      1  44    0  8304K  2464K CPU1    1   0:32  0.00% top
  1913 root        1  54    0 11932K  2600K rrl->r  0   0:19  0.00% bsdtar
  1019 root        1  44    0 25108K  4812K select  0   0:05  0.00% sshd
  2008 root        1  76    0 13600K  1904K tx->tx  0   0:04  0.00% zfs
  1089 borjam      1  44    0 37040K  5216K select  1   0:04  0.00% sshd
   995 root        1  76    0  8252K  2652K pause   0   0:02  0.00% csh
   840 root        1  44    0 11044K  3828K select  1   0:02  0.00% sendmail
  1086 root        1  76    0 37040K  5156K sbwait  1   0:01  0.00% sshd
   850 root        1  44    0  6920K  1612K nanslp  0   0:01  0.00% cron
   607 root        1  44    0  5992K  1540K select  1   0:01  0.00% syslogd
  1090 borjam      1  76    0  8252K  2636K pause   1   0:01  0.00% csh
   990 borjam      1  44    0 37040K  5220K select  0   0:00  0.00% sshd
   985 root        1  48    0 37040K  5160K sbwait  1   0:00  0.00% sshd
   911 root        1  44    0  8252K  2608K ttyin   0   0:00  0.00% csh
   991 borjam      1  56    0  8252K  2636K pause   0   0:00  0.00% csh
   844 smmsp       1  46    0 11044K  3852K pause   0   0:00  0.00% sendmail
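
To pick the interesting processes out of a listing like this one, a small filter along these lines might help. It is a sketch only: stuck_pids is a made-up helper name, and it assumes the twelve-column layout shown above (STATE in column 8, COMMAND last), which differs between top versions and on single-CPU machines.

```sh
#!/bin/sh
# stuck_pids STATE...
# Reads top-style process rows on stdin and prints "PID COMMAND STATE"
# for every process whose STATE column matches one of the arguments.
# Hypothetical helper; assumes STATE is column 8 as in the listing above.
stuck_pids() {
    awk -v states="$*" '
        BEGIN {
            # Build a lookup table from the space-separated state names.
            n = split(states, s, " ")
            for (i = 1; i <= n; i++)
                want[s[i]] = 1
        }
        # Process rows start with a numeric PID; header lines do not match.
        $1 ~ /^[0-9]+$/ && ($8 in want) { print $1, $NF, $8 }
    '
}
```

With batch-mode output this could be run as: top -b | stuck_pids zfsvfs 'rrl->r' 'tx->tx'. The three state names are the ones that look suspicious in the listing above.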

Interestingly, this has blocked access to all filesystems. For
instance, I can no longer ssh into the machine, even though all the
system-critical filesystems are on UFS; I was only using ZFS for a
test.

Any ideas on what information might be useful to collect? I still have
the VMware machine in this state. I've made a couple of VMware
snapshots of it: the first taken just after the deadlock started,
before breaking into DDB, and the second taken inside DDB (I broke
into DDB with sysctl).

Also, a copy of the VMware virtual machine with its snapshots is
available on request. Your choice ;)






Borja.

Borja Marcos

2009-Sep-29 08:43 UTC


8.0RC1, ZFS: deadlock

On Sep 29, 2009, at 10:29 AM, Borja Marcos wrote:
>
> Hello,
>
> I have observed a deadlock condition when using ZFS. We are making a  
> heavy usage of zfs send/zfs receive to keep a replica of a dataset  
> on a remote machine. It can be done at one minute intervals. Maybe  
> we're doing a somehow atypical usage of ZFS, but, well, seems to be  
> a great solution to keep filesystem replicas once this is sorted out.
>
>
> How to reproduce:
>
> Set up two systems. A dataset with heavy I/O activity is replicated  
> from the first to the second one. I've used a dataset containing / 
> usr/obj while I did a make buildworld.
>
> Replicate the dataset from the first machine to the second one using  
> an incremental send
>
> zfs send -i pool/dataset@Nminus1 pool/dataset@N | ssh destination  
> zfs receive -d pool
>
> When there is read activity on the second system, reading the  
> replicated system, I mean, having read access while zfs receive is  
> updating it, there can be a deadlock. We have discovered this doing  
> a test on a hopefully soon in production server, with 8 GB RAM. A  
> Bacula backup agent was running and ZFS deadlocked.
Sorry, I forgot to explain what was running on the second system (the
one receiving the incremental snapshots) when the deadlock happened.

It was simply running an endless loop that copied the contents of
/usr/obj to another dataset, to keep the read activity going.

That is how it deadlocked. On the original test system an rsync had
the same effect.





Borja

Borja Marcos

2009-Sep-29 11:44 UTC


8.0RC1, ZFS: deadlock

On Sep 29, 2009, at 10:29 AM, Borja Marcos wrote:
> I have observed a deadlock condition when using ZFS. We are making a  
> heavy usage of zfs send/zfs receive to keep a replica of a dataset  
> on a remote machine. It can be done at one minute intervals. Maybe  
> we're doing a somehow atypical usage of ZFS, but, well, seems to be  
> a great solution to keep filesystem replicas once this is sorted out.
Not sure the backtrace screenshots will get through...

First one is the backtrace for the zfs command.

Second one, a tar process doing a "cf - ." on the dataset being  
replicated, sending to a pipe.

Third one, the receiving tar process, doing an "xf -" on a second  
dataset.

