8.0RC1, ZFS: deadlock

Tue Sep 29 08:29:30 UTC 2009

Hello,

I have observed a deadlock when using ZFS. We make heavy use of
zfs send/zfs receive to keep a replica of a dataset on a remote
machine, and it can be done at one-minute intervals. This may be a
somewhat atypical use of ZFS, but it looks like a great way to keep
filesystem replicas once this is sorted out.

How to reproduce:

Set up two systems. A dataset with heavy I/O activity is replicated
from the first to the second one. I used a dataset containing /usr/obj
while running make buildworld.

Replicate the dataset from the first machine to the second one using
an incremental send:

zfs send -i pool/dataset@Nminus1 pool/dataset@N | ssh destination zfs receive -d pool
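
To make one replication cycle explicit, here is a minimal sketch that only prints the pipeline for a given pair of snapshots and does not run it. The function name replication_cmd and its argument order are hypothetical, and the leading zfs snapshot step is my assumption about how snapshot N gets created before each send.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for one
# incremental replication cycle.
# Arguments: dataset, previous snapshot, new snapshot, destination
# host, destination pool.
replication_cmd() {
    dataset=$1; prev=$2; new=$3; host=$4; pool=$5
    # Assumption: the new snapshot is taken right before sending it.
    printf 'zfs snapshot %s@%s && zfs send -i %s@%s %s@%s | ssh %s zfs receive -d %s\n' \
        "$dataset" "$new" "$dataset" "$prev" "$dataset" "$new" "$host" "$pool"
}

# Print the commands for the cycle from snapshot Nminus1 to snapshot N.
replication_cmd pool/dataset Nminus1 N destination pool
```

Running something like this once a minute, with the snapshot names advancing each time, gives the replication pattern described above.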

A deadlock can occur when there is read activity on the second system,
that is, when the replicated filesystem is being read while zfs
receive is updating it. We discovered this while testing a server with
8 GB RAM that should go into production soon: a Bacula backup agent
was running and ZFS deadlocked.

I have set up a couple of VMware Fusion virtual machines to test this,
and they deadlocked as well. The virtual machines have little memory,
512 MB, but I don't believe this is the actual problem: there is no
complaint about lack of memory.

A running top shows processes stuck on "zfsvfs":

last pid:  2051;  load averages:  0.00,  0.07,  0.55    up 0+01:18:25  12:05:48
37 processes:  1 running, 36 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 18M Active, 20M Inact, 114M Wired, 40K Cache, 59M Buf, 327M Free
Swap: 1024M Total, 1024M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
  1914 root        1  62    0 11932K  2564K zfsvfs  0   0:51  0.00% bsdtar
  1093 borjam      1  44    0  8304K  2464K CPU1    1   0:32  0.00% top
  1913 root        1  54    0 11932K  2600K rrl->r  0   0:19  0.00% bsdtar
  1019 root        1  44    0 25108K  4812K select  0   0:05  0.00% sshd
  2008 root        1  76    0 13600K  1904K tx->tx  0   0:04  0.00% zfs
  1089 borjam      1  44    0 37040K  5216K select  1   0:04  0.00% sshd
   995 root        1  76    0  8252K  2652K pause   0   0:02  0.00% csh
   840 root        1  44    0 11044K  3828K select  1   0:02  0.00% sendmail
  1086 root        1  76    0 37040K  5156K sbwait  1   0:01  0.00% sshd
   850 root        1  44    0  6920K  1612K nanslp  0   0:01  0.00% cron
   607 root        1  44    0  5992K  1540K select  1   0:01  0.00% syslogd
  1090 borjam      1  76    0  8252K  2636K pause   1   0:01  0.00% csh
   990 borjam      1  44    0 37040K  5220K select  0   0:00  0.00% sshd
   985 root        1  48    0 37040K  5160K sbwait  1   0:00  0.00% sshd
   911 root        1  44    0  8252K  2608K ttyin   0   0:00  0.00% csh
   991 borjam      1  56    0  8252K  2636K pause   0   0:00  0.00% csh
   844 smmsp       1  46    0 11044K  3852K pause   0   0:00  0.00% sendmail
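
To pick the interesting processes out of a listing like this, a small filter on the STATE column is enough. This is only a sketch: the function name stuck_on_zfs is hypothetical, the three wait channels are simply the ones visible in this particular hang (as top abbreviates them), and it assumes each process occupies one line with STATE as the eighth field.

```shell
#!/bin/sh
# Hypothetical filter: read top-style process lines on stdin and print
# PID, STATE and COMMAND for those sleeping on the wait channels seen
# in this hang. Assumes STATE is field 8 and COMMAND is the last field.
stuck_on_zfs() {
    awk '$8 == "zfsvfs" || $8 == "rrl->r" || $8 == "tx->tx" { print $1, $8, $NF }'
}

# Possible usage, assuming batch-mode top output keeps the same column
# layout as the interactive display:
#   top -b | stuck_on_zfs
```

In the listing above this would select the two bsdtar processes and the zfs process, which are the reader and the zfs receive involved in the hang.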

Interestingly, this has blocked access to all the filesystems. I
cannot, for instance, ssh into the machine anymore, even though all
the system-critical filesystems are on UFS; I was only using ZFS for
a test.

Any ideas on what information might be useful to collect? I still have
the VMware machine in this state. I've made a couple of VMware
snapshots of it: the first just after the deadlock started, before
breaking into DDB, and the second while in DDB (I broke into DDB with
sysctl).

Also, a copy of the VMware virtual machine with its snapshots is
available on request. Your choice ;)

Borja.