md deadlocks on wdrain. Was: [Re: quota and snapshots in 6.1-RELEASE]

Fri Jun 30 18:31:21 UTC 2006

Kostik Belousov wrote:
> First, I set the followup to the right mailing list.
>
> Second, I am really curious what you are doing. My understanding is as
> follows: you set up a vnode-backed md device (md0a) on a sparse file,
> created ufs2 on it, mounted it with quotas, and ran a background fsck
> on that fs. At the same time, you ran rm on the snapshot file created
> by fsck. Right?
>   

This is the procedure I followed. While I have quotas enabled, they were
not set on the test filesystem.

1) dd if=/dev/zero of=/usr/bigfile bs=1024 seek=209715200 count=0
2) mdconfig -a -t vnode -f /usr/bigfile
3) bsdlabel -w md0 auto
4) newfs -U md0a
5) fsck -v /dev/md0a # ^C this after a second or so, this makes the FS dirty
6) mount /dev/md0a /mnt
7) fsck -v -B /dev/md0a

In another window:
8) while true; do ls -al /mnt/.snap;sleep 1;done
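
For convenience, steps 1-7 could be collected into one script. This is
only a sketch: the DRY_RUN switch and the run()/repro() helpers are
illustrative additions, not part of the procedure above. By default the
script only prints the commands, and it assumes mdconfig hands out md0,
as in the steps listed.

```sh
#!/bin/sh
# Sketch of reproduction steps 1-7. DRY_RUN, run() and repro() are
# illustrative names. With DRY_RUN=1 (the default) nothing is executed;
# the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro() {
    run dd if=/dev/zero of=/usr/bigfile bs=1024 seek=209715200 count=0
    run mdconfig -a -t vnode -f /usr/bigfile    # assumed to attach as md0
    run bsdlabel -w md0 auto
    run newfs -U md0a
    # Step 5 needs a manual ^C, so it is left to the operator.
    if [ "$DRY_RUN" = 1 ]; then
        echo "fsck -v /dev/md0a  # interrupt with ^C after about a second"
    else
        echo "Run 'fsck -v /dev/md0a' in another terminal, press ^C after"
        echo "about a second, then press Enter here."
        read _
    fi
    run mount /dev/md0a /mnt
    run fsck -v -B /dev/md0a
}

repro
```

Running it with DRY_RUN=0 as root would execute the commands for real;
step 8 is still meant to run in a second window.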

> Anyway, the problem seems to be related to neither snapshots nor
> quotas. In your trace, process 35 (syncer) tries to sync the vnode
> 0xc363c414, that is, inode 1515 on aacd0s1f, which is the file backing
> md0. That vnode is already locked by process 515 (md0 kthread). Process
> 515 is stuck in the wdrain state, waiting for buffers to be flushed. It
> seems that a huge number of dirty buffers are about to be written to
> md0, caused by snapshotting the fs. As a result, the system deadlocks:
> md0 hangs waiting for buffer runspace, which is occupied by pending
> write requests to md0 itself.
>
> Do -fs@ readers agree with this analysis?
>
> I propose to set the TDP_NORUNNINGBUF thread flag for both swap- and
> file-backed md threads to prevent such deadlocks. That i/o is already
> accounted for in the upper layer. Moreover, those already-accounted
> requests do not really differ from the requests (re)issued by md.
>
> Please comment.
>   
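
For anyone who wants to check the wdrain diagnosis on a test box, one
option might be to compare the vfs.runningbufspace and
vfs.hirunningspace sysctls while the background fsck runs. The sketch
below assumes that writers sleep in wdrain once runningbufspace exceeds
hirunningspace, which is my reading of the description above and not
something verified here. The over_limit() and check_wdrain() helpers
are made-up names.

```sh
#!/bin/sh
# over_limit RUNNING HIGH: succeeds when RUNNING is strictly above HIGH.
# Hypothetical helper, used only to keep the comparison in one place.
over_limit() {
    [ "$1" -gt "$2" ]
}

# check_wdrain: read both sysctls and report which side of the limit
# the system is on. Returns 2 if either sysctl cannot be read.
check_wdrain() {
    running=$(sysctl -n vfs.runningbufspace) || return 2
    high=$(sysctl -n vfs.hirunningspace) || return 2
    if over_limit "$running" "$high"; then
        echo "runningbufspace $running > hirunningspace $high (writers may sleep in wdrain)"
    else
        echo "runningbufspace $running <= hirunningspace $high"
    fi
}
```

It could be polled the same way as step 8, e.g.
"while true; do check_wdrain; sleep 1; done", though on a box that is
already deadlocked the loop itself may stop making progress.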

FYI, -CURRENT passes this test without locking up, so the fix is already 
there somewhere.