kern/184677 / ZFS snapshot handling deadlocks

krichy at tvnetwork.hu krichy at tvnetwork.hu
Tue Jan 14 12:26:13 UTC 2014


Dear Andriy,

I've also sent a detailed explanation of my changes to Will Andrews, I 
could also forward that to you. You are right, these areas are critical 
areas, though the patch is very small and solves some of the issues. And 
that is also true that I am not a really FreeBSD developer, just I 
compared illumos and freebsd code, debugged these locking issues for a 
while. Then at the end I think the result patch is reasonably small, and I 
also tried to follow some freebsd recommendations.

If I can help somehow I am willing to do. Also I would accept any critics 
on my work. It is for a month or two now when we discovered these bugs, 
and unfortunately it happened on our production servers, and with these 
patches our system works since then.

But of course, that is not a proof of concept.

Thanks in advance,


Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Tue, 14 Jan 2014, Andriy Gapon wrote:

> Date: Tue, 14 Jan 2014 13:46:35 +0200
> From: Andriy Gapon <avg at FreeBSD.org>
> To: krichy at tvnetwork.hu
> Cc: freebsd-fs at FreeBSD.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> on 25/12/2013 07:22 krichy at tvnetwork.hu said the following:
>> I've made a new patch again, which fixes most of my earlier issues. Mainly, I've
>> enabled shared vnode locks for GFS vnodes - is it acceptable? And that way,
>> deadlock cases reduced much, and also the patch is more clear (at least to me).
>> One thing still remains, the spa_namespace_lock race I mentioned before, I dont
>> know how to handle that.
>>
>> Waiting for comments on this.
>
> Richard,
>
> first of all, apologies for the delay with a reply and for not having any
> comment on the essence of your patch...
>
> I do have the following meta-comment.
>
> - working with FreeBSD VFS is hard
> - porting code that was written for a very different VFS model to FreeBSD VFS is
> even harder
> - doing the same for the code that plays various tricks, like .zfs support code
> does, is harder again
> - reviewing somebody else's changes to that kind of code is harder still than
> making such changes
>
> This is quite an unfortunate situation.  I am not much surprised by the lack of
> followups to your analysis and patches.
> I am saying this as someone who spent some time analyzing and trying to fix the
> .zfs code and ZFS<->VFS interaction in general.  See e.g.
> http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/18924/focus=19056
>
> My opinion is that .zfs code breaks several fundamental FreeBSD VFS contracts.
> The most glaring violation is that VOP_INACTIVE makes a vnode invalid.
> I think that trying to fix .zfs code by patching individual problems here and
> there is an uphill battle.  I think that the same also applies to ZFS ZPL code
> but in a less obvious way.
> The code in many cases just pretends to play by VFS rules by satisfying most
> obvious assertions, but it does not really try to obey VFS contracts.  For
> example, almost all locks in znode are mostly redundant given VFS vnode locking.
> But in some cases they are not sufficient precisely because VFS expects its
> locking to be used rather than ZFS internal locking.  The most obvious example
> is interaction of VOP_RENAME with other VOPs.
>
> In any case, thanks for your work!  I am trying to find some time to review it.
>
> -- 
> Andriy Gapon
>


More information about the freebsd-fs mailing list