Arla on FreeBSD

Kostik Belousov kostikbel at gmail.com
Thu Feb 15 13:46:26 UTC 2007


On Thu, Feb 15, 2007 at 02:07:19PM +0100, Tomas Olsson wrote:
> Kostik Belousov <kostikbel at gmail.com> writes:
> > On Thu, Feb 15, 2007 at 12:59:04PM +0100, Tomas Olsson wrote:
> > > Kostik Belousov <kostikbel at gmail.com> writes:
> > > > I took a really quick look at the places you mentioned. I have a
> > > > comment on open_file(). For FreeBSD >= 6.x, the right way to open a
> > > > vnode from kernel code is to use the vn_open() (and then vn_close()) API.
> > > >
> > > Great! Sounds reasonable.
> > > 
> > > We currently open the cache files from nnpfs' VOPs, are there any risks
> > > (deadlock?) involved if one passes an absolute path to vn_open() in such a
> > > context?  I'd have liked to use arlad's thread for this, but vput()
> > There, you already have the nnpfs vnode locked. The right lock order for
> > vnodes is from the root down the tree. As such, you may end up with
> > reversals that would result in deadlocks, IMHO.
> >
> Ok, vn_open() must be passed curthread so we can't use arlad's thread when
> in our VOP.  And we cannot use an absolute path to the cache. So vn_open()
> can't be used?  
vn_open() does not strictly need curthread, but td is the thread on whose
behalf all locks and sleeps will be performed. What I said excludes neither
the use of vn_open() nor of paths, but the right locking order must be
ensured to prevent deadlocks.

I think that your current strategy would work, but it needs to be checked.
> 
> > > explicitly uses curthread deep down in namei. Also, users are not normally
> > > allowed to access the cache files directly so some OSes complain on such a
> > > lookup with user creds; would that be a problem here?
> > > How is user access to the cache disabled? And what is the cache itself?
> > > A local filesystem (UFS) that stores your blocks in regular files?
> >
> It's just a local dir tree, on UFS or whatever. Currently each node gets
> its own subdir, and each "block" (128kB perhaps) a plain file in that dir.
> The cache is supposed to be readable only by arlad (usually root) and
> through nnpfs; that's good when handling fancy ACLs etc, so the cache root
> is chmod:ed to 0700 for root:wheel.
> 
> > > Of course, we wouldn't have to worry about such things if we just kept the
> > > vnode handy for each cache block file. Maybe it's a price worth paying.
> > 
> > Then, you need to take some care of the cached vnode lifecycle (e.g., even
> > keeping the vnode vref'ed would not prevent it from being recycled, so you
> > may end up with a dead vnode).
> >
> Eep. Tricky.
> 
> So where does this leave us, is plain lookup() (or VOP_LOOKUP) and
> VOP_CREATE on the cache a possible way to go?  Seems to work on OpenBSD.
vn_open() is the right way. Otherwise, you would in fact reimplement the
code from it.

Also, you could look at the file handle API, which would save you the path
lookups after the vnode has been looked up the first time (look around for
the vfs_vptofh and vfs_fhtovp ops). This API is used by the NFS server, so
it should work :).

Instead of caching the vnode, you would save the file handle for future accesses.
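
To illustrate why a saved handle is safer than a cached pointer, here is a
minimal user-space sketch. It is not kernel code and does not use the real
vfs_vptofh/vfs_fhtovp interfaces; the table, the struct names and the
functions (obj_to_handle, handle_to_obj, recycle) are all hypothetical. It
assumes a handle made of a slot number plus a generation counter, so that a
handle taken before the object was recycled is detected as stale instead of
silently pointing at the wrong object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4			/* hypothetical table size */

struct object {				/* stands in for a cached file object */
	unsigned gen;			/* bumped every time the slot is reused */
	bool in_use;
	int data;
};

struct handle {				/* stands in for a saved file handle */
	unsigned slot;
	unsigned gen;
};

static struct object table[NSLOTS];

/* Reuse a slot for a new object; handles taken earlier become stale. */
static void
recycle(unsigned slot, int data)
{
	assert(slot < NSLOTS);
	table[slot].gen++;
	table[slot].in_use = true;
	table[slot].data = data;
}

/* Object -> handle: remember the slot and its current generation. */
static struct handle
obj_to_handle(unsigned slot)
{
	assert(slot < NSLOTS);
	return (struct handle){ .slot = slot, .gen = table[slot].gen };
}

/* Handle -> object: returns NULL when the handle is stale or invalid. */
static struct object *
handle_to_obj(struct handle h)
{
	struct object *o;

	if (h.slot >= NSLOTS)
		return NULL;
	o = &table[h.slot];
	if (!o->in_use || o->gen != h.gen)
		return NULL;
	return o;
}
```

The caller has to be prepared for handle_to_obj() to fail and fall back to
a fresh lookup by path in that case.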
> 
> > Also, as Robert pointed out in his email, you probably need to decide about
> > MP-safeness of nnpfs.
> >
> Well, it's marked "doesn't need giant"; we use our own global lock. I don't
> trust it 100%, but it seems to work so far.  I haven't tried it on any MP
> FreeBSD boxes though.

Again, this could lead to lock order reversals with the vnode lock. For
instance, when lookup() traverses the tree, it has the parent directory
locked and calls the fs-provided VOP_LOOKUP(). This method shall return the
leaf vnode locked. Since, according to your claim, some arla-specific
global lock is taken between these two vnode locks, there could be a LOR
(lock order reversal) between the vnode locks and the arla lock (imagine a
".." lookup in one thread, and a direct lookup in another).
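
The reversal can be shown with a small user-space sketch. This is only a
model, far simpler than what the kernel's WITNESS option does: it records
which lock was held when another one was taken and flags the direct
opposite order. The lock names, the structs and the track_acquire() and
track_release() functions are hypothetical and exist only for
illustration; no real locks are taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers, for illustration only. */
enum { LK_PARENT_VNODE, LK_FS_GLOBAL, LK_CHILD_VNODE, LK_COUNT };

struct tracker {
	/* before[a][b] is true if a was held when b was acquired. */
	bool before[LK_COUNT][LK_COUNT];
};

struct thread_locks {
	bool held[LK_COUNT];		/* locks held by one simulated thread */
};

/*
 * Record that a thread acquires lock l. Returns false if some lock it
 * already holds was previously acquired after l, i.e. a reversal.
 */
static bool
track_acquire(struct tracker *t, struct thread_locks *th, int l)
{
	bool ok = true;

	assert(l >= 0 && l < LK_COUNT);
	for (int h = 0; h < LK_COUNT; h++) {
		if (!th->held[h])
			continue;
		if (t->before[l][h])
			ok = false;	/* opposite order seen earlier */
		t->before[h][l] = true;
	}
	th->held[l] = true;
	return ok;
}

static void
track_release(struct thread_locks *th, int l)
{
	assert(l >= 0 && l < LK_COUNT);
	th->held[l] = false;
}
```

With this model, a direct lookup that takes parent vnode, global lock,
child vnode in that order passes, and a later ".." lookup that holds the
child vnode and then takes the global lock is reported as a reversal.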