unix domain sockets on nullfs(5)
John Baldwin
jhb at freebsd.org
Tue Jan 10 13:19:21 UTC 2012
On Monday, January 09, 2012 11:37:52 am Mikolaj Golub wrote:
> Hi,
>
> There is a longstanding problem with nullfs(5): unix sockets do
> not work between the lower and upper layers.
>
> See, e.g. kern/51583, kern/159663.
>
> When a unix socket is bound, the created socket is referenced in the
> vnode field v_socket. This field is used on connect (from the vnode
> returned by lookup). Unix socket functions like unp_bind/connect
> set/access this field directly.
>
> This is a problem for nullfs, which uses a two-layer vnode approach:
> when binding to the upper layer, the socket reference is stored in
> the upper vnode; when binding to the lower fs, the socket reference
> is stored in the lower vnode and is not seen from the upper layer.
>
> E.g. with /mnt/lower nullfs mounted on /mnt/upper:
>
> 1) if we bind to /mnt/lower/test.sock we can connect only to
> /mnt/lower/test.sock.
>
> 2) if we bind to /mnt/upper/test.sock we can connect only to
> /mnt/upper/test.sock.
>
> The desired behavior is that one can connect to both the lower and
> the upper paths regardless of whether we bind to /mnt/lower/test.sock
> or /mnt/upper/test.sock.
>
> In kern/159663 two approaches were discussed:
>
> 1) copy the socket pointer from lower vnode to upper vnode on the
> upper vnode get (fix the case when one binds to the lower fs and wants
> to connect via the upper, but does not fix the case when one binds to
> the upper and wants to connect via the lower fs);
>
> 2) make null_lookup/create return lower vnode for VSOCK vnodes.
>
> Both approaches have issues and look rather hackish.
>
> kib@ suggested that the issue could be fixed if one added new VOP_*
> operations for setting and accessing vnode's v_socket field.
>
> The attached patch implements this. It also can be found here:
>
> http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch
>
> It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
> VOP_UNPDETACH. Their purpose can be understood from the modifications
> in uipc_usrreq.c:
>
> - vp->v_socket = unp->unp_socket;
> + VOP_UNPBIND(vp, unp->unp_socket);
>
> - so2 = vp->v_socket;
> + VOP_UNPCONNECT(vp, &so2);
>
> - unp->unp_vnode->v_socket = NULL;
> + VOP_UNPDETACH(unp->unp_vnode);
>
> The default functions just do these simple operations, while
> filesystems like nullfs can do more complicated things.
>
> The patch also implements these functions for nullfs. By default the
> old behavior is preserved. To get the new behaviour the filesystem
> should be (re)mounted with the sobypass option. The socket operations
> are then passed through to the lower vnode, which makes the socket
> accessible from both layers.
>
> I am very interested to hear other people's opinions on this.
I think this is a decent solution. Why not make the locking notes for
VOP_UNPCONNECT() be "L" instead of "E"? A read lock should be sufficient
to fetch the socket? In fact, I suspect that unp_connect() could actually
use a shared lock on the vnode by adding 'LOCKSHARED' to the flags passed
to namei() via NDINIT().
--
John Baldwin