unix domain sockets on nullfs(5)

Mikolaj Golub trociny at freebsd.org
Mon Jan 9 17:06:02 UTC 2012


Hi,

There is a longstanding problem with nullfs(5): unix sockets do not
work between the lower and upper layers.

See, e.g. kern/51583, kern/159663.

When a unix socket is bound, the created socket is referenced in the
v_socket field of the vnode. On connect, this field is read from the
vnode returned by lookup. Unix socket functions like unp_bind/connect
set/access this field directly.

This is a problem for nullfs, which uses a two-layer vnode approach:
when binding via the upper layer, the socket reference is stored in
the upper vnode; when binding via the lower fs, the socket reference
is stored in the lower vnode and is not seen from the upper layer.

E.g. with /mnt/upper being a nullfs mount of /mnt/lower:

1) if we bind to /mnt/lower/test.sock we can connect only to
/mnt/lower/test.sock.

2) if we bind to /mnt/upper/test.sock we can connect only to
/mnt/upper/test.sock.

The desired behavior is that one can connect via both the lower and
the upper paths regardless of whether the socket was bound to
/mnt/lower/test.sock or /mnt/upper/test.sock.
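
The two cases above can be checked from userland with a pair of small
helpers. This is only a sketch for reproducing the problem, not part
of the patch; the helper names (fill_addr, bind_listen, try_connect)
are made up for illustration, and only standard socket calls are used.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for path; returns -1 if the path does not fit. */
static int
fill_addr(struct sockaddr_un *sun, const char *path)
{
	if (strlen(path) >= sizeof(sun->sun_path))
		return (-1);
	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	strcpy(sun->sun_path, path);
	return (0);
}

/* Bind a listening unix socket at path; returns the fd or -1. */
static int
bind_listen(const char *path)
{
	struct sockaddr_un sun;
	int s;

	if (fill_addr(&sun, path) != 0)
		return (-1);
	unlink(path);		/* remove a stale socket file, if any */
	s = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s < 0)
		return (-1);
	if (bind(s, (struct sockaddr *)&sun, sizeof(sun)) != 0 ||
	    listen(s, 1) != 0) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Try to connect to path; returns 0 on success or the errno value. */
static int
try_connect(const char *path)
{
	struct sockaddr_un sun;
	int error, s;

	if (fill_addr(&sun, path) != 0)
		return (ENAMETOOLONG);
	s = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s < 0)
		return (errno);
	error = 0;
	if (connect(s, (struct sockaddr *)&sun, sizeof(sun)) != 0)
		error = errno;
	close(s);
	return (error);
}
```

With the mount from the example, calling
bind_listen("/mnt/lower/test.sock") and then
try_connect("/mnt/upper/test.sock") should return a non-zero error on
an unpatched system (most likely ECONNREFUSED, since the upper vnode
has no socket attached), while try_connect("/mnt/lower/test.sock")
returns 0.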

In kern/159663 two approaches were discussed:

1) copy the socket pointer from the lower vnode to the upper vnode
when the upper vnode is obtained (this fixes the case when one binds
to the lower fs and wants to connect via the upper, but not the case
when one binds to the upper and wants to connect via the lower fs);

2) make null_lookup/create return the lower vnode for VSOCK vnodes.

Both approaches have issues and look rather hackish.

kib@ suggested that the issue could be fixed by adding new VOP_*
operations for setting and accessing a vnode's v_socket field.

The attached patch implements this. It can also be found here:

http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch

It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
VOP_UNPDETACH. Their purpose can be understood from the modifications
in uipc_usrreq.c:

-	vp->v_socket = unp->unp_socket;
+	VOP_UNPBIND(vp, unp->unp_socket);

-	so2 = vp->v_socket;
+	VOP_UNPCONNECT(vp, &so2);

-	unp->unp_vnode->v_socket = NULL;
+	VOP_UNPDETACH(unp->unp_vnode);

The default implementations just perform the simple operations they
replace, while filesystems like nullfs can do more complicated things.

The patch also implements these operations for nullfs. By default the
old behavior is preserved. To get the new behavior the filesystem
should be (re)mounted with the sobypass option. The socket operations
are then passed through to the lower vnode, which makes the socket
accessible from both layers.
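
To illustrate the idea, here is a simplified userland model of the
three operations. It is not the code from the patch: the struct
layouts, the v_lower and v_sobypass fields and the model_* names are
assumptions made for the sketch, and locking and the real VOP
dispatch are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct socket. */
struct socket {
	int so_id;
};

/* Heavily reduced model of a vnode. */
struct vnode {
	struct socket *v_socket;	/* socket bound to this vnode, if any */
	struct vnode *v_lower;		/* nullfs only: the vnode underneath */
	bool v_sobypass;		/* nullfs only: models the sobypass option */
};

/* Pick the vnode that should hold the socket reference. */
static struct vnode *
unp_target(struct vnode *vp)
{
	/* With bypass enabled, walk down through the nullfs layers. */
	while (vp->v_lower != NULL && vp->v_sobypass)
		vp = vp->v_lower;
	return (vp);
}

/* Models VOP_UNPBIND: remember the socket bound to the vnode. */
static void
model_unpbind(struct vnode *vp, struct socket *so)
{
	unp_target(vp)->v_socket = so;
}

/* Models VOP_UNPCONNECT: look up the socket bound to the vnode. */
static void
model_unpconnect(struct vnode *vp, struct socket **sop)
{
	*sop = unp_target(vp)->v_socket;
}

/* Models VOP_UNPDETACH: drop the socket reference. */
static void
model_unpdetach(struct vnode *vp)
{
	unp_target(vp)->v_socket = NULL;
}
```

For a vnode without v_lower, or with v_sobypass unset, the model
functions act on the vnode itself, which corresponds to the default
(old) behavior. With v_sobypass set on the upper vnode, a bind via
either layer stores the socket in the lower vnode, so a connect via
either layer finds it.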

I am very interested to hear other people's opinions on this.

-- 
Mikolaj Golub

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nullfs.VOP_UNP.4.patch
Type: text/x-patch
Size: 10615 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20120109/9dd9e097/nullfs.VOP_UNP.4.bin

