Phantom Jails

Mon Nov 20 18:12:07 UTC 2006

On Fri, 17 Nov 2006, Dirk Engling wrote:

> Rumors went around and tales were told about jails magically booing around 
> in the prison list, even after they had died.
>
> Most people consider this a purely aesthetic issue. However, if you run 
> your jails from directories that need to be unmounted (e.g. from md images, 
> on external drives, from gbde or geli images), those phantom jails become 
> rather annoying, since you cannot umount their roots.
>
> Investigations have shown that
>
> 1) sockets hold a lock on (increase the reference counter in) the ucred 
>    structure of the calling process;
> 2) this ucred structure in turn keeps a lock on (increases the reference 
>    counter in) the prison struct representing the jail this process 
>    belongs to;
> 3) the prison struct in turn keeps a handle to the jail's root directory.
>
> If a process holding a TCP connection is killed, the connection is 
> inherited by the kernel, where it waits for TCP teardown or a TCP timeout 
> to occur. Only then is the socket's lock on the ucred released, which 
> releases the ucred's lock on the prison struct (thus terminating the 
> phantom jail). That in turn may, if it was the last ucred referencing the 
> prison, release the prison and its handle to the root directory (solving 
> my un-umount-able images).
>
> Other kinds of phantom jails have been sighted that did not vanish after 
> the TCP timeout; those might be deadlocked by open files or mmapped 
> regions. However, the above case happens regularly with my mail server 
> jail, which holds hundreds of IMAP connections: a single disconnected DSL 
> user can prevent TCP teardown from completing, forcing me to force-umount 
> the mail server.
>
> My suggestion would be (I will provide a patch if discussion produces no 
> major disagreement) to release ucred structs held by sockets as soon as the 
> process dies. They are used for accounting purposes only, anyway. The 
> same may apply to the other types of phantom jails as well. I could not 
> create those deliberately and therefore cannot pinpoint the proper 
> location to fix.

The credentials on sockets (and PCBs) are used for a number of things, not 
least:

- Visibility checks to determine which processes should be able to monitor
   them (netstat, sockstat, etc.)

- Firewall uid/gid rule evaluation, such as with ipfw and pf.

There may also be other cases that momentarily elude me.

I think these mean we should not let the jail go away while the credential is 
in use, as the jail information hanging off the credential is required for 
access control purposes.  However, we could think about discarding the vnode 
reference under some circumstances, leaving the jail without a vnode.  This 
would mean that processes could no longer "join" the jail, but that it could 
still be used for process/socket/etc access control.
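
To make the trade-off concrete, here is a minimal user-space sketch of the
reference chain described above (socket -> credential -> prison -> root
vnode) and of what dropping only the vnode reference might look like.  It is
a toy model under stated assumptions, not kernel code: every toy_* name is
hypothetical, and none of this is meant to mirror the actual FreeBSD
structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, not FreeBSD kernel APIs. */
struct toy_vnode  { int refs; };                            /* jail root dir  */
struct toy_prison { int refs; struct toy_vnode *root; };    /* the jail       */
struct toy_ucred  { int refs; struct toy_prison *prison; }; /* credential     */
struct toy_socket { struct toy_ucred *cred; };              /* socket or PCB  */

/* A new prison takes one reference on its root vnode. */
static void toy_prison_init(struct toy_prison *pr, struct toy_vnode *root)
{
    pr->refs = 0;
    pr->root = root;
    root->refs++;
}

/* The idea discussed above: give up the root vnode, keep the prison. */
static void toy_prison_drop_root(struct toy_prison *pr)
{
    if (pr->root != NULL) {
        pr->root->refs--;
        pr->root = NULL;
    }
}

/* Dropping the last prison reference also releases the root vnode. */
static void toy_prison_rele(struct toy_prison *pr)
{
    assert(pr->refs > 0);
    if (--pr->refs == 0)
        toy_prison_drop_root(pr);
}

/* A new credential starts with one reference and pins its prison. */
static void toy_cred_init(struct toy_ucred *cr, struct toy_prison *pr)
{
    cr->refs = 1;
    cr->prison = pr;
    pr->refs++;
}

static void toy_cred_hold(struct toy_ucred *cr)
{
    cr->refs++;
}

/* Dropping the last credential reference releases its prison reference. */
static void toy_cred_rele(struct toy_ucred *cr)
{
    assert(cr->refs > 0);
    if (--cr->refs == 0) {
        toy_prison_rele(cr->prison);
        cr->prison = NULL;
    }
}

/* A socket keeps its own reference on the creating process's credential. */
static void toy_socket_open(struct toy_socket *so, struct toy_ucred *cr)
{
    toy_cred_hold(cr);
    so->cred = cr;
}

static void toy_socket_close(struct toy_socket *so)
{
    toy_cred_rele(so->cred);
    so->cred = NULL;
}

/* The root can be unmounted only when nothing references its vnode. */
static bool toy_can_unmount(const struct toy_vnode *vn)
{
    return vn->refs == 0;
}

/* Without a root vnode, no new process can "join" the jail. */
static bool toy_can_attach(const struct toy_prison *pr)
{
    return pr->root != NULL;
}
```

In this model, if the process releases its credential while a socket is still
open, the prison keeps one reference and toy_can_unmount() stays false.
Calling toy_prison_drop_root() at that point makes the root unmountable and
makes toy_can_attach() false, while the socket's credential still leads to the
prison for access control checks.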

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> Comments?
>
>  erdgeist
>
> P.S.: if you want to reproduce a phantom jail try the following:
> 1) Create and start a jail
> 2) Start an ssh/web/whatever server within the jail
> 3) Connect to that server from the host system.
> 4) Keep this connection open while you kill the jail
> 5) Do a 'jls' and compare its output to "ps axuu | grep J"
> 6) Kill the process that connected to the service.
> 7) Do a 'jls' again.