performance of jailed processes

Tue Mar 30 10:47:56 PST 2004

On Tue, 30 Mar 2004, Dag-Erling Smørgrav wrote:

> Can anyone explain why jailed processes seem to perform much worse than
> non-jailed processes in recent -CURRENT? 
> 
> Specifically, running a query against a remote MySQL server from inside
> a jail takes an order of magnitude more time than from outside the jail. 
> Tcpdump shows that the TCP packets carrying the result are evenly
> spaced, so this is not a matter of the server timing out on a DNS lookup
> or anything like that. 
> 
> Running a configure script also takes much longer inside the jail than
> outside, and again, progress is even (though slow), so it is clearly not
> a matter of DNS timing out. 
> 
> There is no NFS or NIS in the equation either.  Part of the file space
> inside the jail is a nullfs mount, but we've also tried without nullfs. 
> 
> The system currently uses SCHED_ULE, but we had similar trouble with
> SCHED_4BSD on 5.1-RELEASE before we went -CURRENT. 
> 
> The machine currently has ~2600 processes running in ~400 jails.  Is it
> conceivable that scalability issues, perhaps in the credentials code,
> could cause vastly increased syscall overhead for jailed processes? 

This is bizarre, although it sounds like you've already covered a lot of
the interesting potential causes.  The short answer is no: the overhead of
jail on all system call paths is very, very low, especially given that the
most critical fields in the prison structure (e.g., the IP address) are
immutable, and so shouldn't impact performance much at all.  In fact,
since we're using thread-local credential references, the entire thing
should be quite cheap once the jail is created.

If I had to guess at causes, some of which you've looked at, I'd chase the
following:

- DNS -- I know you mentioned it, but I'd check anyway.  Especially if
  resolv.conf has bad DNS servers in it in the jails, etc.  You might try
  writing a trivial gethostbyname() test app and timing it in and out of
  the jail.  Also look at the reverse lookup done by the MySQL server.
  The impact of the source IP address might be particularly interesting.

- Compare the impact of jail() vs chroot() for configure performance.  The
  jail code barely, if at all, affects vnode operations and lookup
  relative to chroot(), but the impact of chroot() itself might well be
  interesting to look at. 

- Nullfs should slow down file system I/O some, especially lookups, and
  the more nullfs mounts there are, the more impact it should have (by
  dramatically increasing the number of vnodes in use).  However, if
  you've already done the side-by-side comparison...  Again, chroot() vs
  jail() would be interesting.

- It would be interesting to know if applications outside the jail bound
  to various IP addresses see performance differences depending on the IP
  used.  We have hashed IP address lookup, but there are some operations
  in the stack that require walking the list of addresses, etc.  If the
  non-jailed software always uses the first address because they're all in
  the same subnet, that might conceivably make a difference.  Taking jail
  out of the picture in some basic micro-benchmarks might help here also. 

Can you identify any micro-benchmarks rather than macro-benchmarks that
reflect a significant difference?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org      Senior Research Scientist, McAfee Research