HEADS-UP new statfs structure considered harmful

Matthew Dillon dillon at apollo.backplane.com
Mon Nov 17 11:54:08 PST 2003

:Well, there's some glue there now, but it's pretty slim.   What you
:advocate would swap system call numbers for doing structure reloading per
:call, which would significantly increase the cost of the call.
:Considering that *BSD system call overhead is pretty bad as is, I don't
:think I'd be putting structure recopies into the critical path of a
:Doug White                    |  FreeBSD: The Power to Serve

     Umm, no.  I'm not sure why you are taking such a negative view of
     things, the actual implementation is a whole lot simpler than you seem to

     What we will be doing is adding new system calls to replace *stat() and
     *statfs().   They will for obvious reasons not be named the same, nor
     would the old system calls be removed.

     The new system calls will generate a capability list into a buffer 
     supplied by userland, which is really no different from the copyout that
     the old system calls already do.

     The only difference is that the userland libc function that takes over
     the *stat() and *statfs() functionality using the new system calls
     (obsoleting the original system calls) will have to loop
     through the capability list and populate the user-supplied statfs or
     stat structure from it.  Since the returned capability list is simply
     a stack based buffer there won't be any cache contention and the data
     will already be in the L1 cache.  My guess is that it would add
     perhaps 150ns to these system calls compared to the 3-5us they already
     take for the non-I/O case.

     The capability list would be 'chunky'.  For example, one capability
     record would represent all three timespecs, another record would
     represent uid and gid, another would represent file size and block
     count, and so forth.
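
     A rough sketch of what such chunky records might look like follows.
     The field names c_type, c_len, c_timespec1 and c_uidgid1 match the
     stat() example in this post; everything else is an assumption made
     for illustration: the numeric type values, the STAT_SIZE1 record and
     its c_size1 fields, the cap_append() helper, and the choice to make
     every record the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Record type values are made up for this sketch. */
enum { STAT_TIMESPEC1 = 1, STAT_UIDGID1 = 2, STAT_SIZE1 = 3 };

struct stat_cap_header {
    int c_type;     /* which chunk this record carries */
    int c_len;      /* length of the whole record, header included */
    union {
        struct { struct timespec atimespec, mtimespec, ctimespec; } c_timespec1;
        struct { uid_t uid; gid_t gid; } c_uidgid1;
        struct { off_t size; int64_t blocks; } c_size1;  /* hypothetical */
    };
};

/*
 * Append one record to buf at offset off and return the new offset.
 * For simplicity every record occupies sizeof(struct stat_cap_header);
 * a real kernel could emit shorter or longer records, which is what
 * c_len is for.
 */
static size_t
cap_append(char *buf, size_t bufsize, size_t off,
    const struct stat_cap_header *rec)
{
    struct stat_cap_header tmp = *rec;

    assert(off + sizeof(tmp) <= bufsize);
    tmp.c_len = (int)sizeof(tmp);
    memcpy(buf + off, &tmp, sizeof(tmp));
    return off + sizeof(tmp);
}
```

     Since each record carries its own length, a reader only needs
     c_type and c_len to step from one record to the next.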

     The key point is that the individual capability elements would not
     change, ever.  Instead, if a change is needed, a new capability element
     would be added and an argument to the new syscalls will let the system
     know whether it needs to generate the older elements that the newer ones
     replace.  Userland will ignore capabilities it does not understand.
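
     To illustrate the "ignore what you do not understand" rule, here is
     a minimal sketch of a reader that copies out the one record type it
     knows and steps over everything else using the record length.  The
     cap_hdr and ids structures, the cap_walk() name, the numeric type
     values and the STAT_FUTURE9 type are all hypothetical; only the
     c_type/c_len idea comes from this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Type values are made up; STAT_FUTURE9 stands in for a later addition. */
enum { STAT_UIDGID1 = 2, STAT_FUTURE9 = 99 };

struct cap_hdr { int c_type; int c_len; };  /* minimal header for the sketch */
struct ids { unsigned uid, gid; };          /* payload of STAT_UIDGID1 here */

/*
 * Walk a capability list of len bytes.  Known records are copied out,
 * unknown ones are skipped.  Returns the number of records skipped, or
 * -1 if the list is malformed.
 */
static int
cap_walk(const char *buf, int len, struct ids *out)
{
    struct cap_hdr h;
    int off = 0;
    int skipped = 0;

    assert(buf != NULL && out != NULL);
    while (off < len) {
        if (len - off < (int)sizeof(h))
            return -1;
        memcpy(&h, buf + off, sizeof(h));
        if (h.c_len < (int)sizeof(h) || h.c_len > len - off)
            return -1;
        switch (h.c_type) {
        case STAT_UIDGID1:
            if (h.c_len < (int)(sizeof(h) + sizeof(*out)))
                return -1;
            memcpy(out, buf + off + sizeof(h), sizeof(*out));
            break;
        default:
            /* a newer kernel's record: not an error, just skip it */
            skipped++;
            break;
        }
        off += h.c_len;
    }
    return skipped;
}
```

     An old binary running on a newer kernel takes the default branch for
     the records it has never heard of and still fills in the fields it
     knows about.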

     The result is full forwards and backwards compatibility, forever.

     I do not believe there is any performance impact at all, especially
     if stat has to go do I/O.  If you care about performance then I
     recommend that you fix the syscall path in 5.x instead of worrying
     yourself over stat().  If a particular program really needs to save the
     150ns, say 'find', then it can call the new system call directly.  But
     I really doubt anyone would notice 'find' running any slower.
     I certainly care a great deal about performance in DragonFly and I am
     not worried about the capability idea's impact *AT* *ALL*.

     The userland implementation would be something like this:

     int
     stat(const char *file, struct stat *st)
     {
	char tmpbuf[SMALLBUF];	/* stat info is expected to fit */
	char *buf = tmpbuf;
	int off;
	int len;
	struct stat_cap_header *cap;

	/*
	 * Run the system call.  Try a small buffer first (designed to
	 * succeed for the current version of the OS).  If it fails then
	 * allocate a larger buffer (compatibility with future OSs that might
	 * provide more information).
	 */
	if ((len = stat_cap(file, buf, STAT_CAP_STDFIELDS)) < 0) {
	    if (errno != E2BIG)
		return(-1);
	    /* presumably the kernel leaves the required size in c_len */
	    buf = malloc(((struct stat_cap_header *)buf)->c_len);
	    if (buf == NULL)
		return(-1);
	    if ((len = stat_cap(file, buf, STAT_CAP_STDFIELDS)) < 0) {
		free(buf);
		return(-1);
	    }
	}

	/*
	 * Populate the stat structure (this could be common code for all
	 * stat*() calls).
	 */
	off = 0;
	while (off < len) {
	    cap = (struct stat_cap_header *)(buf + off);
	    switch(cap->c_type) {
	    case STAT_TIMESPEC1:
		st->st_atimespec = cap->c_timespec1.atimespec;
		st->st_mtimespec = cap->c_timespec1.mtimespec;
		st->st_ctimespec = cap->c_timespec1.ctimespec;
		break;
	    case STAT_UIDGID1:
		st->st_uid = cap->c_uidgid1.uid;
		st->st_gid = cap->c_uidgid1.gid;
		break;
	    default:
		/* ignore capabilities we do not understand */
		break;
	    }
	    off += cap->c_len;
	}
	if (buf != tmpbuf)
	    free(buf);
	return(0);
     }

