capability races (was: Re: Seeing ENOTCAPABLE from write FDs in kqueue?)
    Mateusz Guzik 
    mjguzik at gmail.com
       
    Fri Jun 20 01:04:31 UTC 2014
    
    
  
Good news.
While previous approached were giving ~16% worse performance in my
microbenchmarks I found a way to make up for this (needs verification
for correctness).
fget_unlocked does:
        /*
         * Avoid reads reordering and then a first access to the
         * fdp->fd_ofiles table which could result in OOB operation.
         */
        if (fd < 0 || fd >= atomic_load_acq_int(&fdp->fd_nfiles))
                return (EBADF);
Turns out replacing this read with mere "fdp->fd_nfiles" gives ~10%
performance increase over regular kernel and almost brings kernel with
sequence counters to the level of regular kernel.
So how to get rid of the need for atomic_load_acq_int?
Currently struct filedesc contains file table and its size. File table
pointer update is one operation, size update is another. Thus the need
for memory barriers.
Now let's consider the following structure:
struct fdtable {
	int f_size;
	struct filedesc f_files[0];
};
Pointer to this strucure in struct filedesc can be replaced atomically,
and from what I'm told the only synchronization needed is data dependency
barrier (which in case of amd64 means there is no need to do anything).
In short, we can get ~10% better performance and then sacrifice the gain
(and some more) to have capability races fixed and be slightly slower
than now.
Thoughts?
The patch is a total hack so I'm not attaching it, but as far as
performance goes you can just move fd_nfiles to be next to fd_ofiles and
remove atomic_load_acq_int in fget_unlocked.
fdgrowtable will need to have a write barrier before the pointer is
atomically replaced.
-- 
Mateusz Guzik <mjguzik gmail.com>
    
    
More information about the freebsd-arch
mailing list