file locking.

Jeff Roberson jroberson at
Thu Aug 16 22:28:24 UTC 2007

On Thu, 16 Aug 2007, John Baldwin wrote:

> On Thursday 16 August 2007 04:18:51 pm Jeff Roberson wrote:
>> On Thu, 16 Aug 2007, John Baldwin wrote:
>>> On Thursday 16 August 2007 03:01:18 am Jeff Roberson wrote:
>>>> I have been looking at file locking for 8.0 and have come up with a way to
>>>> completely eliminate the file lock, reduce the size of struct file by
>>>> ~4 pointers (22%), remove the global list of files and associated lock,
>>>> and restrict the scope of unp_gc while removing several race conditions.
>> Thanks for the review.
>>> I like finit().  I would maybe change socketpair() to pass so1 and so2 to
>>> finit() rather than setting f_data twice.
>> I'm not sure what you mean?
> In socketpair() the new code does this:
> 	fp1->f_data = so1;
> 	...
> 	fp2->f_data = so2;
> 	...
> 	finit(fp1, ..., fp1->f_data, ...);
> 	finit(fp2, ..., fp2->f_data, ...);
> It might be cleaner to do this:
> 	...
> 	...
> 	finit(fp1, ..., so1, ...);
> 	finit(fp2, ..., so2, ...);

I did not want to have one file pointing at another without an initialized 
f_data field.  However, I guess the underlying sockets are already set up, 
so this may not be important.  The old code also went to some effort to set 
up f_data early, so I didn't want to change that.

>>> Did you consider using refcount_* for f_count rather than using the atomic
>>> operations directly?
>> Yes, I may change it.  I was investigating some schemes where it may not
>> have been sufficient but I think it will work fine for what I've settled
>> on.
>> I also think I'll wrap some atomics for manipulating f_type in macros.
> That would be good.
>>> It appears you never call unp_discard() and thus 'closef()' on the sockets in
>>> unp_gc() now.  Perhaps the fdrop()'s in the end of the loop should be
>>> unp_discard()'s instead?
>> unp_discard() happens as a side-effect of sorflush().
> So, in the old code there's a really big comment about how it makes sure to
> only do closef() (via unp_discard()) once but does a sorflush() for each
> f_msgcount.  Was that comment no longer true?

The comment actually says:

          * It is incorrect to simply unp_discard each entry for f_msgcount
          * times

What we do is grab an extra ref to each struct file that is dead and then 
explicitly sorflush() them.  This closes all of the references held by 
that socket, which frees any unreferenced non-unp descriptors. 
However, we want to prevent the algorithm from recursing in on itself, so 
we hold the extra file ref for unp sockets that would be closed.  Then, 
when we loop releasing this one last ref at the end, the actual fo_close 
will be called.

This portion of the algorithm is not significantly different from before. 
I just introduced an extra flag so I could remove the race from dropping 
the lock in between operations and get an accurate count of how big the 
array needs to be.

>>> Also, it's a bit of a shame to lose 'show files' from ddb.
>> Yes, I can re-implement that using the same technique as sysctl kern.file.
>> What's more troubling is the continued erosion of support for libkvm as it
>> uses filelist.  I don't think libkvm is a strong enough case to keep
>> filelist around.  I guess I will have to hack it to work similarly to
>> sysctl as well.
>> Do we have an official stance on libkvm?  Now that we have sysctl for
>> run-time it's only useful for crashdump debugging.  Really in most cases
>> it could be replaced with a reasonable set of gcc scripts.
> s/gcc/gdb/.  At work we do mostly post-mortem analysis, so having working
> libkvm is still very important for us.  xref the way I just fixed netstat to
> work again on coredumps recently.  Breaking fstat on coredumps would probably
> be very annoying.  libkvm can always use the same algo as the sysctl if
> necessary though.

Yes, I'll do that.

> How much overhead is the filelist if it is only used in file
> creation/destruction?

Well, shaving off two pointers gets struct file down to a cacheline in 
size, which has some performance benefit.  Furthermore, having a global lock 
in open/close will impact even single-threaded workloads on larger machines. 
Truthfully, I haven't seen contention on this yet, but I suspect that's just 
because I haven't tried the right workload. :-)  If we can do it without 
hindering other areas, which we can, I think it's a good idea.  I included 
it because it was easy to do after the gc changed.

fwiw this is about a 2% improvement in peak throughput in my mysql test on 
the 8-way, with better scaling at very high numbers of threads.  At peak 
throughput there was no measured contention on this lock.  That 
improvement is purely from fewer atomics and a smaller struct file.

Kris created a synthetic benchmark that showed over 50% improvement in 
throughput with lots of threads contending on the same descriptor. 
Single-threaded performance is probably much better with this as well, 
although less dramatically so.

> -- 
> John Baldwin

More information about the freebsd-arch mailing list