cvs commit: src/sys/ufs/ufs dirhash.h ufs_dirhash.c
jroberson at jroberson.net
Fri Apr 11 12:31:45 UTC 2008
On Fri, 11 Apr 2008, David Malone wrote:
> On Fri, Apr 11, 2008 at 09:48:12AM +0000, Jeff Roberson wrote:
>> - Use a lockmgr lock rather than a mtx to protect dirhash. This lock
>> may be held for the duration of the various dirhash operations which
>> avoids many complex unlock/lock/revalidate sequences.
>> - Permit shared locks on lookup. To protect the ip->i_dirhash pointer we
>> use the vnode interlock in the shared case. Callers holding the
>> exclusive vnode lock can run without fear of concurrent modification to
>> - Hold an exclusive dirhash lock when creating the dirhash structure for
>> the first time or when re-creating a dirhash structure which has been
> Hi Jeff,
> I've been reading through this patch to understand it. I've a few
> 1) You initialise a chunk of the dir hash struct earlier
> now, though it was previously left until we knew memory
> allocation was successful. Is there a reason for that?
Now instead of freeing a recycled dirhash pointer in ip->i_dirhash I lock
it and re-create it. This makes various races simpler. The dirhash has
to be minimally constructed for dirhash_free_locked to function correctly
if we bail out early.
> 2) Is ufsdirhash_create() trying harder than before to
> allocate a dirhash? It looks like it might be, but maybe
> that's just a feature of the new locking.
The conditions in ufsdirhash_build() which result in creating a new
directory hash remain the same except we are more agressive about
reclaiming when the sysctl is lowered to ease testing.
ufsdirhash_create() is a new function which tries to lock or allocate a
dirhash. This is complex because i_dirhash must be protected with the
vnode interlock if the directory lock is held shared. This code has to
deal with concurrent creation as well as concurrent destruction by
> 3) You've added a dh_memreq member to the structure to
> remember a value that's cheap to calculate from other
> structure members and only required at allocation and free
> time. I can understand why you'd want to avoid repeating
> the code, but it would seem better to make it a macro rather
> than storing it for the lifetime of the object?
Earlier versions of this patch suffered various accounting errors with the
space used by the i_dirhash structure. Since I may call
ufsdirhash_free_locked() before space is accounted for keeping the integer
was much simpler.
> 4) You replaced an unlocked read with a locked read in
> ufsdirhash_lookup. I think this is because you are now
> locking the dh_score with the list mutex and are taking
> advantage of the fact that the score is changed just below?
> Won't this mean that for a sequence of operations on one
> directory, we'll now need to lock the list mutex for each
> lookup and get the lockmanager lock on the dirhash. At face
> value, this would seems to be worse than the old situation
> of only getting the dh mutex for each opteation?
In the normal ufs_lookup() path we enter 3 dirhash functions which can now
assume the lock is held rather than locking/unlocking. And in lookup in
particular we are no longer required to drop and reacquire the lock,
potentially multiple times. I'm sure it's a net reduction in lock
operations. That really wasn't the primary motivation however. I mostly
wanted to enable shared locking of directories. I doubt the number of mtx
operations is dominating the performance of directory operations in any
> In the case of multiple directories, I guess we'll have two
> mutex locks replaced by one mutex and one lockmanager
> (shared) lock? This is probably the usual case, which isn't
> too bad.
I'm not sure I understand this.
> 5) It looks like the recycle policy (and locking thereof)
> is now slightly different and I think it might do something
> funny in some cases now. If we get the list lock in
> ufsdirhash_recycle, but someone holds an exclusive lock on
> some of the dirhashes, then we will walk over the locked
> ones until we find an unlocked one and free it. We then
> jump back to the start of the list and have to walk over
> the locked ones again. I wonder if in a low memory situation
> the locked nodes could be actually waiting for the list
> lock in ufsdirhash_list_locked() and there could be some
> sort of cascade where you end up freeing dirhashes that
> should actually be kept. Actually, this probably isn't
> possible because the list lock is dropped while we're doing
> the free?
Because we drop the list lock we have to restart the processing from the
beginning. I guess it is a risk that we could have many locked dirhashes
that we'll have to skip causing the recycle loop to potentially take a
long time. However, the old code was strange here as well. It'd only
examine the head of the list and only recycled if the score reached 0.
Really we could just make this a simple LRU and be done with it. The
score seems redundant.
More information about the cvs-src