Namecache status report #4

Gleb Kurtsou gleb.kurtsou at gmail.com
Tue Jun 22 09:20:34 UTC 2010


Dircache underwent considerable changes last week.

First of all, after trying to make pefs dircache-friendly, I've decided to
drop the dircache_*update() API. Its original purpose was to populate the
cache during VOP_LOOKUP and/or VOP_READDIR, so that an upper layer (a
stacked filesystem) could check the cache instead of calling the lower
layer. Keeping the directory cache 'complete' was too hard and race-prone;
besides, I didn't find a good way to use it in pefs. Dropping update()
helped get rid of sleeping in dircache and of the notion of a
'partial/complete' directory cache, which further simplified things.

I've also removed directory offsets and filesystem private data from
cache entries, as they are now useless. To facilitate use within stacked
filesystems, an API was added for getting a vnode's name (one of its
names, if there are several) and its generation number. The cache
generation number reflects changes in a directory (add, remove, rename)
and is similar to va_seq in OpenSolaris (va_gen in our ZFS port).
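To illustrate how a stacked filesystem might use a directory generation number, here is a minimal userspace sketch. The names struct dir_cache, dir_cache_modify(), dir_cache_gen() and dir_cache_unchanged() are hypothetical and are not the actual dircache API; the only assumption is that every add, remove or rename bumps a per-directory counter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-directory cache state (illustrative only, not the real
 * dircache structures). gen plays the role of the cache generation number.
 */
struct dir_cache {
	uint64_t gen;	/* bumped on every add, remove or rename */
};

enum dir_op { DIR_ADD, DIR_REMOVE, DIR_RENAME };

/* Every namespace change in the directory advances the generation. */
static void
dir_cache_modify(struct dir_cache *dc, enum dir_op op)
{
	assert(op == DIR_ADD || op == DIR_REMOVE || op == DIR_RENAME);
	dc->gen++;
}

/* A stacked filesystem records the generation when it caches something... */
static uint64_t
dir_cache_gen(const struct dir_cache *dc)
{
	return (dc->gen);
}

/* ...and later checks that the directory has not changed since then. */
static int
dir_cache_unchanged(const struct dir_cache *dc, uint64_t seen)
{
	return (dc->gen == seen);
}
```

An upper layer that recorded the generation before a lookup can compare it afterwards and fall back to calling the lower layer if it changed.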

Dircache now uses fine-grained locking. I'm working on improving reference
counting and adding a queue for unused entries. A caching algorithm will
be used to control the queue size and to decide which entries to free; the
2Q algorithm looks very promising.
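For reference, a simplified userspace sketch of the 2Q replacement policy follows: A1in is a FIFO of resident entries seen once, A1out remembers keys recently evicted from A1in (ghost entries, no data), and Am is an LRU of entries referenced again. It works on integer keys rather than dircache entries; CAP, KIN, KOUT and all function names are hypothetical, and linear search keeps it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CAP	4	/* resident entries (A1in + Am) */
#define KIN	1	/* target size of A1in */
#define KOUT	2	/* ghost keys remembered in A1out (KOUT <= CAP) */

struct q { int keys[CAP]; int n; };	/* keys[0] is the head */
struct twoq { struct q a1in, a1out, am; };

static int
q_find(const struct q *q, int key)
{
	for (int i = 0; i < q->n; i++)
		if (q->keys[i] == key)
			return (i);
	return (-1);
}

static void
q_remove_at(struct q *q, int i)
{
	memmove(&q->keys[i], &q->keys[i + 1], (q->n - i - 1) * sizeof(int));
	q->n--;
}

static void
q_push_head(struct q *q, int key)
{
	assert(q->n < CAP);
	memmove(&q->keys[1], &q->keys[0], q->n * sizeof(int));
	q->keys[0] = key;
	q->n++;
}

/* Free one resident slot if the cache is full. */
static void
reclaim(struct twoq *c)
{
	if (c->a1in.n + c->am.n < CAP)
		return;
	if (c->a1in.n > KIN) {
		/* Evict the oldest once-seen entry, but remember its key. */
		int victim = c->a1in.keys[--c->a1in.n];
		if (c->a1out.n == KOUT)
			c->a1out.n--;	/* forget the oldest ghost */
		q_push_head(&c->a1out, victim);
	} else {
		c->am.n--;	/* evict the LRU hot entry */
	}
}

/* Returns true on a hit (entry was resident). */
static bool
twoq_access(struct twoq *c, int key)
{
	int i;

	if ((i = q_find(&c->am, key)) >= 0) {	/* hot: move to MRU */
		q_remove_at(&c->am, i);
		q_push_head(&c->am, key);
		return (true);
	}
	if (q_find(&c->a1in, key) >= 0)		/* seen once: stay in FIFO */
		return (true);
	if ((i = q_find(&c->a1out, key)) >= 0) {	/* re-referenced: promote */
		q_remove_at(&c->a1out, i);
		reclaim(c);
		q_push_head(&c->am, key);
		return (false);
	}
	reclaim(c);				/* first reference */
	q_push_head(&c->a1in, key);
	return (false);
}
```

The appeal for a cache of unused entries is scan resistance: a burst of one-time references (say, a large readdir) only cycles through A1in and A1out, while entries that were referenced again stay in Am.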

Races in VOP_RENAME() remain the biggest obstacle. Most filesystems
(e.g. ext2) unlock vnodes during rename() and do relookup() later. Others
exclusively lock all vnodes (UFS) or perform 'name locking' (ZFS).
dircache_rename() expects that the nodes with the given names don't
change during the call. Both UFS and ZFS are fine in this respect, but
even tmpfs has issues.
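For readers unfamiliar with 'name locking', a rough userspace sketch of the idea follows. It is a simplified stand-in and not how ZFS implements it: (directory, name) pairs are hashed with FNV-1a into a fixed array of pthread mutexes (lock striping), and a rename locks the source and target stripes in ascending index order so that two concurrent renames cannot deadlock. NAME_LOCKS and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_LOCKS	64	/* hypothetical number of lock stripes */

static pthread_mutex_t name_locks[NAME_LOCKS];

static void
name_locks_init(void)
{
	for (size_t i = 0; i < NAME_LOCKS; i++)
		pthread_mutex_init(&name_locks[i], NULL);
}

/* FNV-1a over the directory id and the name selects a stripe. */
static size_t
name_lock_index(uint64_t dir_id, const char *name)
{
	uint64_t h = 0xcbf29ce484222325ULL;		/* FNV-1a offset basis */
	const uint64_t prime = 0x100000001b3ULL;	/* FNV-1a 64-bit prime */

	assert(name != NULL);
	for (int i = 0; i < 8; i++) {
		h ^= (dir_id >> (8 * i)) & 0xff;
		h *= prime;
	}
	for (const unsigned char *p = (const unsigned char *)name;
	    *p != '\0'; p++) {
		h ^= *p;
		h *= prime;
	}
	return ((size_t)(h % NAME_LOCKS));
}

/* Lock two stripes in ascending order; a shared stripe is locked once. */
static void
name_lock_pair(size_t a, size_t b)
{
	size_t lo = a < b ? a : b, hi = a < b ? b : a;

	pthread_mutex_lock(&name_locks[lo]);
	if (hi != lo)
		pthread_mutex_lock(&name_locks[hi]);
}

static void
name_unlock_pair(size_t a, size_t b)
{
	pthread_mutex_unlock(&name_locks[a]);
	if (b != a)
		pthread_mutex_unlock(&name_locks[b]);
}
```

While both stripes are held, no other thread following the same scheme can modify entries with either name, which is the kind of guarantee dircache_rename() relies on. Striping trades precision for simplicity: unrelated names that hash to the same stripe are serialized too.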

Thanks,
Gleb.


More information about the soc-status mailing list