disk devices speed is ugly

Scott Long scottl at samsco.org
Wed Feb 15 04:50:15 UTC 2012


On Feb 14, 2012, at 1:02 PM, Peter Jeremy wrote:

> On 2012-Feb-13 08:28:21 -0500, Gary Palmer <gpalmer at freebsd.org> wrote:
>> The filesystem is the *BEST* place to do caching.  It knows what metadata
>> is most effective to cache and what other data (e.g. file contents) doesn't
>> need to be cached.
> 
> Agreed.
> 
>> Any attempt to do this in layers between the FS and
>> the disk won't achieve the same gains as a properly written filesystem. 
> 
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
> 
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks and ability to flush
> dirty blocks in a requested order with the equivalent of barriers
> (request Y will not occur until preceding request X has been
> committed to stable media).  This would allow filesystems to regain
> the benefits of block devices with minimal effort and then improve
> performance & cache efficiency with additional work.
> 

Any filesystem that uses bread/bwrite/cluster_read is already using the "generic caching subsystem" that you propose.  This includes UDF, CD9660, MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage filesystem in the tree except for ZFS.  Not all of them implement VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode pager, not requirements for using buffer-cache services on block devices.  As Kostik pointed out in a parallel email, the only thing that was removed from FreeBSD was the userland interface to cached devices via /dev nodes.  This has nothing to do with filesystems, though I suppose that could maybe sorta kinda be an issue for FUSE?
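
To make that concrete, here is a rough sketch of the pattern those filesystems follow.  The myfs_read_meta() helper is invented for illustration, but bread()/brelse()/bdwrite()/bwrite() are the real buffer-cache entry points; the exact error-handling convention has shifted a bit between releases, so take the details as an approximation rather than reference code:

/*
 * Hypothetical filesystem helper: ask the buffer cache for a metadata
 * block on the underlying device vnode.  The cache either returns the
 * buffer from memory or schedules the read, so the filesystem gets
 * caching for free (cluster_read() layers read-ahead on top of this).
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <sys/vnode.h>

static int
myfs_read_meta(struct vnode *devvp, daddr_t blkno, int size, struct buf **bpp)
{
	struct buf *bp = NULL;
	int error;

	/* Returns a cached buffer if one exists, otherwise does the I/O. */
	error = bread(devvp, blkno, size, NOCRED, &bp);
	if (error != 0) {
		if (bp != NULL)
			brelse(bp);
		*bpp = NULL;
		return (error);
	}
	/*
	 * The caller works on bp->b_data and then either brelse()s the
	 * buffer (clean), bdwrite()s it (delayed write), or bwrite()s it
	 * (synchronous write); all of those paths go back through the
	 * buffer cache.
	 */
	*bpp = bp;
	return (0);
}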

ZFS isn't in this list because it implements its own private buffer/cache (the ARC) that understands the special requirements of ZFS.  There are good and bad aspects to this, noted below.

> One downside of the "each FS does its own caching" approach is that the caches
> are all separate and need careful integration into the VM subsystem to
> prevent starvation (e.g. past problems with UFS starving ZFS L2ARC).
> 

I'm not sure what you mean here.  The ARC is limited by available wired memory; attempts to allocate such memory will evict pages from the buffer cache as necessary, until all available RAM is consumed.  If anything, ZFS starves the rest of the system, not the other way around, and that's simply because the ARC isn't integrated with the normal VM.  Such integration is extremely hard and has nothing to do with having a generic caching subsystem.
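
If you want to watch that limit on a running system, a trivial userland sketch will do; I'm assuming the usual OIDs here (vfs.zfs.arc_max and kstat.zfs.misc.arcstats.size), so double-check the names on your box:

/*
 * Print the configured ARC cap and its current size via sysctl.
 * The OID names are assumptions about a typical ZFS-enabled system.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdint.h>
#include <stdio.h>

static uint64_t
get_u64(const char *name)
{
	uint64_t val = 0;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) == -1)
		return (0);	/* OID missing or not a 64-bit value */
	return (val);
}

int
main(void)
{
	printf("vfs.zfs.arc_max:              %ju bytes\n",
	    (uintmax_t)get_u64("vfs.zfs.arc_max"));
	printf("kstat.zfs.misc.arcstats.size: %ju bytes\n",
	    (uintmax_t)get_u64("kstat.zfs.misc.arcstats.size"));
	return (0);
}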

Scott


