geom_ssdcache

John-Mark Gurney jmg at funkthat.com
Wed Nov 20 00:01:28 UTC 2019


Wojciech Puchar wrote this message on Tue, Nov 19, 2019 at 13:06 +0100:
> today SSDs are really fast and quite cheap, but hard drives are still many 
> times cheaper.
> 
> Magnetic hard drives are OK at long reads anyway, just bad at seeks.
> 
> While it's trendy to use ZFS now, I would stick with UFS anyway.
> 
> I try to keep most of the data on HDDs but use SSDs for small files and 
> high I/O needs.
> 
> It works, but it needs too much manual and semi-automated work.
> 
> It would be better to just use HDDs for storage, part of the SSD for 
> cache, and the rest for temporary storage only.
> 
> My idea is to make a geom layer for caching one geom provider (magnetic 
> disk/partition or gmirror/graid5) using another geom provider (an SSD 
> partition).

Another thing you should decide is whether the cache will be shared or
per geom provider, and how this would interact w/ multiple separate
geom caches...  Likely w/ a shared cache (a single SSD covering multiple
providers), starting clear each time would be best.
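
For concreteness, a per-provider layout could look something like the
sketch below.  It is only a sketch: the g_ssdcache_* names are made up,
and only struct g_class, struct g_consumer, G_VERSION and
DECLARE_GEOM_CLASS() are the real GEOM interfaces.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/module.h>
#include <sys/bio.h>
#include <geom/geom.h>

/* Per-provider softc: one backing HDD provider, one SSD cache provider. */
struct g_ssdcache_softc {
	struct g_consumer	*sc_hdd;	/* magnetic provider (or gmirror/graid5) */
	struct g_consumer	*sc_ssd;	/* SSD cache partition */
	/* cache map, statistics, dirty state for writeback, ... */
};

static struct g_class g_ssdcache_class = {
	.name = "SSDCACHE",
	.version = G_VERSION,
	/*
	 * .taste, .ctlreq and .destroy_geom would be filled in here; the
	 * geom's start routine (sketched further down) gets set on the
	 * g_geom when the cache is created.
	 */
};

DECLARE_GEOM_CLASS(g_ssdcache_class, g_ssdcache);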

> I have no experience writing geom layer drivers, but I think geom_cache 
> would be a fine starting point. At first I would do read/write-through 
> caching. Writeback caching would come next - if at all; it doesn't seem 
> like a good idea unless you are sure the SSD won't fail.

Re: the SSD failing, you can put a gmirror under the cache to address
this...

> But my question is really about UFS. I would like to know, at the geom 
> layer, whether a read/write operation is an inode/directory/superblock 
> write or a regular data write - so I could give the former higher 
> priority. Regular data would not be cached at all, or only when the read 
> size is less than a defined value.

At the geom layer, I don't think that this information is available.
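
For what it's worth, all a GEOM start routine is handed is the struct
bio: the command, byte offset, length and the provider it was aimed at;
nothing marks a request as inode, directory or superblock I/O.
Continuing the made-up g_ssdcache sketch from above (g_clone_bio(),
g_std_done(), g_io_request() and g_io_deliver() are the real GEOM
calls), a pass-through start routine would look roughly like:

static void
g_ssdcache_start(struct bio *bp)
{
	struct g_ssdcache_softc *sc;
	struct bio *cbp;

	sc = bp->bio_to->geom->softc;
	switch (bp->bio_cmd) {
	case BIO_READ:
	case BIO_WRITE:
		/*
		 * All we can key on here is bio_cmd, bio_offset and
		 * bio_length; nothing says metadata vs. regular data.
		 */
		cbp = g_clone_bio(bp);
		if (cbp == NULL) {
			g_io_deliver(bp, ENOMEM);
			return;
		}
		cbp->bio_done = g_std_done;
		/*
		 * Cache lookup/insert would go here; for now just pass
		 * the request through to the backing HDD provider.
		 */
		g_io_request(cbp, sc->sc_hdd);
		return;
	default:
		g_io_deliver(bp, EOPNOTSUPP);
		return;
	}
}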

> Is it possible to modify the UFS code to somehow pass a flag/value when 
> issuing a read/write request to the device layer?

Take a look at sys/ufs/ffs/ffs_vfsops.c; it looks like at least the
superblock writes are already handled separately (see ffs_use_bwrite),
but you'd need to split things apart further.  Also, snapshots might
make this a little more difficult.
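
If you do end up patching UFS, one hypothetical way to carry the hint
is to mark the buffer where UFS knows it is writing metadata and copy
that mark onto the bio where the buf is translated into one
(g_vfs_strategy() in sys/geom/geom_vfs.c does that translation today).
None of the flags below exist; a real patch would have to claim
genuinely unused bits in b_xflags and bio_flags:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>

#define	BX_FS_METADATA	0x0100	/* hypothetical b_xflags bit, set by UFS on
				   inode/directory/superblock buffers */
#define	BIO_FS_METADATA	0x8000	/* hypothetical bio_flags bit; both values
				   are illustrative only */

/*
 * Hypothetical helper for the buf-to-bio translation: copy the metadata
 * hint onto the bio so a caching GEOM class can see it.
 */
static void
ssdcache_copy_hint(struct buf *bp, struct bio *bip)
{
	if (bp->b_xflags & BX_FS_METADATA)
		bip->bio_flags |= BIO_FS_METADATA;
}

The cache's start routine could then send requests with BIO_FS_METADATA
set to the SSD consumer and let everything else go to the HDD.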

Most of the metadata can likely be cached in RAM already, unless you
have a large, LARGE UFS fs, in which case why aren't you using ZFS?

I'd also suggest you look at profiling the actual reads/writes to make
sure you'd be able to get the performance you need...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

