ZFS regimen: scrub, scrub, scrub and scrub again.

Thu Jan 24 14:45:55 UTC 2013

Wow! OK. It sounds like you (or someone like you) can answer some of my
burning questions about ZFS.

On Thu, Jan 24, 2013 at 8:12 AM, Adam Nowacki <nowakpl at platinum.linux.pl> wrote:

> Let's assume a 5-disk raidz1 vdev with ashift=9 (512-byte sectors).
>
> The worst case would be a random I/O workload reading random files of
> 2048 bytes each. Each file read would require data from 4 disks (the 5th
> holds parity and won't be read unless there are errors). However, if files
> were 512 bytes or less, only one disk would be used; 1024 bytes, two disks;
> and so on.
>
> So ZFS is probably not the best choice to store millions of small files if
> random access to whole files is the primary concern.
>
> But let's look at a different scenario: a PostgreSQL database. Here table
> data is split and stored in 1GB files. ZFS splits each file into 128KiB
> records (the recordsize property). Each record is then split again into 4
> columns of 32768 bytes each, and a 5th column containing parity is
> generated. Each column is then stored on a different disk. You could think
> of it as a regular RAID-5 with a stripe size of 32768 bytes.
>
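To check the arithmetic in the quoted explanation, here is a rough sketch of how one block is spread across the data disks of a raidz vdev. It assumes a simplified model: the block is rounded up to whole sectors and those sectors are dealt out evenly over the data columns, while raidz allocation padding, compression and metadata are ignored. The function name raidz_data_columns is hypothetical, not part of any ZFS tool.

```python
import math


def raidz_data_columns(block_bytes, ndisks=5, nparity=1, ashift=9):
    """Estimate how one ZFS block is spread over a raidz vdev's data disks.

    Simplified model (an assumption, not ZFS's exact allocator): sectors are
    dealt out evenly across the data columns; raidz padding, compression and
    metadata are ignored. Returns (data_disks_read, largest_column_bytes).
    """
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    sector = 1 << ashift                       # ashift=9 -> 512-byte sectors
    sectors = math.ceil(block_bytes / sector)  # block rounded up to whole sectors
    data_disks = ndisks - nparity              # 5-disk raidz1 -> 4 data columns
    touched = min(sectors, data_disks)         # a block can't span more disks than it has sectors
    largest_column = math.ceil(sectors / touched) * sector
    return touched, largest_column


for size in (512, 1024, 2048, 128 * 1024):
    disks, column = raidz_data_columns(size)
    print(f"{size:>6} bytes -> {disks} data disk(s), up to {column} bytes each")
```

Under this model a 2048-byte file touches 4 data disks with 512 bytes on each, a 512-byte file touches just one, and a full 128KiB record yields 4 columns of 32768 bytes, matching the RAID-5 comparison. Taking 1GB as 2^30 bytes, one PostgreSQL segment file holds 8192 such records.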

OK... so my question then would be: what about small files? If I write
several small files at once, does the transaction use a single record, or
does each file need its own record? Additionally, if small files use
sub-records, when you delete such a file, does the sub-record get moved, or
is it just wasted (until the whole record is free)?

I'm considering the difference, say, between Cyrus IMAP (one file per
message on ZFS, database files on a different ZFS filesystem) and DBMail IMAP
(PostgreSQL on ZFS).

Now, I realize that PostgreSQL on ZFS has some special issues (but I don't
have a choice here between ZFS and non-ZFS; ZFS has already been chosen),
but I'm also figuring that PostgreSQL on ZFS is somewhat wasteful compared
to Cyrus IMAP on ZFS.

So far in my research, Cyrus makes some compelling arguments that the most
common access pattern for IMAP database files is a full scan, for which its
database files are optimized and SQL-based files are not. I agree that some
operations can be more efficient in a good SQL database, but a full scan
(the most frequently used query) is not one of them.

Cyrus also makes sense to me as a collection of small files ... for which I
expect ZFS to excel... including the ability to snapshot with impunity...
but I am terribly curious how the files are handled in transactions.

I'm actually (right now) running some file-size statistics (and I'll get
back to the list, if asked), but I'd like to know how ZFS is going to store
the arriving mail... :).
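For anyone wanting to gather similar numbers, one possible sketch walks a mail spool and counts regular files per size bucket, relating them to the 512-byte sector and the default 128KiB recordsize discussed above. The function names bucket_for, size_histogram and print_histogram are hypothetical, and the power-of-two buckets are only histogram bins for readability, not the block sizes ZFS actually allocates.

```python
import os
import stat
from collections import Counter


def bucket_for(size, sector=512, recordsize=128 * 1024):
    """Smallest power-of-two multiple of `sector` that holds `size`.

    Returns None for files larger than `recordsize`, i.e. files that would
    span more than one record. Empty files land in the first bucket.
    """
    if size > recordsize:
        return None
    bucket = sector
    while bucket < size:
        bucket *= 2
    return bucket


def size_histogram(root, sector=512, recordsize=128 * 1024):
    """Walk `root` and count regular files (not symlinks) per size bucket."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished (e.g. mail expunged) or unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                counts[bucket_for(st.st_size, sector, recordsize)] += 1
    return counts


def print_histogram(counts):
    # Smallest bucket first; the "larger than recordsize" bin (None) goes last.
    for bucket in sorted(counts, key=lambda b: (b is None, b or 0)):
        label = f"<= {bucket}" if bucket is not None else "> recordsize"
        print(f"{label:>14}: {counts[bucket]}")
```

Usage would be something like print_histogram(size_histogram("/var/spool/imap")), where the spool path is only an assumed example; point it at wherever the Cyrus partition actually lives.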