ZFS on 10-STABLE r281159: programs, accessing ZFS pauses for minutes in state [*kmem arena]
Steven Hartland
killing at multiplay.co.uk
Thu Jul 30 15:41:19 UTC 2015
On 30/07/2015 15:41, Paul Kraus wrote:
> On Jul 30, 2015, at 7:49, Steven Hartland <killing at multiplay.co.uk> wrote:
>
>> On 30/07/2015 12:30, Lev Serebryakov wrote:
>>> Deduplication IS TURNED OFF. atime is turned off. Record size set to 1M as
>>> I have a lot of big files (movies, RAW photo from DSLR, etc). Compression is
>>> turned off.
>> You don't need to do that as record set size is a min not a max, if you don't force it large files will still be stored efficiently.
> Can you point to documentation for that ?
Ignore my previous comment there; I was clearly having a special moment.
recordsize sets the suggested block size, which is effectively the
largest block size for a given file. It's generally not about efficient
storage but about efficient access, so that's what you usually want to
consider, except in extreme cases.
If you set recordsize to 1MB you get large block support which is
detailed here:
https://reviews.csiden.org/r/51/
Key info from this:

Recommended uses center around improving performance of random reads of
large blocks (>= 128KB):
- files that are randomly read in large chunks (e.g. video files when
streaming many concurrent streams such that prefetch can not effectively
cache data); performance will be improved in this case because random
1MB reads from rotating disks have higher bandwidth than random 128KB
reads
- typically, performance of scrub/resilver is improved, especially with
RAID-Z
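The bandwidth claim above is easy to sanity-check with a back-of-the-envelope model. The seek time and sequential transfer rate below are assumed, illustrative figures for a rotating disk, not measurements:

```python
# Model: every random read pays one seek, then transfers at the disk's
# sequential rate. SEEK_S and SEQ_BW are assumed illustrative values,
# not measurements from any particular drive.
SEEK_S = 0.008        # average seek + rotational latency, seconds
SEQ_BW = 150e6        # sequential transfer rate, bytes/second

def effective_bw(read_size):
    """Effective bandwidth (bytes/s) of random reads of read_size bytes."""
    return read_size / (SEEK_S + read_size / SEQ_BW)

for size in (128 * 1024, 1024 * 1024):
    print(f"{size >> 10:4d}KB random reads: {effective_bw(size) / 1e6:.0f} MB/s")
```

With these assumptions, 1MB random reads deliver roughly 4-5x the effective bandwidth of 128KB ones, which is the effect the review describes.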
The tradeoffs to consider when using large blocks include:
- accessing large blocks tends to increase latency of all operations,
because even small reads will need to get in line behind large
reads/writes
- sub-block writes (i.e. a write to 128KB of a 1MB block) will incur an
even larger read-modify-write penalty
- the last, partially-filled block of each file will be larger, wasting
memory and, if compression is not enabled, disk space (expected waste is
1/2 the recordsize per file, assuming random file length)
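The "1/2 the recordsize per file" figure follows from the last block's fill level being uniformly distributed when file lengths are random. A quick sketch of the arithmetic (the file count is a made-up example):

```python
def expected_waste(recordsize, nfiles=1):
    """Expected slack in the last, partially-filled block of each file:
    on average half a block per file, assuming random file lengths and
    no compression."""
    return nfiles * recordsize // 2

# Made-up example: 10,000 files at recordsize=1M vs the 128K default.
for rs in (128 * 1024, 1024 * 1024):
    gib = expected_waste(rs, nfiles=10_000) / 2**30
    print(f"recordsize={rs >> 10:4d}K: ~{gib:.2f} GiB expected slack")
```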
recordsize is documented in the man page:
https://www.freebsd.org/cgi/man.cgi?query=zfs&apropos=0&sektion=8&manpath=FreeBSD+10.2-stable&arch=default&format=html
> I really hope that the 128KB default is not a minimum record size or a 1KB file will take up 128 KB of FS space.
Setting the recordsize sets the suggested block size used, which acts as
a maximum, not a minimum: a file smaller than the recordsize is stored
in a single, smaller block, so a 1KB file on a 1MB-recordsize dataset
does not occupy 1MB on disk.
> As far as I know, zfs recordsize has always, since the very beginning of ZFS under Solaris, been the MAX recordsize, but it is also a hint and not a fixed value. ZFS will write any size records (powers of 2) from 512 bytes (4 KB in the case of an ashift=12 pool) up to recordsize. Tuning of recordsize has been frowned upon since the beginning unless you _know_ the size of your writes and they are fixed (like 8 KB database records).
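The max-and-hint behaviour Paul describes can be sketched as a simplified model. It rounds small files up to the 2**ashift sector size and caps at the recordsize, which glosses over the finer points of the real block allocator, so treat it as illustrative:

```python
def model_block_size(file_size, recordsize=128 * 1024, ashift=9):
    """Simplified model of ZFS block sizing: a file no larger than the
    recordsize lives in a single block sized to the file, rounded up to
    the 2**ashift sector size; bigger files use full recordsize blocks.
    The real allocator has more cases than this."""
    sector = 1 << ashift
    if file_size <= recordsize:
        return ((file_size + sector - 1) // sector) * sector
    return recordsize

# A 1KB file on a recordsize=1M dataset occupies ~1KB, not 1MB.
print(model_block_size(1024, recordsize=1024 * 1024))  # 1024
```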
>
> Also note that ZFS will fit the write to the pool in the case of RAIDz<n>, see Matt Ahrens' blog entry here: http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/
Another nice article on this can be found here:
https://www.joyent.com/blog/bruning-questions-zfs-record-size
Regards
Steve
More information about the freebsd-fs
mailing list