zfs l2arc warmup

Karl Denninger karl at denninger.net
Fri Mar 28 12:12:12 UTC 2014


On 3/28/2014 4:23 AM, Joar Jegleim wrote:
> On 28 March 2014 01:59,  <kpneal at pobox.com> wrote:
>> On Thu, Mar 27, 2014 at 11:10:48AM +0100, Joar Jegleim wrote:
>>> How long it takes to warm up the l2arc really isn't a problem for
>>> me; if it takes a week, that's OK. After all, I don't plan on
>>> rebooting this setup very often, and since I have 2 servers I have
>>> the option of letting a server warm up before I hook it back into
>>> production after maintenance, patch upgrades and so on.
>>>
>>> I'm just curious whether the l2arc will warm up by itself, or
>>> whether I would have to do that manual rsync to force l2arc warmup.
>> Have you measured the difference in performance between a cold L2ARC and
>> a warm one? Even better, have you measured the performance with a cold
>> L2ARC to see if it meets your performance needs?
> No I haven't.
> I actually started using those 2 SSDs for l2arc the day before I sent
> this mail to the list.
> I haven't done this the 'right' way by producing numbers for
> measurement, but I do know that the way this application works today,
> it pulls random jpegs from a dataset of about 1.6TB consisting of many
> millions of files (more than 20 million), and that today this pool is
> served from 20 SATA 7.2K disks, which would be about the slowest
> possible solution for random read access.
> Based on the huge performance gain SSDs show on paper, but also on
> graphs from people on the net who have done this more thoroughly than
> me, I'm pretty confident in saying that any jpeg the application
> requests that is served from either RAM or SSD would be a substantial
> performance gain compared to serving it from the array of 7.2k disks.
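
(On the measurement question above: both the warmup and the payoff are
directly observable from the stock counters, so it is worth a quick look
before drawing conclusions.  A minimal sketch on FreeBSD -- exact
tunable names and defaults vary a bit by ZFS version:

    # How full is the L2ARC, and is it actually being hit?
    sysctl kstat.zfs.misc.arcstats.l2_size     # bytes currently cached
    sysctl kstat.zfs.misc.arcstats.l2_hits     # reads served from the SSDs
    sysctl kstat.zfs.misc.arcstats.l2_misses   # reads that fell through to disk

    # Warmup speed is capped by the feed thread; these are the
    # per-interval write limits in bytes.
    sysctl vfs.zfs.l2arc_write_max
    sysctl vfs.zfs.l2arc_write_boost

l2_hits / (l2_hits + l2_misses) tells you whether the cache is earning
its keep.)
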
No, the simplest solution is IMHO to stop trying to RAM-back a 1.6TB 
data set through various machinations.

A cache is just that -- a cache.  Its purpose is to make *frequently 
accessed* data more quickly available to an application.  You have the 
antithesis of cacheable data: a pure random access pattern with no 
predictive or "frequently used" signal by which to determine what is 
likely to be requested next.
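
Back-of-envelope: with uniformly random access the long-run hit rate of
a cache is roughly cache_size / working_set_size, regardless of how
clever the eviction policy is.  A sketch with assumed numbers (a
hypothetical 250GB of combined ARC + L2ARC; substitute your own):

    # expected hit rate ~= cache bytes / data set bytes
    echo "scale=2; 250 / 1638" | bc
    # => .15 -- roughly 85% of requests still land on the 7.2k spindles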

IMHO the best and cheapest way to serve that data is to eliminate 
rotational and positioning latency from the data path.  If it is a 
read-nearly-always (or read-only) data set, then redundancy is only 
necessary to prevent downtime (not data loss), since the data can be 
easily backed up.

For the model you describe I would buy however many SSDs are necessary 
to store said data set, design a means to back it up reliably, and be 
done with it.

Backing the data store with L2ARC (and the RAM to manage it) is likely 
self-defeating: not only are you paying for BOTH the spinning rust AND 
the SSDs, but you have also doubled the number of devices that can fail 
and interrupt service.
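
That RAM cost is real: every record resident in L2ARC pins a header in
the ARC.  A rough sketch -- the per-record header size varies by ZFS
version, ~180 bytes is assumed here:

    # 2x 200GB SSDs of L2ARC, default 128KiB recordsize
    echo "400 * 1024^3 / 131072 * 180" | bc
    # => ~590MB of RAM just to index the cache.  Smaller records mean
    #    more headers per cached byte, and with this workload's small
    #    files the number only goes up.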

-- 
-- Karl
karl at denninger.net
