zfs l2arc warmup

Thu Mar 27 15:45:23 UTC 2014

It seems like this should be easier. The ARC and L2ARC will hold whatever
has been read, so I don't know, maybe just cat the jpegs at boot?
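
Something along these lines is what I have in mind, as a rough sketch
only. It walks a directory tree and reads every file to the end,
throwing the data away, which is all "cat everything to /dev/null"
really does. The function name warm_cache and the chunk size are my own
made-up choices, nothing ZFS-specific. As far as I know the L2ARC is
filled gradually from the ARC at a throttled rate, so a single pass may
not land everything on the cache device.

```python
import os


def warm_cache(root, chunk_size=1 << 20):
    """Read every regular file under root and discard the data.

    Hypothetical helper: the only goal is to pull the blocks through
    the filesystem cache. Returns (files_read, bytes_read).
    """
    files_read = 0
    bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Read in chunks so a large file never sits in
                    # this process's memory all at once.
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
            except OSError:
                # Unreadable or vanished file: skip it and carry on.
                continue
            files_read += 1
    return files_read, bytes_read
```

Calling warm_cache() on the dataset's mountpoint from a boot-time
script would be the equivalent of the cat idea; the return value just
gives you something to log.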

On 3/27/14, 9:53 AM, Karl Denninger wrote:
>
> On 3/27/2014 9:26 AM, Bob Friesenhahn wrote:
>> On Thu, 27 Mar 2014, Joar Jegleim wrote:
>>> Is this how 'you' do it to warm up the l2arc, or am I missing
>>> something?
>>>
>>> The thing with this particular pool is that it serves somewhere
>>> between 20 -> 30 million jpegs for a website. On every reload, the
>>> front page of the site presents a mosaic of about 36 jpegs, and the
>>> jpegs are fetched completely at random from the pool.
>>> I don't know which jpegs will be fetched at any given time, so I'm
>>> installing about 2TB of l2arc (the pool is about 1.6TB today) and I
>>> want the whole pool to be available from the l2arc.
>>
>> Your usage pattern is the opposite of what the ARC is supposed to do. 
>> The ARC is supposed to keep most-often accessed data in memory (or 
>> retired to L2ARC) based on access patterns.
>>
>> It does not seem necessary for your mosaic to be truly random across
>> 20 -> 30 million jpegs.  Random across 1000 jpegs which are circulated
>> in time would produce a similar effect.
>>
>> The application building your web page mosaic can manage which files
>> will be included in the mosaic and achieve the same effect as a huge
>> cache by always building the mosaic from a known subset of files. The
>> 1000 jpegs used for the mosaics can be cycled over time from a random 
>> selection, with old ones being removed.  This approach assures that 
>> in-memory caching is effective since the same files will be requested 
>> many times by many clients.
>>
>> Changing the problem from an OS-oriented one to an 
>> application-oriented one (better algorithm) gives you more control 
>> and better efficiency.
>>
>> Bob
> That's true, but the other option if he really does want it to be 
> random across the entire thing, given the size (which is not 
> outrageous) and that the resource is going to be read-nearly-only, is 
> to put them on SSDs and ignore the L2ARC entirely.  These days that's 
> not a terribly expensive answer as with a read-mostly-always 
> environment you're not going to run into a rewrite life-cycle problem 
> on rationally-priced SSDs (e.g. Intel 3500s).
>
> Now an ARC cache miss is not all *that* material since there is no 
> seek or rotational latency penalty.
>
> HOWEVER, with that said it's still expensive compared against rotating 
> rust for bulk storage, and as Bob noted a pre-select middleware 
> process would result in no need for an L2ARC and allow the use of a
> pool with much-smaller SSDs for the actual online retrieval function.
>
> Whether the coding time and expense is a good trade against the lower 
> hardware cost to do it the "raw" way is a fair question.
>
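
For what it's worth, the rotating-subset idea Bob describes above could
be sketched roughly like this. It is only an illustration under my own
assumptions: the class name MosaicPool, its methods, and the default
sizes (1000 in the pool, 36 per page, 50 swapped per rotation) are
hypothetical, and it assumes the application already has a list of all
the jpeg paths in memory.

```python
import random


class MosaicPool:
    """Serve mosaics from a small working set that rotates over time."""

    def __init__(self, all_paths, pool_size=1000, seed=None):
        # Drop duplicates but keep order, so the rotation bookkeeping
        # below can rely on every path being unique.
        self._all = list(dict.fromkeys(all_paths))
        self._rng = random.Random(seed)
        size = min(pool_size, len(self._all))
        self._pool = self._rng.sample(self._all, size)

    def current(self):
        """Return a copy of the current working set, oldest first."""
        return list(self._pool)

    def pick(self, count=36):
        """Choose the jpegs for one page load from the working set."""
        return self._rng.sample(self._pool, min(count, len(self._pool)))

    def rotate(self, swap=50):
        """Retire the oldest entries and bring in fresh random ones."""
        in_pool = set(self._pool)
        # Never ask for more new files than exist outside the pool,
        # otherwise the loop below could not terminate.
        swap = min(swap, len(self._pool), len(self._all) - len(self._pool))
        fresh = []
        while len(fresh) < swap:
            candidate = self._rng.choice(self._all)
            if candidate not in in_pool:
                in_pool.add(candidate)
                fresh.append(candidate)
        self._pool = self._pool[swap:] + fresh
        return fresh
```

The web app would call pick() on every page load and rotate() from a
timer, so the same small set of files gets requested over and over and
stays cached, while the selection still changes over time.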