How does disk caching work?
Uwe Doering
gemini at geminix.org
Mon Apr 19 23:17:12 PDT 2004
Igor Shmukler wrote:
>>>Sorry, I shouldn't have been lazy and should have actually looked up
>>>the settings. Yes, those are the settings I was referring to. Someone
>>>else had cranked them up so that the machine was maintaining about
>>>1.7G in cache; he said that he'd noticed a reduction in disk IO when
>>>he did that. I haven't been able to see any difference in disk IO,
>>>though it seems logical that setting the cache too high would hurt
>>>write caching and actually increase disk IO. It's currently set to
>>>whatever the kernel thought best, so I'll just leave it there.
>>
>>Well, I'm afraid your colleague must have been imagining things. The
>>cache queue ('Cache' column in 'top') is just a phase in the laundering
>>procedure (VM page recycling) between the inactive queue ('Inact' in
>>'top') and the free queue ('Free' in 'top'). So these variables have
>>nothing to do with disk i/o performance.
>
> I am not sure you are correct here. I understand things very differently.
> While it is a fact that the number of pages in the cache queue does not affect IO throughput, changing VM settings such as
> vm.stats.vm.v_cache_min, vm.stats.vm.v_cache_max, vm.stats.vm.v_free_target and vm.stats.vm.v_free_min should have an effect on disk IO.
>
> The very reason JD came up with cache pages is to minimize IO traffic. If we require a larger number of free pages, we cause the OS to remove references at an earlier point. This should cause the kernel to re-read some of the pages that would otherwise just be requeued to the active queue.
>
> Having a larger cache queue would require the VM to start cleaning dirty pages earlier, which results in some additional write traffic as well. However, this is not that bad, because here it is a zero-sum game. If pages are to become free, they have to be written out regardless of cache queue size, just at a later point. However, there is a benefit to a larger cache bucket. The upside is that if a machine often experiences bursts in memory demand (as pretty much any real-world server would), you are able to accommodate the changing load without blocking.
Well, I didn't claim that the cache queue was useless. It does have
its merits. And there is a certain default amount configured by the
kernel's auto-scaling code already.
What I was trying to point out is that these variables don't necessarily
do what their names suggest. Take 'vm.v_cache_max', for example. When
you crank that up, instead of increasing the size of the cache queue it
is actually the inactive queue that grows in size.
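You can observe this for yourself. Below is a small userland sketch (my
own illustration, not something from the kernel tree) that prints the
queue sizes via the read-only counters under 'vm.stats.vm'. Run it
before and after raising 'vm.v_cache_max' and watch which queue
actually grows:

  #include <sys/types.h>
  #include <sys/sysctl.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Fetch one unsigned counter from the VM statistics. */
  static u_int
  vmstat(const char *name)
  {
      u_int val;
      size_t len = sizeof(val);

      if (sysctlbyname(name, &val, &len, NULL, 0) == -1) {
          perror(name);
          exit(1);
      }
      return (val);
  }

  int
  main(void)
  {
      printf("active:   %u pages\n", vmstat("vm.stats.vm.v_active_count"));
      printf("inactive: %u pages\n", vmstat("vm.stats.vm.v_inactive_count"));
      printf("cache:    %u pages\n", vmstat("vm.stats.vm.v_cache_count"));
      printf("free:     %u pages\n", vmstat("vm.stats.vm.v_free_count"));
      return (0);
  }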
The inactive queue grows because the kernel steals pages from it
whenever it temporarily runs out of pages in the cache queue, and it
can do so without blocking for i/o as long as there are clean (not
written to, or already laundered) pages in the inactive queue. When it
finds dirty pages during this scan, it schedules them for background
synchronization with the disk, but again without blocking in the
foreground.
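In simplified pseudo-C, the principle looks roughly like this (an
illustration only, with made-up queue and helper names; the real logic
lives in the VM subsystem sources, e.g. sys/vm/vm_pageout.c, and is
considerably more involved):

  #include <stdbool.h>
  #include <stddef.h>

  /* Simplified stand-ins for the real VM structures. */
  struct page {
      bool dirty;                /* modified since last write-out */
      struct page *next;
  };

  static struct page *cache_queue;     /* clean, unmapped, reusable */
  static struct page *inactive_queue;  /* maybe mapped, maybe dirty */

  static void
  schedule_laundering(struct page *p)
  {
      /* Hand the page to the pageout daemon for asynchronous
       * write-back; the caller does not block on the i/o. */
      (void)p;
  }

  /* Get a reusable page without blocking in the foreground. */
  struct page *
  steal_page(void)
  {
      struct page *p, **pp;

      /* Preferred source: the cache queue. */
      if ((p = cache_queue) != NULL) {
          cache_queue = p->next;
          return (p);
      }

      /* Fall back to the inactive queue: take the first clean page;
       * dirty pages we pass over are merely scheduled for background
       * write-back instead of being waited on. */
      for (pp = &inactive_queue; (p = *pp) != NULL; pp = &p->next) {
          if (!p->dirty) {
              *pp = p->next;     /* unlink the clean page */
              return (p);
          }
          schedule_laundering(p);
      }
      return (NULL);             /* nothing reusable right now */
  }

  int
  main(void)
  {
      struct page dirty_pg = { true, NULL };
      struct page clean_pg = { false, NULL };

      dirty_pg.next = &clean_pg;
      inactive_queue = &dirty_pg;
      /* Skips (and launders) the dirty page, returns the clean one. */
      return (steal_page() == &clean_pg ? 0 : 1);
  }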
The reason for this algorithm is that it is better to keep pages in the
inactive queue for as long as possible, rather than moving them over to
the cache queue prematurely. Pages in the inactive queue can still be
mapped into the memory space of processes, while pages in the cache
queue have lost this association. So, quite naturally, when the VM
system has to reactivate a page (put it back into the active queue) this
operation tends to be less expensive when the page is still in the
inactive queue.
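To make the cost difference concrete, here is an illustrative sketch;
'pmap_reenter' is a made-up stand-in for the page-table work the kernel
actually performs through its pmap layer:

  #include <stdbool.h>

  enum queue { ACTIVE, INACTIVE, CACHE };

  struct vpage {
      enum queue q;
      bool mapped;               /* still entered in a page table? */
  };

  /* Stand-in for rebuilding the page-table entry on reuse. */
  static void
  pmap_reenter(struct vpage *p)
  {
      p->mapped = true;          /* the relatively expensive part */
  }

  /* Put a page back into active use. */
  void
  reactivate(struct vpage *p)
  {
      if (p->q == INACTIVE) {
          /* Cheap: the mapping may still be intact, so the page
           * simply moves back to the active queue. */
          p->q = ACTIVE;
      } else if (p->q == CACHE) {
          /* Costlier: the page has lost its association with the
           * process, so the mapping must be rebuilt first. */
          pmap_reenter(p);
          p->q = ACTIVE;
      }
  }

  int
  main(void)
  {
      struct vpage p = { CACHE, false };

      reactivate(&p);
      return (p.q == ACTIVE && p.mapped ? 0 : 1);
  }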
So, for reasons like these, I keep recommending that you either study
the kernel sources before you try to tune the VM system, or leave these
variables alone.
Uwe
--
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
gemini at geminix.org | http://www.escapebox.net