How does disk caching work?

Tue Apr 20 07:15:58 PDT 2004

> >>>Sorry, I shouldn't have been lazy and actually looked up the settings.
> >>>Yes, those are the settings I was referring to. Someone else had cranked
> >>>them up so that the machine was maintaining about 1.7G in cache; he said
> >>>that he'd noticed a reduction in disk IO when he did that. I haven't
> >>>been able to see any difference in disk IO, though it seems logical that
> >>>setting cache too high would hurt write caching and actually increase
> >>>disk IO. It's currently set to whatever the kernel thought best, so I'll
> >>>just leave it there.
> >>
> >>Well, I'm afraid your colleague must have been imagining things.  The 
> >>cache queue ('Cache' column in 'top') is just a phase in the laundering 
> >>procedure (VM page recycling) between the inactive queue ('Inact' in 
> >>'top') and the free queue ('Free' in 'top').  So these variables have 
> >>nothing to do with disk i/o performance.
> > 
> > I am not sure you are correct here. I understand things very differently.
> > While it is a fact that the number of pages in the cache queue does not affect IO throughput, changing VM settings such as
> > vm.stats.vm.v_cache_min, vm.stats.vm.v_cache_max, vm.stats.vm.v_free_target and vm.stats.vm.v_free_min should have an effect on disk IO.
> > 
> > The very reason JD came up with cache pages is to minimize IO traffic. If we require a larger number of free pages, we cause the OS to remove references at an earlier point. This should cause the kernel to re-read some of the pages that would otherwise just be requeued to the active queue.
> > 
> > Having a larger cache queue would require the VM to start cleaning dirty pages earlier, which results in some additional write traffic as well. However, this is not that bad, because it is a zero-sum game: if pages are to become free, they have to be written out regardless of the cache queue size, just at a later point. There is, however, a benefit to a larger cache bucket. If the machine often experiences bursts in memory demand (as pretty much any real-world server does), you are able to accommodate the changing load without blocking.
> 
> Well, I didn't claim that the cache queue were useless.  It does have 
> its merits.  And there is a certain default amount configured by the 
> kernel's auto-scaling code already.

Yes, kernel defaults for queue sizes should work for most of us.
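
To make the burst argument quoted above concrete, here is a rough Python sketch. It is a toy model and not kernel code: the function name, the per-tick accounting and all the numbers are made up for illustration, and `reserve` merely stands in for however many pages the cache and free queues hold together.

```python
def blocked_allocations(reserve, demands, refill_per_tick):
    """Count page requests that find no immediately reusable page.

    reserve         -- pages kept ready on cache+free (hypothetical target)
    demands         -- pages requested in each tick
    refill_per_tick -- pages the background laundering can make ready per tick
    """
    available = reserve
    blocked = 0
    for demand in demands:
        served = min(demand, available)
        blocked += demand - served      # these requests would have to wait
        available -= served
        # Background laundering tops the pool back up, never above the target.
        available = min(reserve, available + refill_per_tick)
    return blocked


steady_then_burst = [10, 10, 80, 10, 10]
small = blocked_allocations(20, steady_then_burst, 15)    # 60 requests wait
large = blocked_allocations(100, steady_then_burst, 15)   # none wait
```

With the small reserve the burst in the third tick outruns what is ready, while the large reserve absorbs it. The model says nothing about the extra write traffic or the earlier loss of references, which is the cost side of the same trade-off.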

> What I was trying to point out is that these variables don't necessarily 
> do what their name suggests.  Take 'vm.v_cache_max', for example.  When 
> you crank that up, instead of increasing the size of the cache queue it 
> is actually the inactive queue that grows in size.
> 
> This is because the kernel steals pages from the inactive queue when it 
> temporarily runs out of pages in the cache queue, without having to 
> block for i/o as long as there are clean (not written to or already 
> laundered) pages in the inactive queue.  When it finds dirty pages 
> during this scan it schedules them for background synchronization with 
> the disk, but again without blocking in the foreground.
> 
> The reason for this algorithm is that it is better to keep pages in the 
> inactive queue for as long as possible, rather than moving them over to 
> the cache queue prematurely.  Pages in the inactive queue can be still 
> mapped into the memory space of processes, while pages in the cache 
> queue have lost this association.  So, quite naturally, when the VM 
> system has to reactivate a page (put it back into the active queue) this 
> operation tends to be less expensive when the page is still in the 
> inactive queue.

You are correct that when the cache queue is empty the kernel will dip into the inactive queue, but you are mistaken about other things. Pages on the cache queue still have the association. I wrote that in one of the previous posts.

To sum it up: the cache queue is the same as the inactive queue, except that it holds only clean pages.

If things were the way you suggest, the cache queue would be totally useless.

I actually explained pretty much the whole rotation process. If you read my email again, you should understand what happens whenever a page is moved from inactive to cache and then to free.
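
For anyone who prefers code to prose, the rotation as I describe it can be sketched in Python. This is a toy model of my description and not the kernel's data structures: the class and method names are hypothetical, and laundering is collapsed into a single step where a dirty page is written and then moved to cache.

```python
from collections import deque


class ToyQueues:
    """Toy model of the inactive -> cache -> free rotation (not kernel code)."""

    def __init__(self, inactive, dirty=()):
        self.active = deque()
        self.inactive = deque(inactive)  # may hold clean and dirty pages
        self.cache = deque()             # clean pages only, identity kept
        self.dirty = set(dirty)
        self.freed = 0                   # pages whose identity is gone
        self.writes = 0
        self.disk_reads = 0

    def launder_one(self):
        """Move the oldest inactive page to cache, writing it first if dirty."""
        page = self.inactive.popleft()
        if page in self.dirty:
            self.writes += 1
            self.dirty.discard(page)
        self.cache.append(page)

    def free_one(self):
        """Move the oldest cache page to free; its contents are forgotten."""
        self.cache.popleft()
        self.freed += 1

    def touch(self, page):
        """Reference a page: requeue it if still known, else read from disk."""
        if page in self.active:
            return
        for queue in (self.inactive, self.cache):
            if page in queue:
                queue.remove(page)       # still associated: no I/O needed
                self.active.append(page)
                return
        self.disk_reads += 1             # already freed: must be re-read
        self.active.append(page)


vm = ToyQueues(inactive=["a", "b", "c"], dirty={"b"})
vm.launder_one()   # "a" is clean: moves straight to cache
vm.launder_one()   # "b" is dirty: written once, then moves to cache
vm.free_one()      # "a" leaves cache and loses its identity
vm.touch("b")      # still on cache: reactivated without a read
vm.touch("a")      # already freed: has to be read back in
```

The only read in this run is for "a", which had already gone all the way to free. In this model, that is the point of keeping pages on cache for a while.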

> So, for reasons like these, I keep recommending to either study the 
> kernel sources before you try to tune the VM system, or leave these 
> variables alone.

I am not sure whether studying the kernel sources is really necessary. Virtually every UNIX (R) admin has had to tune a machine despite the sources not being available.