How does disk caching work?

Tue Apr 20 08:11:00 PDT 2004

Igor Shmukler wrote:
>>What I was trying to point out is that these variables don't necessarily 
>>do what their name suggests.  Take 'vm.v_cache_max', for example.  When 
>>you crank that up, instead of increasing the size of the cache queue it 
>>is actually the inactive queue that grows in size.
>>
>>This is because the kernel steals pages from the inactive queue when it 
>>temporarily runs out of pages in the cache queue, without having to 
>>block for i/o as long as there are clean (not written to or already 
>>laundered) pages in the inactive queue.  When it finds dirty pages 
>>during this scan it schedules them for background synchronization with 
>>the disk, but again without blocking in the foreground.
>>
>>The reason for this algorithm is that it is better to keep pages in the 
>>inactive queue for as long as possible, rather than moving them over to 
>>the cache queue prematurely.  Pages in the inactive queue can still be 
>>mapped into the memory space of processes, while pages in the cache 
>>queue have lost this association.  So, quite naturally, when the VM 
>>system has to reactivate a page (put it back into the active queue) this 
>>operation tends to be less expensive when the page is still in the 
>>inactive queue.
> 
> While you are correct that when the cache is empty the kernel will dip into the inactive queue, you are mistaken about other things.  Pages on the cache queue still have the association.  I wrote that in one of the previous posts.
> 
> To sum it up: the cache queue is the same as the inactive queue except that it has only clean pages.
> 
> If things were the way you suggest, the cache queue would be totally useless.

I think you're mixing up two different things here.  The way I 
understand the kernel sources, the pages in the cache queue of course 
still have their association with the underlying VM object.  Otherwise 
caching these pages would be useless.  But they are no longer mapped 
into any process address space.  If I may quote the relevant comment 
from vm_page_cache():

         /*
          * Remove all pmaps and indicate that the page is not
          * writeable or mapped.
          */

vm_page_cache() is the function that moves the pages from the inactive 
to the cache queue once they are clean.  Restoring the process address 
space mapping is what makes reactivating pages from the cache queue more 
expensive than just relinking them from the inactive queue, because a 
fault gets generated when the process tries to access the page.  
Handling this fault maps the page from the VM object back into the 
process address space, which causes additional overhead.
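
To illustrate the difference, here is a minimal sketch in Python.  It is 
a toy model, not kernel code: apart from the name vm_page_cache, every 
class, method and field in it (Page, ToyVM, deactivate, reactivate, the 
fault counter) is made up for the illustration, and it assumes a page 
is reactivated by the same process that had it mapped before.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare pages by identity, not by field values
class Page:
    """Toy stand-in for a VM page; not the kernel's struct vm_page."""
    obj: str                                    # owning VM object
    dirty: bool = False
    mappings: set = field(default_factory=set)  # processes mapping the page


class ToyVM:
    """Hypothetical model of the active, inactive and cache queues."""

    def __init__(self):
        self.active = deque()
        self.inactive = deque()
        self.cache = deque()
        self.faults = 0  # faults taken to restore a process mapping

    def deactivate(self, page):
        # Active -> inactive: process mappings are left in place.
        self.active.remove(page)
        self.inactive.append(page)

    def vm_page_cache(self, page):
        # Inactive -> cache: only clean pages qualify.  All process
        # mappings are removed; the VM object association is kept.
        if page.dirty:
            raise ValueError("dirty pages must be laundered first")
        self.inactive.remove(page)
        page.mappings.clear()
        self.cache.append(page)

    def reactivate(self, page, proc):
        # Move the page back to the active queue on behalf of proc.
        if page in self.inactive:
            self.inactive.remove(page)  # plain relink, mapping intact
        else:
            self.cache.remove(page)
        if proc not in page.mappings:
            self.faults += 1            # fault maps the page in again
            page.mappings.add(proc)
        self.active.append(page)


def demo():
    vm = ToyVM()
    a = Page("file.db", mappings={"httpd"})
    b = Page("file.db", mappings={"httpd"})
    vm.active.extend([a, b])
    vm.deactivate(a)
    vm.deactivate(b)
    vm.vm_page_cache(b)          # b loses its mapping, keeps its object

    vm.reactivate(a, "httpd")    # from the inactive queue
    faults_from_inactive = vm.faults
    vm.reactivate(b, "httpd")    # from the cache queue
    faults_from_cache = vm.faults - faults_from_inactive
    return faults_from_inactive, faults_from_cache
```

In this model the page coming back from the inactive queue costs no 
fault, while the one coming back from the cache queue costs one, even 
though both still belong to the same VM object.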

> I actually pretty much explained the whole rotation process. If you read my email again, you should understand what happens whenever a page is moved from inactive to cache and then to free.

You may want to study the kernel sources some more, I'm afraid.

>>So, for reasons like these, I keep recommending to either study the 
>>kernel sources before you try to tune the VM system, or leave these 
>>variables alone.
> 
> I am not sure whether studying kernel sources is really necessary. Virtually every UNIX (R) admin has had to tune a machine despite the sources not being available.

Sorry, but you just proved my point ...

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini at geminix.org  |  http://www.escapebox.net