Problem with ufs not releasing vm_pages on busy volume. (soft updates related)

Tue Aug 8 17:52:36 UTC 2006

On 08/08/06 12:33, Shane Adams wrote:
> Perhaps it's just me, but I found that taking a snapshot on an 80 GB drive that is relatively idle froze my system for several seconds (to the point I was about to give it the finger).  I figured something is broken on my end, because I couldn't imagine anyone using the feature if this is how it works normally.
> 
> Just curious if anyone would share their experiences with snapshotting in a production environment?

Unfortunately, that is how the snapshots work.  Larger filesystems take 
longer.  The more inodes that are used, the longer it takes.

If you end up stat'ing the directory that contains the snapshot, then it 
will block, and so will its parent, probably /, which will stop 
everything.  You could try burying the snapshot down a few more levels, 
and see if you hang then too - I suppose you will.

Eric

> ----- Original Message ----
> From: Eric Anderson <anderson at centtech.com>
> To: Q <qdolan at gmail.com>
> Cc: freebsd-fs at freebsd.org
> Sent: Tuesday, August 8, 2006 9:07:50 AM
> Subject: Re: Problem with ufs not releasing vm_pages on busy volume. (soft updates related)
> 
> On 08/08/06 00:14, Q wrote:
>> On 02/08/2006, at 8:10 PM, Q wrote:
>>
>>> I have a problem that seems to be ufs-related: some vm_pages are
>>> not being released on busy filesystems. I have two servers
>>> running PostgreSQL, one running 6.0-RELEASE, the other 6.1-RELEASE.  
>>> Both are under the same (fairly heavy) load, performing the same  
>>> operations in bursts every five minutes. The filesystems in  
>>> question are 450-500 GB, each server uses a different brand of
>>> RAID card, and both have soft updates enabled.
>>>
>>> The problem is that both servers are seeing an accumulation of  
>>> about 100 MB of active pages per day (looking at
>>> vm.stats.vm.v_active_count) that never get released. The only way  
>>> to release these pages is to unmount the filesystem and remount it.  
>>> Failing to do this results in the server eventually locking up.
>>>
>>> If someone could provide me with some direction on how to go about  
>>> tracking down what might be causing this to happen it would be much  
>>> appreciated.
>> I have narrowed the cause of this issue down further to something to  
>> do with soft updates. If I turn off soft updates for the filesystem  
>> hosting the database the system no longer accumulates active vm_pages  
>> constantly. Instead of accumulating 100 MB a day of active vm pages
>> until all memory is consumed, it will hover around 50-60 MB with soft
>> updates disabled.
>>
>> If someone familiar with the softupdates code is willing to help me  
>> pinpoint the cause of this problem it would be much appreciated.
>>
> 
> 
> Is it possible for you to upgrade to the latest 6-STABLE branch, just to 
> make sure that the issue hasn't been fixed already?
> 
> Is there any way to reproduce this on another box for testing?  (I 
> assume not, due to the nature of these things)
> 
> Also - I wonder if doing a snapshot on the filesystem would flush out 
> the pages - is that something you can try?
> 
> Eric
> 
> 

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
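
The active-page accumulation described in the quoted thread (about 100 MB per day, read from vm.stats.vm.v_active_count) can be tracked with a small sampler. The following is only a sketch, not something posted in the thread: it assumes the FreeBSD `sysctl -n` command and the `hw.pagesize` sysctl are available, and the function names, sample count, and interval are hypothetical choices for illustration.

```python
import subprocess
import time

ACTIVE_COUNT = "vm.stats.vm.v_active_count"  # sysctl named in the thread
PAGE_SIZE = "hw.pagesize"                    # assumption: page size in bytes


def read_sysctl_int(name):
    """Return an integer sysctl value by running `sysctl -n <name>`."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def pages_to_mib(pages, page_size):
    """Convert a page count to MiB for a given page size in bytes."""
    return pages * page_size / (1024 * 1024)


def growth_mib_per_day(samples, page_size):
    """Estimate growth from (unix_seconds, page_count) samples, oldest first.

    Uses only the first and last sample, so bursts in between are ignored.
    """
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return pages_to_mib(p1 - p0, page_size) * 86400 / (t1 - t0)


def sample_active_pages(count, interval, read=read_sysctl_int, sleep=time.sleep):
    """Collect `count` samples of the active page count, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read(ACTIVE_COUNT)))
        if i < count - 1:
            sleep(interval)
    return samples


if __name__ == "__main__":
    # Hypothetical schedule: 13 samples, 5 minutes apart (one hour in total),
    # matching the five-minute load bursts mentioned in the thread.
    page_size = read_sysctl_int(PAGE_SIZE)
    collected = sample_active_pages(count=13, interval=300)
    print("active pages grew by about %.1f MiB/day"
          % growth_mib_per_day(collected, page_size))
```

Running the sampler once with soft updates enabled and once with them disabled would give comparable numbers for the two behaviours reported above. A longer sampling window smooths out the periodic load bursts.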