Req: Help tracking possible kernel memory leak. - possibly filesystem related

Wed Aug 2 05:22:33 UTC 2006

On 01/08/2006, at 3:10 PM, Q wrote:

> I was wondering if someone could help point me in the direction of  
> how to go about trying to resolve what I assume to be a memory leak  
> in FreeBSD 6.x.

I have made some more progress on identifying this problem. If I drop  
to single user mode, and then unmount the volume that houses the  
database and the raw data, the accumulated memory is released. When
I did this earlier, 'active' memory dropped from 870M to 7M
(according to top), just by unmounting the filesystem.
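
For anyone wanting to repeat the before/after comparison without reading it off top, here is a minimal sketch. It assumes the output of `sysctl vm.stats.vm.v_page_size vm.stats.vm.v_active_count` has been captured as text before and after the unmount. The helper names and the sample page counts are hypothetical; the counts were picked only to match the 870M and 7M figures above.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines as printed by sysctl into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def active_mib(values):
    """Convert the active page count into MiB using the reported page size."""
    pages = values["vm.stats.vm.v_active_count"]
    page_size = values["vm.stats.vm.v_page_size"]
    return pages * page_size / (1024 * 1024)


# Hypothetical captures, NOT real output from the affected servers.
before_umount = """vm.stats.vm.v_page_size: 4096
vm.stats.vm.v_active_count: 222720
"""
after_umount = """vm.stats.vm.v_page_size: 4096
vm.stats.vm.v_active_count: 1792
"""

before = active_mib(parse_sysctl(before_umount))
after = active_mib(parse_sysctl(after_umount))
print(f"active before: {before:.0f} MiB, after: {after:.0f} MiB, "
      f"released: {before - after:.0f} MiB")
```

Logging the same two counters at intervals would also give a record of how quickly the figure grows between reboots.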

If someone familiar with the memory handling of the ufs code could  
help shed some light on what this problem might be it would be much  
appreciated.

Anyway.. off to grok some source code.

> I have two database servers, one running 6.0 and the other 6.1, both
> running PostgreSQL. Each has 4GB of memory and runs a
> generic kernel with the following sysctls tweaked:
>
> kern.ipc.semmni=128
> kern.ipc.semmns=512
> kern.ipc.semume=100
> kern.ipc.semmnu=256
> kern.maxdsiz=1073741824
> kern.dfldsiz=1073741824
> kern.maxssiz=134217728
> kern.ipc.shmmax=536870912
> kern.ipc.shmall=262144
> kern.ipc.shm_use_phys=1
> net.inet.udp.maxdgram=63535
>
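
For readers who do not think in raw byte counts, the size-related values above convert as follows. This is only a rough sketch: it assumes a 4096-byte page size and relies on kern.ipc.shmall being expressed in pages rather than bytes.

```python
MIB = 1024 * 1024
PAGE_SIZE = 4096  # assumption: the usual page size on i386/amd64

# Byte-valued settings copied from the list above.
byte_settings = {
    "kern.maxdsiz": 1073741824,
    "kern.dfldsiz": 1073741824,
    "kern.maxssiz": 134217728,
    "kern.ipc.shmmax": 536870912,
}

# kern.ipc.shmall is a page count, so it is scaled by the page size.
shmall_pages = 262144

sizes_mib = {name: value // MIB for name, value in byte_settings.items()}
sizes_mib["kern.ipc.shmall"] = shmall_pages * PAGE_SIZE // MIB

for name, mib in sizes_mib.items():
    print(f"{name:18s} {mib:5d} MiB")
```

So the data size limits are 1 GiB, the stack limit is 128 MiB, and the largest single shared memory segment is 512 MiB.
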
> They are both used for processing an extremely large amount of data  
> collected from various sources every 5 minutes and therefore  
> perform virtually identical workloads. All processing is performed  
> locally, there is only 1 external connection being made to the  
> database, once a day to retrieve a small selection of data.
>
> Both machines are showing a constantly growing 'Active' memory  
> usage in 'top' until they reach a point where database
> performance drops dramatically and disk I/O goes through the roof. If
> the machine is left to run in this state it appears to eventually  
> just hang (at least this is what happened to one of the machines).
>
> Most recently one of the servers (running 6.1) had an "Active  
> Memory" total of 1.6Gig, database performance was significantly  
> worse than normal, and disk I/O was dramatically higher. Queries
> that previously took a few seconds were taking several minutes.
>
> Using vmstat, ps and top, along with restarting the database I was  
> unable to find anything that would indicate a user space leak  
> consuming this 1.6Gig of memory. The only way I found to free this  
> memory and ultimately restore the database performance was to
> reboot the machine, which reset "Active memory" back to virtually
> nothing. It has since been climbing slowly again (after 4 days one
> server is up to about 380MB, and the other server is at about 680MB
> after 7 days), growing at an almost linear rate.
>
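
As a rough check on those figures, the sketch below works out the implied growth rate and how long it would take to get back to the 1.6Gig seen before the reboot. It assumes growth is linear from roughly zero at boot and treats 1.6Gig as 1600MB; both are simplifications.

```python
# (megabytes of 'Active' memory, days since reboot) from the figures above.
observations = {
    "server_a": (380, 4),
    "server_b": (680, 7),
}

TARGET_MB = 1600  # assumption: "1.6Gig" taken as 1600MB

rates = {}
days_to_target = {}
for name, (active_mb, days) in observations.items():
    rate = active_mb / days                 # MB per day, assuming linear growth
    rates[name] = rate
    days_to_target[name] = TARGET_MB / rate
    print(f"{name}: {rate:.1f} MB/day, "
          f"about {days_to_target[name]:.1f} days to reach {TARGET_MB}MB")
```

Both servers come out at a little under 100MB per day, which is in line with performance degrading roughly every two weeks.
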
> If someone would be so kind as to provide some advice on how to  
> track down this issue it would be much appreciated.
>
> Having to reboot these machines every 15 days is simply not a  
> viable option.

-- 
Seeya...Q

                -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

                           _____  /  Quinton Dolan - q at OntheNet.com.au.
   __  __/  /   /   __/   /      /      OntheNet - Internet Provider
      /    __  /   _/    /      /        Gold Coast, QLD, Australia
   __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
                     _______  /
                             _\
