More data on 7.2-RELEASE "hangs"

Wed May 13 21:51:16 UTC 2009

On Wednesday 13 May 2009 1:44:55 pm Marc G. Fournier wrote:
> On Wed, 13 May 2009, John Baldwin wrote:
> 
> > Well, you had a whole lot of page faults and other VM activity, plus 500k
> > syscalls.  The 'w' is a count of swapped processes, so basically your box is
> > swapping a whole lot it seems.  I think your box is just overloaded.
> 
> I knew I was going to regret posting that :(
> 
> What I posted was what vmstat 5 shows after the issue *starts*, not what 
> it normally looks like ... right now, after 10 hours of uptime, and all 
> the same processes running, it looks like:
> 
> io# vmstat 5 (10 hours uptime now)
>   procs      memory      page                    disks     faults         cpu
>   r b w     avm    fre   flt  re  pi  po    fr  sr da0 pa0   in   sy   cs us sy id
>   0 1 0  10477M   301M  3503  13   1   2  3620 286   0   0  331 45491 4566 26  8 66
>   0 1 0  10430M   305M   278   7   0   0   550   0  18   0  186 19243 2917 4  3 93
>   1 1 0  10474M   295M   511   0   0   0   359   0  91   0  253 11632 3516 7  3 90
>   0 1 0  10447M   310M   819   3   0   0  1473   0  14   0  143 29575 2486 8  3 89
>   0 1 0  10558M   295M  5008  18  13   5  4128   0 121   0  345 24212 4215 16  7 77
> 
> Right now, IO is running ~775 processes ... at the time of the vmstat I 
> provided earlier, it was up to 1400 processes ... since there are only 5 
> minutes between script runs, something is causing it to go from zero swap 
> -> high swap within a very short period of time, but since things get 
> badly locked up when it happens, I can't isolate where ...
> 
> I've got the following two ps outputs at the time of the high paging:
> 
> /bin/ps -aucxHl -O jid > ps-long.out
> /bin/ps -aux -O jid > ps-short.out

Perhaps do 'sort -n -k6 < ps-short.out' to find which processes have large
virtual memory sizes?  Something is using a lot of memory and causing your
box to thrash.
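
Which field VSZ ends up in depends on the ps options used, so a sort key
like -k6 may need adjusting. As a rough sketch (the function name top_vsz
and the default of 20 lines are just placeholders, and it assumes the file
starts with the usual ps header line containing a VSZ column), something
like this looks the column up by name instead:

```shell
#!/bin/sh
# Sketch only: print the N processes with the largest VSZ from a saved
# ps listing. top_vsz is a made-up helper name, not an existing tool.
# Assumes the first line of the file is the ps header and includes "VSZ".
top_vsz() {
    file=$1
    count=${2:-20}
    awk '
        NR == 1 {
            # Find the VSZ field by header name rather than by number.
            for (i = 1; i <= NF; i++) if ($i == "VSZ") col = i
            if (!col) { print "no VSZ column in header" > "/dev/stderr"; exit 1 }
            next
        }
        # Prefix each line with its VSZ value so sort can key on it.
        { print $col "\t" $0 }
    ' "$file" | sort -rn | head -n "$count" | cut -f2-
}
```

Usage would be along the lines of 'top_vsz ps-short.out 30'. The header
line is dropped from the output, and the largest VSZ comes first.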

-- 
John Baldwin