vmstat 'b' (disk busy?) field keeps climbing ...
Kostik Belousov
kostikbel at gmail.com
Sat Jun 24 19:56:29 UTC 2006
On Sat, Jun 24, 2006 at 04:45:49PM -0300, Marc G. Fournier wrote:
> On Sat, 24 Jun 2006, Kostik Belousov wrote:
>
> >On Sat, Jun 24, 2006 at 09:52:03PM +0300, Kostik Belousov wrote:
> >>On Sat, Jun 24, 2006 at 02:57:27PM -0300, Marc G. Fournier wrote:
> >>>On Sat, 24 Jun 2006, Kostik Belousov wrote:
> >>>
> >>>>On Sat, Jun 24, 2006 at 11:55:26AM +0400, Dmitry Morozovsky wrote:
> >>>>>On Sat, 24 Jun 2006, Marc G. Fournier wrote:
> >>>>>
> >>>>>MGF> > 'b' stands for "blocked", not "busy". Judging by your page fault rate
> >>>>>MGF> > and the high number of frees and pages being scanned, you're probably
> >>>>>MGF> > swapping tasks in and out and are waiting on disk. Take a look at
> >>>>>MGF> > "vmstat -s", and consider adding more RAM if this is correct...
> >>>>>MGF>
> >>>>>MGF> is there a way of finding out what processes are blocked?
> >>>>>
> >>>>>Aren't they in 'D' status by ps?
> >>>>Use ps axlww. In this way, at least actual blocking points are shown.
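[A minimal sketch of acting on that advice. The field positions assume BSD-style `ps l` output as seen in this thread, where field 9 is the wait channel (MWCHAN) and field 10 the state (STAT); on other systems the layout can differ, so adjust the field numbers.]

```shell
# List processes blocked in uninterruptible (disk) wait: STAT contains "D".
# Assumed layout: $2 = PID, $9 = wait channel, $10 = state.
ps axlww | awk '$10 ~ /^D/ {print $2, $9, $10}'
```

An empty result means nothing is currently stuck in disk wait, which itself is a useful data point.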
> >>>
> >>>'k, stupid question then ... what am I searching for?
> >>>
> >>># ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
> >>> 654 select
> >>> 230 lockf
> >>> 166 wait
> >>> 85 -
> >>> 80 piperd
> >>> 71 nanslp
> >>> 33 kserel
> >>> 22 user
> >>> 10 pause
> >>> 9 ttyin
> >>> 5 sbwait
> >>> 3 psleep
> >>> 3 accept
> >>> 2 kqread
> >>> 2 Giant
> >>> 1 vlruwt
> >>> 1 syncer
> >>> 1 sdflus
> >>> 1 ppwait
> >>> 1 ktrace
> >>> 1 MWCHAN
> >>>
> >>>According to vmstat, I'm holding at '4 blocked' for the most part ...
> >>>sbwait is socket related, not disk ... and none of the others look
> >>>right ...
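[The swap theory raised earlier in the thread can be checked directly. A sketch: `vmstat -s` prints cumulative VM counters since boot, and the exact counter names vary between systems.]

```shell
# Take two samples a minute or so apart: swap pagein/pageout counters
# that keep growing confirm memory pressure; flat counters rule it out.
vmstat -s | grep -i -E 'swap|page'
```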
> >>Gazing into my big magic crystal ball, I would say your problems are
> >>not kernel-related. I see only two suspicious points:
> >>
> >>1. A high number of pipe readers and of processes waiting on file locks.
> >>That may be normal for your load.
> >>
> >>2. Two Giant holders/waiters. Is this constant? Are the processes
> >>holding/waiting for Giant always the same ones?
> >>
> >>Anyway, if I were in your shoes, I would start by looking at the applications.
> >>
> >>Ah, and does dmesg show anything?
> >
> >And another question: what are the processes in the state "user"?
> >I have never seen that state. Moreover, a search through the sources
> >does not show what it could be.
>
> Odd, I'm not finding any, but I did get a Giant on a grep of the ps
> listing:
>
> pluto# ps axlww | grep " user "
>     0 93055 46540   0  96  0   348  212 Giant  L+    p4    0:00.00 grep " user "
>
> Not sure where those 'user' entries came from, though ... just ran the above again:
>
> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
> 603 select
> 231 lockf
> 71 nanslp
> 33 -
> 30 kserel
> 23 wait
> 9 ttyin
> 9 sbwait
> 7 pause
> 6 accept
> 4 piperd
> 3 psleep
> 3 kqread
> 3 Giant
> 1 syncer
> 1 sdflus
> 1 ppwait
> 1 pgzero
> 1 ktrace
> 1 MWCHAN
>
> And nothing ...
>
> Got a Giant lock on sshd too?
>
> pluto# ps axlww | grep Giant
>     0   693   556   1  96  0  6096 2080 Giant  Ls    ??    0:02.18 sshd: root@ttyp0 (sshd)
>     0 94334 46540   0  96  0   348  208 -      R+    p4    0:00.00 grep Giant
Everything looks normal; transient Giant acquisition/contention is quite
normal, especially while several kernel subsystems are still Giant-locked.
I strongly suggest moving the point of investigation to the application(s)
themselves. The kernel seems to be innocent.
[A deadlock involving the disk driver/Giant/the filesystem would immediately
show up as a HUGE number of processes in the D state, with a completely
different set of wait channels. All your processes are in select, waiting
on file locks, reading from pipes, or doing something threaded.]
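[The deadlock signature described above can be checked mechanically. A hypothetical helper, assuming BSD-style `ps l` fields (10 = STAT); the threshold of 20 is arbitrary and only illustrative.]

```shell
#!/bin/sh
# Count processes stuck in uninterruptible disk wait ("D" in STAT).
# A large, varied population of D-state processes would implicate the
# disk driver/fs/Giant; a count near zero points at the applications.
blocked=$(ps axlww | awk '$10 ~ /^D/' | wc -l)
echo "uninterruptible (D) processes: $blocked"
if [ "$blocked" -gt 20 ]; then
    echo "suspect a disk driver/fs/Giant deadlock"
else
    echo "kernel looks innocent; check the applications"
fi
```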