kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load

Wed Jun 6 20:33:40 UTC 2007

On Tue, Jun 05, 2007 at 08:50:10PM +0000, Jeffrey D. Wheelhouse wrote:
> The following reply was made to PR kern/104406; it has been noted by GNATS.
> 
> From: "Jeffrey D. Wheelhouse" <jdw at wheelhouse.org>
> To: bug-followup at FreeBSD.org
> Cc:  
> Subject: Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent
>  CPU load
> Date: Tue, 05 Jun 2007 16:26:26 -0400
> 
>  I believe we have also experienced this bug (or a very similar one) on 
>  our 8-core amd64 systems under 6.2-RELEASE-p4.
>  
>  In our case, "top" shows that the system is 100% CPU utilized, with the 
>  vast majority of it as "system" time.  (Ordinarily the system
>  
>  In the last case, we ended up with about 200 Apache processes that 
>  looked like this in ps:
>  
>     UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
>  25000 27121 26860 1977  -4  5 146324 33732 ufs    DN    ??    0:03.75 httpd
>  25000 27147 37257 1994  -4  5 153748 29280 ufs    DN    ??    0:03.72 httpd
>  25000 27157 36912 1805  -4  5 150756 26592 ufs    DN    ??    0:02.91 httpd
>  25000 27224 27030 1845  -4  5 137536 24804 ufs    DN    ??    0:01.25 httpd
>  25000 27274 26794 1829  -4  5 148140 35416 ufs    DN    ??    0:02.90 httpd
>  
>  Once a process gets "stuck" in WCHAN ufs, it's blocked indefinitely, as 
>  described here, or at least so slow as to be indistinguishable from 
>  stuck.  (Typical wait channels for our httpds are accept or kqread, as 
>  one would expect.)
>  
>  Each process in this state counts against the load average, so we often 
>  see load averages north of 200 when this is occurring.  (Typical load 
>  average is below 2.)
>  
>  Kill enough processes (or possibly enough to hit the "right" process) 
>  and everything picks up again right where it left off.
>  
>  I also have no idea how to debug this.

See the Developers' Handbook.
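
As a first step before any kernel debugging, it can help to quantify how many processes are sitting in each wait channel. What follows is only a minimal sketch, not a diagnosis tool: it assumes ps output in the long format quoted above (a header line containing an MWCHAN column, and a non-empty value in every column). The function name count_by_wchan and the SAMPLE constant are hypothetical names introduced for illustration; the sample rows are copied from the report.

```python
from collections import Counter


def count_by_wchan(ps_text):
    """Count processes per wait channel in long-format ps output.

    Assumes the first non-blank line is a header containing an MWCHAN
    column and that every column before COMMAND is non-empty, so that
    whitespace splitting keeps the fields aligned with the header.
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    col = header.index("MWCHAN")
    counts = Counter()
    for line in lines[1:]:
        # Limit the split so a COMMAND containing spaces stays in one field.
        fields = line.split(None, len(header) - 1)
        if len(fields) <= col:
            continue  # skip short or malformed rows
        counts[fields[col]] += 1
    return counts


# Rows taken verbatim from the report above.
SAMPLE = """\
   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
25000 27121 26860 1977  -4  5 146324 33732 ufs    DN    ??    0:03.75 httpd
25000 27147 37257 1994  -4  5 153748 29280 ufs    DN    ??    0:03.72 httpd
25000 27157 36912 1805  -4  5 150756 26592 ufs    DN    ??    0:02.91 httpd
25000 27224 27030 1845  -4  5 137536 24804 ufs    DN    ??    0:01.25 httpd
25000 27274 26794 1829  -4  5 148140 35416 ufs    DN    ??    0:02.90 httpd
"""

if __name__ == "__main__":
    for wchan, n in count_by_wchan(SAMPLE).most_common():
        print(wchan, n)
```

On a live system the same function could be fed the captured text of a long-format ps listing taken at intervals; a count for "ufs" that keeps growing while "accept" and "kqread" shrink would match the behaviour described in the report.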

Kris