kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load

Tue Jun 5 20:50:11 UTC 2007

The following reply was made to PR kern/104406; it has been noted by GNATS.

From: "Jeffrey D. Wheelhouse" <jdw at wheelhouse.org>
To: bug-followup at FreeBSD.org
Cc:  
Subject: Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent
 CPU load
Date: Tue, 05 Jun 2007 16:26:26 -0400

 I believe we have also experienced this bug (or a very similar one) on 
 our 8-core amd64 systems under 6.2-RELEASE-p4.

 In our case, "top" shows that the system is 100% CPU utilized, with the 
 vast majority of it as "system" time.  (Ordinarily the system

 In the last case, we ended up with about 200 Apache processes that 
 looked like this in ps:

   UID   PID  PPID  CPU PRI NI    VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
 25000 27121 26860 1977  -4  5 146324 33732 ufs    DN    ??    0:03.75 httpd
 25000 27147 37257 1994  -4  5 153748 29280 ufs    DN    ??    0:03.72 httpd
 25000 27157 36912 1805  -4  5 150756 26592 ufs    DN    ??    0:02.91 httpd
 25000 27224 27030 1845  -4  5 137536 24804 ufs    DN    ??    0:01.25 httpd
 25000 27274 26794 1829  -4  5 148140 35416 ufs    DN    ??    0:02.90 httpd

 Once a process gets "stuck" in wait channel ufs, it stays blocked 
 indefinitely, as described in this PR, or at least runs so slowly as to 
 be indistinguishable from stuck.  (Typical wait channels for our httpds 
 are accept or kqread, as one would expect.)
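
 To see at a glance how many processes are piled up on a given wait 
 channel, the ps listing can be tallied by its MWCHAN column.  The 
 following is only a rough sketch, not something taken from the affected 
 systems: the function name count_wait_channels is made up for 
 illustration, and it assumes "ps -axl"-style output in which every 
 column is non-empty (ps prints "-" when there is no wait channel) and 
 COMMAND is the last column.

```python
from collections import Counter


def count_wait_channels(ps_output):
    """Count processes per wait channel in `ps -axl`-style text.

    Hypothetical helper: assumes a header line naming the columns and
    whitespace-separated fields with COMMAND as the final column.
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    header = lines[0].split()
    # Find the wait-channel column by name rather than by position.
    col = header.index("MWCHAN") if "MWCHAN" in header else header.index("WCHAN")
    counts = Counter()
    for line in lines[1:]:
        # COMMAND may contain spaces, so cap the number of splits.
        fields = line.split(None, len(header) - 1)
        if len(fields) > col:
            counts[fields[col]] += 1
    return counts


# Sample rows copied from the listing in this message.
sample = """\
  UID   PID  PPID  CPU PRI NI    VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
25000 27121 26860 1977  -4  5 146324 33732 ufs    DN    ??    0:03.75 httpd
25000 27147 37257 1994  -4  5 153748 29280 ufs    DN    ??    0:03.72 httpd
25000 27157 36912 1805  -4  5 150756 26592 ufs    DN    ??    0:02.91 httpd
"""

print(count_wait_channels(sample).most_common())
```

 On a live system the same function could be fed the text captured from 
 "ps -axl" (for example via subprocess.run); a sudden jump in the count 
 for ufs relative to accept or kqread would match the symptom described 
 above.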

 Each process in this state counts against the load average, so we often 
 see load averages north of 200 when this is occurring.  (Typical load 
 average is below 2.)

 Kill enough processes (or possibly enough to hit the "right" process) 
 and everything picks up again right where it left off.

 I also have no idea how to debug this.

 Thanks,
 Jeff

 -- 
 Jeff Wheelhouse
 jdw at wheelhouse.org