6.2-STABLE deadlock?

Kris Kennaway kris at obsecurity.org
Wed Apr 25 03:43:05 UTC 2007


On Wed, Apr 25, 2007 at 11:53:32AM +1000, Jan Mikkelsen wrote:
> LI Xin wrote:
> > Kostik Belousov wrote:
> > > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> > >> On Tue, Mar 13, 2007 at 02:08:48PM +0000, Adrian Wontroba wrote:
> > >>> At work, amongst my stable of old computers running FreeBSD, I have a
> > >>> Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
> > >>> primarily runs Nagios and a small and lightly used MySQL database, along
> > >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> > >>> disc subsystem, ruling out crash dumps.
> > >>>
> > >>> At some point during 5.5-STABLE this machine started to occasionally hang ...
> > >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> > >> rather sooner after the hang.  Processes with wmesg=ufs feature often in
> > >> the ps output.
> > >>
> > >> http://www.stade.co.uk/crash1/
> > > 
> > > I would suspect the mlx controller. There are several processes (for instance,
> > > 988, 50918) waiting for completion of a block read, and the processes in the
> > > "ufs" state are the result of the lock cascade, IMHO.
> > 
> > I'm not sure that this is specific to one disk controller.  Actually,
> > I have had occasional reports of similar hangs on amd64 6.2-RELEASE
> > (slightly patched version), where most processes were stuck in the 'ufs'
> > state under very light load; the box was equipped with amr(4) RAID.
> > 
> > I was not able to reproduce the problem in my lab, though; it's still
> > unknown how to trigger the livelock :-(  I still need to do some
> > investigation on their production system.
> 
> I have seen something similar once, on a machine with an Areca (arcmsr)
> controller, running 6.2-RELEASE (with unionfs patches).  Processes were stuck in
> "ufs", and the machine needed physical intervention to reboot.  I haven't
> seen it since.  From memory, it happened during startup of the applications
> and jails on the machine.

Sounds like one of the known unionfs bugs.

Kris


More information about the freebsd-stable mailing list