Big problems with 7.1 locking up :-(

Robert Watson rwatson at FreeBSD.org
Thu Jan 29 14:38:58 PST 2009


On Fri, 9 Jan 2009, Pete French wrote:

> I have a number of HP 1U servers, all of which were running 7.0 perfectly 
> happily. I have been testing 7.1 in its various incarnations for the last 
> couple of months on our test server and it has performed perfectly.
>
> So the last two days I have been round upgrading all our servers, knowing 
> that I had run the system stably on identical hardware for some time.

For those following this other than Pete, with whom I've been in private
correspondence: it seems that he is running into two different deadlocks
in the routing code.  At least one of them is triggered by a lock order
problem in the processing of ICMP redirects, which are uncommon in most
configurations but frequent on his network, so the deadlock triggers
quickly under load.  Kip Macy has corrected at least one (both?) of the
problems in head, and plans to MFC the fixes in the near future.  We'll
follow up further once the fixes are merged, and if any further problems
transpire.
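
As a rough illustration of what a lock order problem means in general, here
is a toy sketch of a checker that remembers the order in which locks were
taken and flags a later path that takes the same pair in the opposite order.
The lock names and the two code paths are hypothetical and are not the actual
routing locks or the actual bug; the idea is only loosely similar to what a
lock order verifier does.

```python
# Toy lock-order checker. Lock names and paths are hypothetical, chosen
# only for illustration; this does not model the real routing code.
class LockOrderChecker:
    def __init__(self):
        self.seen = set()      # (first, second) acquisition pairs observed
        self.held = []         # locks currently held, in acquisition order
        self.reversals = []    # pairs taken in an order that conflicts

    def acquire(self, name):
        for h in self.held:
            # If we have ever seen `name` taken before `h`, then taking
            # `h` before `name` now is a lock order reversal.
            if (name, h) in self.seen:
                self.reversals.append((h, name))
            self.seen.add((h, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def run_path(checker, order):
    """Acquire the locks in `order`, then release them in reverse."""
    for name in order:
        checker.acquire(name)
    for name in reversed(order):
        checker.release(name)


checker = LockOrderChecker()
run_path(checker, ["route_table", "route_entry"])  # hypothetical lookup path
run_path(checker, ["route_entry", "route_table"])  # hypothetical redirect path
print(checker.reversals)
```

Two threads running those two paths at the same time could each hold one
lock while waiting for the other, which is the classic shape of such a
deadlock; a rarely exercised path is enough to expose it under load.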

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> Since then I have started seeing machines lock up. This always happens under
> heavy disc load. When I bring the machine back up, it sometimes fails
> to fsck due to a partially truncated inode. The lockups appear to
> be disc related - on my mysql master machine it will come back up with
> files somewhat shorter than those which have already been transmitted to
> the slave (i.e. some data was in memory, and claimed to have been written
> to the drive, but never made it onto the disc).
>
> The only time I have seen anything useful on the screen was during one lockup
> where I got a message about a spin lock being held too long and some
> comment in parentheses about it being a turnstile lock.
>
> Help! :-(
>
> I am now downgrading all the machines to 7.0 as fast as I can - though the
> machine I am trying to compile it on has locked up once during the compile
> so I haven't got anywhere so far.
>
> The machines are HP ProLiant DL360 G5s - they have an embedded P400i
> RAID controller with a pair of mirrored drives connected. Each one has
> both ethernets connected, bundled using lagg and LACP.
>
> Advice?
>
> -pete.