Big problems with 7.1 locking up :-(
ghelmer at palisadesys.com
Thu Feb 12 10:16:16 PST 2009
Guy Helmer wrote:
> Pete French wrote:
>> I have a number of HP 1U servers, all of which were running 7.0
>> perfectly happily. I have been testing 7.1 in its various incarnations
>> for the last couple of months on our test server and it has performed
>> So the last two days I have been round upgrading all our servers,
>> that I had run the system stably on identical hardware for some time.
>> Since then I have started seeing machines lock up. This always happens
>> heavy disc load. When I bring the machine back up then sometimes it
>> to fsck due to a partially truncated inode. The lockups appear to
>> be disc related - on my mysql master machine it will come back up with
>> files somewhat shorter than those which have already been transmitted to
>> the slave (i.e. some data was in memory, and claimed to have been
>> to the drive, but never made it onto the disc).
>> The only time I have seen anything useful on the screen was during
>> one lockup where I got a message about a spin lock being held too
>> long and some comment in parentheses about it being a turnstile lock.
>> Help! :-(
>> I am now downgrading all the machines to 7.0 as fast as I can - though
>> machine I am trying to compile it on has locked up once during the
>> so I haven't got anywhere so far.
>> The machines are HP Proliant DL360 G5s - they have an embedded P400i
>> RAID controller with a pair of mirrored drives connected. Each one has
>> both ethernets connected, bundled using lagg and LACP.
> I can't tell whether my situation is related, but I am seeing lockups
> on SMP Supermicro servers with both older (NetBurst-ish) and current
> Xeon CPUs. I have been dropping into the kernel debugger and getting
> lock information and process backtraces, but so far nothing has been
> conclusively identified. I think the issue I'm seeing was introduced
> sometime between October 2 and November 24 in the RELENG_7 branch, and
> I suppose the next step is to do a binary search for the offending
FWIW, I think I have tracked down the changes just prior to 7.1-RELEASE
that are causing my Supermicro dual Xeon machines to wedge. I did the
binary search between 2008-10-02 and 2008-11-24 without reproducing any
lockups, and then I went on to search between 2008-11-24 and
2009-01-04. An SMP kernel build from 2008-12-22 (r186409) sources was
stable for over two weeks; a kernel built from 2008-12-29 (r186590)
sources wedged in under 24 hours under moderate load.
It appears that the significant changes between r186409 and r186590 were
r186552 (delphij - reverted ATA changes) and r186535/r186534 (delphij -
reverted bce changes). My machines don't have bce interfaces, so I
suspect the ATA changes.
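
For anyone wanting to repeat this kind of search, the binary search over
revisions described above can be sketched roughly as follows. This is only
an illustration, not the procedure actually used: the function name
first_bad_revision and the is_bad callback are hypothetical, and in
practice "is_bad" means building a kernel from that revision and running
it under load long enough to trust the result. The sketch assumes the
oldest revision in the list is known good, the newest is known bad, and
every revision after the first bad one is also bad.

```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision for which is_bad(revision) is true.

    Assumptions (not from the original post):
      - revisions is sorted oldest to newest,
      - revisions[0] is known good and revisions[-1] is known bad,
      - once a revision is bad, all later ones are bad too.
    """
    lo = 0                      # index of the newest revision known good
    hi = len(revisions) - 1     # index of the oldest revision known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid            # the fault was introduced at or before mid
        else:
            lo = mid            # the fault was introduced after mid
    return revisions[hi]


if __name__ == "__main__":
    # Revisions named in this thread. The predicate below is a pure
    # simulation for demonstration; it is NOT a confirmed finding.
    candidates = [186409, 186534, 186535, 186552, 186590]
    suspect = first_bad_revision(candidates, lambda rev: rev >= 186552)
    print("first bad revision in this simulation:", suspect)
```

Each probe halves the remaining range, so even a few dozen candidate
commits need only a handful of test kernels. The slow part is the soak
time per probe: a wedge that can take many hours to appear means a
revision can only be called good after a long clean run.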
Guy Helmer, Ph.D.
Chief System Architect
Palisade Systems, Inc.