amr lockup on 8.0-RELEASE

Andrew Hood andrew.hood at lynchpin.com
Mon Mar 8 15:00:19 UTC 2010


Hi,

I recently upgraded to 8.0-RELEASE-p2 (amd64) on a dual-processor Opteron
system with an LSI MegaRAID SCSI 320-1.

Since then, I have been getting a complete lock-up of the disk subsystem
under heavy write load.

It copes fine with a kernel build, but an attempt to rsync 150GB or so 
of data from the machine it is supposed to be replacing routinely hangs.

I can reliably (and almost immediately) reproduce the issue using
/usr/ports/sysutils/stress with one hdd hog (stress -d 1).

When the hang occurs, the load average gradually moves up to 0.99 with 
the following CPU states shown in top:

CPU:  0.0% user,  0.0% nice,  0.0% system, 25.0% interrupt, 75.0% idle

I'm guessing the 25% is expressed as a proportion of all 4 processor
cores (2 x dual-core), i.e. one core fully busy handling interrupts?
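
The 25% figure would be consistent with one of the four cores being
fully busy with interrupts while the other three sit idle. A minimal
Python sketch of that arithmetic, assuming the summary line in top is a
plain average of the per-core percentages (the function name and the
per-core figures are illustrative, not taken from top itself):

```python
def aggregate_cpu_percent(per_core_percent):
    """Average per-core utilisation into a single all-cores figure."""
    if not per_core_percent:
        raise ValueError("need at least one core")
    return sum(per_core_percent) / len(per_core_percent)


# Assumed scenario: one of four cores saturated by interrupt handling.
interrupt_per_core = [100.0, 0.0, 0.0, 0.0]
print(f"{aggregate_cpu_percent(interrupt_per_core):.1f}% interrupt")
# prints "25.0% interrupt"
```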

If I run top -S, I can see one interrupt handler (?) at 100%:

12 root       20 -60    -     0K   320K WAIT    0   0:08 100.00% intr

From that point, the machine will happily do anything that doesn't
involve reading or writing to disk. Anything attempting to access the
disk subsystem will just hang indefinitely. Killing the process that was
attempting to access the disk does not restore things.

No errors at all in syslog or on the console.

The machine had previously been running quite happily on 6.2-RELEASE as
a PostgreSQL server without any issues, but equally it may not have been
as heavily loaded.

I'm not quite sure where to look next in terms of further diagnosis, and
wondered if anyone has experienced anything similar.
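
One possible next step would be to take two snapshots of `vmstat -i`
output a few seconds apart while the intr process is pegged, and see
which interrupt source's counter is climbing fastest. The following is
only a rough Python sketch of that comparison. It assumes the usual
three-column `vmstat -i` layout (name, total, rate), and the sample
snapshots, including the IRQ number shown for amr0, are made up for
illustration rather than captured from this machine.

```python
def parse_vmstat_i(text):
    """Map interrupt source name -> cumulative count from `vmstat -i` text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows end in two integers (total, rate); this skips the header.
        if len(fields) < 3 or not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        name = " ".join(fields[:-2])
        if name.lower() == "total":
            continue  # skip the summary row
        counts[name] = int(fields[-2])
    return counts


def busiest_sources(before, after, interval_seconds):
    """Return (name, interrupts per second) pairs, busiest first."""
    rates = []
    for name, count in after.items():
        delta = count - before.get(name, 0)
        rates.append((name, delta / interval_seconds))
    return sorted(rates, key=lambda r: r[1], reverse=True)


# Hypothetical snapshots taken 10 seconds apart (not real output).
SAMPLE_BEFORE = """\
interrupt                          total       rate
irq1: atkbd0                          10          0
irq28: amr0                        50000         25
cpu0: timer                      4000000       2000
Total                            4050010       2025
"""
SAMPLE_AFTER = """\
interrupt                          total       rate
irq1: atkbd0                          10          0
irq28: amr0                      1050000        522
cpu0: timer                      4020000       2000
Total                            5070010       2522
"""

if __name__ == "__main__":
    ranked = busiest_sources(parse_vmstat_i(SAMPLE_BEFORE),
                             parse_vmstat_i(SAMPLE_AFTER), 10)
    for name, per_second in ranked:
        print(f"{name:<20} {per_second:>10.1f}/s")
```

If a single source dominates the ranking during the hang, that would at
least point at which driver to look into further.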

Thanks,
Andrew

-- 
Andrew Hood
Managing Director
Lynchpin Analytics
t: 0845 838 1136
f: 0845 838 1137
e: andrew.hood at lynchpin.com

Lynchpin Analytics Limited is registered in Scotland No. SC279857
Registered Office: 5th Floor, 7 Castle Street, Edinburgh, EH2 3AH


More information about the freebsd-hardware mailing list