Big problem still remains with 7.2-STABLE locking up

NAKAJI Hiroyuki nakaji at jp.freebsd.org
Sat Jun 6 14:33:36 UTC 2009


Hi,

Some months ago I noticed frequent lockups on my RELENG_6 server (ECS
PM800-M2, Celeron 2.6GHz (UP), 2GB RAM, ATA HDDs and a 3Com NIC (xl0)),
and eventually I gave up on that old server.

Last month I replaced this 'unstable' server with a new one running
7.2-RELEASE, which worked very well until I set it up as 'a server'. The
problem began just after it started 'the services'.

My story is very similar to Pete's.
http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/047487.html

I followed some of the instructions in that thread, but unfortunately
the big problem still remains: the 7.2-STABLE server locks up frequently.

Help! :-(

The server is an NEC Express5800 S70/SD.

o CPU: Intel(R) Celeron(R) CPU 440 @ 2.00GHz (2280.25-MHz K8-class CPU)
o 6GB RAM
o ACPI APIC Table: <NEC DT000020>
o 80GB and 250GB SATA HDDs
o http://www.heimat.gr.jp/~nakaji/localhost/dmesg.boot

The kernel configuration is:

include GENERIC
ident   HEIMAT
options MSGBUF_SIZE=81920
makeoptions     DEBUG=-g
options KDB
options DDB
options BREAK_TO_DEBUGGER
options QUOTA
options DEVICE_POLLING
options HZ=1000
options SW_WATCHDOG
options DEBUG_VFS_LOCKS
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options LOCK_PROFILING

This server runs as a web server, NFS server, DHCP server, NTP server,
mail server with spam checks, ML server, Usenet server and so on. In
/etc/rc.conf*, there are "_enable" lines for the following:

o ntpdate
o ntpd
o nfs_server
o sshd
o inetd
o named
o sendmail
o rtadvd
o watchdogd
o dhcpd
o snmpd
o apache22
o samba
o zope29
o zope210
o amavisd
o amavisd_milter
o cvsupd
o ntop
o compat6x
o munin_node
o spamd
o spamass_milter
o smartd
o mailman
o sshblock
o innd
o skkserv

From munin's graphs, the 'resets' value in netstat keeps increasing,
while on other 'desktops' it remains zero. I have not found out whether
there is a threshold for 'resets', but when it reaches 0.8 - 1.2 the
server locks up. When that happens there is no ping response, no message
on the console, no keyboard response and, of course, Ctrl-Alt-Esc does
not work. I wonder why netstat's 'resets' value is increasing.
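
To watch the same kind of figure without munin, a rough sh sketch
follows. It samples a TCP counter from `netstat -s -p tcp` twice and
prints the per-second rate. The function names are hypothetical, and the
pattern 'reset' is only an assumption about which netstat line the munin
plugin graphs as 'resets'; check the plugin source before relying on it.

```shell
#!/bin/sh
# Per-second rate between two samples of an increasing counter.
# usage: counter_rate PREV CURR SECONDS
counter_rate() {
    awk -v p="$1" -v c="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (c - p) / s }'
}

# Sum the leading number of every `netstat -s -p tcp` line that matches
# PATTERN. PATTERN is an assumption, not taken from the munin plugin.
# usage: tcp_counter PATTERN
tcp_counter() {
    netstat -s -p tcp |
        awk -v pat="$1" '$0 ~ pat { n += $1 } END { print n + 0 }'
}

# Example (not run automatically):
#   prev=$(tcp_counter reset); sleep 60; curr=$(tcp_counter reset)
#   counter_rate "$prev" "$curr" 60
```

Logging this once a minute to a file on disk might show whether the rate
really climbs toward the 0.8 - 1.2 range shortly before a lockup.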

I learned a workaround from other Japanese users: if the box has an ICH,
enabling ichwd and running watchdogd can reboot it when it locks up.
Sure enough, the box rebooted last night while I was in bed, after about
4 hours. Watchdogd functions very well.
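
For anyone who wants to try the same workaround, a minimal sketch of the
settings involved is below. It assumes the ichwd(4) driver is loaded as a
module rather than compiled into the kernel; the 30-second timeout is an
illustrative value, not the one used on this server.

```shell
# /boot/loader.conf: load the Intel ICH watchdog driver at boot
ichwd_load="YES"

# /etc/rc.conf: start watchdogd so it keeps patting the watchdog
watchdogd_enable="YES"
# Optional and illustrative: reboot if not patted for about 30 seconds
watchdogd_flags="-t 30"
```

If the machine hangs hard enough that watchdogd stops running, the
hardware timer should expire and reset the box.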

Advice? Thanks.
-- 
NAKAJI Hiroyuki

