random hangs/reboots with Dell servers
Chuck Swiger
cswiger at mac.com
Thu Apr 19 17:43:19 UTC 2007
On Apr 19, 2007, at 3:54 AM, Dimitris Zilaskos wrote:
> Over the last 3 years we have installed FreeBSD 5.x and 6.x, with
> the currently deployed version being 6.1, on a variety of Dell
> rack-mounted systems.
>
> The Dell systems used so far are PowerEdge 1750, 2950 (both SCSI),
> and SC1425 (SATA). All of them are dual-CPU Xeon systems.
I've got a large number of Dell PowerEdge 1750, 1850, 2900, 2950
deployed in various production environments, whereas some other
clients are using HP ProLiant 360/370 boxen. Both seem to be rock
solid under either 5.4/5.5, or 6.1/6.2. I've even got a pair of
firewall boxes running nothing but NAT and SSHd, which are at 600+
days of uptime:
FreeBSD 5.4-STABLE (FW) #0: Tue Jul 12 11:10:14 EDT 2005
Welcome to FreeBSD!
12:24PM up 636 days, 19:26, 3 users, load averages: 0.25, 0.14, 0.04
(Machines running more services get OS or service related updates
more frequently-- typically every month to every 3 months-- but I
don't like to make changes to a running machine unless I expect the
change to make an improvement which justifies the disruption. For a
non-SMP firewall, where an update means losing external network
connectivity, nothing in 6.x is worth the cost of updating as yet,
IMHO.)
> All these systems serve as mail/web servers, with 2 to 15 jails.
>
> Installation has always proceeded normally without problems.
> However, after a few months of operation, all of these systems,
> purchased at different moments during the last 3 years, will begin
> rebooting randomly or freezing completely.
>
> These reboots/freezes at first occur about once every 6 months,
> then gradually move to once per month, and normally stabilize
> around once per week, although on the 1750 system it once even
> happened twice in a day.
>
> Load does not seem to matter, since even after shutting down all
> services on the servers, random reboots still occurred.
Sounds like something hardware-related, such as a power-supply
problem, if the interval between failures is gradually getting
shorter and is not correlated with load at all.
> So far we have tried various tricks dug up from the archives, such
> as disabling ACPI and HT, but nothing changed.
>
> We have migrated some systems that had these issues to a
> RHEL-compatible OS, and they run rock solid under heavy load.
Hmm. Well, you might have to wait a few weeks or months to get a
reasonable comparison of longer-term stability, but this at least
implies that something like inadequate cooling or a failed fan is
not a likely cause.
> Right now I have enabled kernel crash dumps and I am waiting for
> the next crash. But I understand a lot of people use FreeBSD with
> Dell servers, and I would like to hear how to tackle the
> situation we are facing.
Try to get a crash dump. Also, you might find reviewing the BIOS
options and disabling everything which is not needed, hopefully
including USB, will help.
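For reference, a minimal sketch of the crash-dump setup, assuming a
6.x system with a swap partition large enough to hold the dump. The
values shown (AUTO, /var/crash) are illustrative assumptions, not taken
from your configuration; check rc.conf(5) and dumpon(8) on your release,
since some older releases may want an explicit swap device in dumpdev
rather than AUTO.

```shell
# /etc/rc.conf fragment (illustrative values, adjust for your system)
dumpdev="AUTO"        # assumption: let rc pick a swap device for dumpon(8)
dumpdir="/var/crash"  # where savecore(8) writes vmcore.N on the next boot

# After a panic and reboot, the dump can be examined with kgdb(1), e.g.:
#   kgdb <kernel built with debug symbols> /var/crash/vmcore.0
# and a backtrace obtained with "bt" at the kgdb prompt.
```

Note that a dump is only written when the kernel actually panics; a hard
freeze or a spontaneous reset will not leave one behind, which would
itself point towards hardware.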
--
-Chuck