Stability Issues on 5.4-RELEASE Box
dave.list at pixelhammer.com
Wed Feb 28 21:39:25 UTC 2007
alex at schnarff.com wrote:
> Hello All,
> I've recently fallen into the task of administering a FreeBSD
> 5.4-RELEASE box that acts as the web server for a small non-profit that
> I volunteer for. Unfortunately, the system has been having some
> extremely vexing stability issues over the last month or so, which even
> my 6+ years of experience as an OpenBSD admin have not helped me track down.
> First things first, let me say explicitly that I'm not trying to say
> "FreeBSD sucks, it's not stable" or anything like that. It's a fine OS,
> and I'm sure that it's either faulty hardware or a misconfiguration of
> some sort causing these problems. :-)
> That said, here are some of the symptoms the box has been experiencing:
> * Occasional random reboots. I've only personally witnessed one, and
> they don't happen often, but any time a *NIX box just reboots for no
> apparent reason (there was no indication of a problem in any of the
> logs, at least that I could see), something really bad is going on.
> * Random extreme slowness when logging in via SSH, with the time to get
> a shell ranging from a second or two all the way up to 80 seconds. The
> box isn't busy enough that it's just slow due to load (especially since,
> once you're in, things fly), and it's not just a reverse DNS issue like
> I've seen on OpenBSD (this occurs even when logging in from locations
> listed in /etc/hosts that resolve properly out of that file). Until I
> upgraded to the current version of OpenSSL/OpenSSH, the box would
> occasionally just become unresponsive altogether over SSH, not allowing
> logins for 15+ minutes at a time.
> * Issues with files that are not found on startup sometimes, but are
> other times. Prime example: the Zope CMS system that's been installed
> failed to find libmysqlclient.so after a planned soft reboot, but found
> it with no trouble on a subsequent boot a few minutes later, with no
> config changes in between.
> * A warning in /var/log/messages that the root filesystem was full, when
> it was at 60% capacity (and something like 2% inode capacity); the
> problem has yet to repeat, though no files have been cleared off of that
> filesystem.
> * Random crashes of the Zope/Plone system that's running the main part
> of the web site. While I realize that, in and of itself, this means
> nothing about the stability of the underlying OS, in the context of all
> of the other things going on (as well as the fact that the Zope list has
> been unable to help figure out why it's crashing), it seems like it
> might be further evidence of a larger problem.
> Thus far, besides simply scanning log files, constantly watching "top"
> and "ps", etc., I've not been able to do much with the box. As I said, I
> upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the
> firewall (there was none before I arrived...don't even get me started on
> that). This weekend the guy who was the previous admin will be running a
> Memtest for me and disabling hyperthreading (which there's no
> performance justification for, and which has caused me stability issues
> at least on Linux in the past), since the server is in Oregon and I'm in
> the DC area. That's about the extent of what I've been able to do to
> date, since this is a production box.
> What I'd like to know from you guys is:
> * Am I justified in suspecting hyperthreading as a potential cause of
> these problems?
> * Does 5.4-RELEASE have any known bugs that might cause stability issues
> like the ones I've described here? More importantly, would an upgrade to
> 6.2-RELEASE be worthwhile (as is my instinct), in terms of being
> generally more stable and/or having better hardware support? Would such
> an upgrade be possible/relatively painless to perform without being
> physically at a console, as has been the case with OpenBSD over the years?
> * Given my dmesg below, do you see any specific problems?
> * Do you have any other suggestions for debugging this problem?
> Thanks in advance for any help you can provide. :-)
> Alex Kirk
I would certainly think hardware is the place to look.
Just so you know, we still run a server on FBSD 4.8, and it runs very
well. We have 4.8, 4.11, 5.2.1, 5.4, 6.1, and 6.2. Oh, and a couple
Linux, NetBSD, and Solaris boxen too.
I prefer not to chase versions on high load production equipment,
certainly not as a problem resolution strategy. For the record, I have
never had a blind upgrade fix an unidentified problem, and if it did I
would be very worried.
I would guess memory; at least, that is where I would look first. I would
also wonder what environment the server runs in: heat is a killer, and so
is vibration. Loose racks and humming floors can and will cause connections
to slip. I have fixed servers that ran for months and then suddenly showed
odd behavior simply by powering down, removing all cards/RAM/cables, and
then reseating everything.
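
While waiting on the hardware checks, it may help to collect evidence remotely. The "root filesystem full" warning at roughly 60% capacity is the kind of transient event that a periodic log could catch. Below is a minimal sketch, assuming Python is usable on the box (Zope needs an interpreter, so one should be present) and that it would be run at intervals from something like cron. The names fs_usage and log_line are made up for illustration; os.statvfs is the standard library call.

```python
import os
import time


def fs_usage(path="/"):
    """Return (block_pct_used, inode_pct_used) for the filesystem holding path."""
    st = os.statvfs(path)

    # Blocks in use, and a df-style percentage: used / (used + available
    # to unprivileged users), so reserved space is not counted as free.
    blocks_used = st.f_blocks - st.f_bfree
    denom = blocks_used + st.f_bavail
    block_pct = 100.0 * blocks_used / denom if denom else 0.0

    # Inode usage; some filesystems report 0 total inodes, so guard for that.
    inodes_used = st.f_files - st.f_ffree
    inode_pct = 100.0 * inodes_used / st.f_files if st.f_files else 0.0

    return block_pct, inode_pct


def log_line(path="/"):
    """Format one timestamped usage line suitable for appending to a log file."""
    block_pct, inode_pct = fs_usage(path)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s %s blocks=%.1f%% inodes=%.1f%%" % (stamp, path, block_pct, inode_pct)


if __name__ == "__main__":
    # Print one sample; redirect to a file when scheduling this periodically.
    print(log_line("/"))
```

Comparing the timestamps in such a log against /var/log/messages would show whether usage really spiked when the warning appeared or whether the warning itself was spurious, which would point back toward memory or disk hardware.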
Mysterious failures, 3000 miles to the console, I don't envy you ;^)
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for Memorial Day?
Maybe they forgot who made that choice possible.