Stability Issues on 5.4-RELEASE Box

DAve dave.list at pixelhammer.com
Wed Feb 28 21:39:25 UTC 2007


alex at schnarff.com wrote:
> Hello All,
> 
> I've recently fallen into the task of administering a FreeBSD 
> 5.4-RELEASE box that acts as the web server for a small non-profit that 
> I volunteer for. Unfortunately, the system has been having some 
> extremely vexing stability issues over the last month or so, which even 
> my 6+ years of experience as an OpenBSD admin have not helped me track 
> down.
> 
> First things first, let me say explicitly that I'm not trying to say 
> "FreeBSD sucks, it's not stable" or anything like that. It's a fine OS, 
> and I'm sure that it's either faulty hardware or a misconfiguration of 
> some sort causing these problems. :-)
> 
> That said, here are some of the symptoms the box has been experiencing:
> 
> * Occasional random reboots. I've only personally witnessed one, and 
> they don't happen often, but any time a *NIX box just reboots for no 
> apparent reason (there was no indication of a problem in any of the 
> logs, at least that I could see), something really bad is going on.
> 
> * Random extreme slowness when logging in via SSH, with the time to get 
> a shell ranging from a second or two all the way up to 80 seconds. The 
> box isn't busy enough that it's just slow due to load (especially since, 
> once you're in, things fly), and it's not just a reverse DNS issue like 
> I've seen on OpenBSD (this occurs even when logging in from locations 
> listed in /etc/hosts that resolve properly out of that file). Until I 
> upgraded to the current version of OpenSSL/OpenSSH, the box would 
> occasionally just become unresponsive altogether over SSH, not allowing 
> logins for 15+ minutes at a time.
> 
> * Files that are sometimes not found on startup, but are found at 
> other times. Prime example: the Zope CMS that's been installed 
> failed to find libmysqlclient.so after a planned soft reboot, but found 
> it with no trouble on a subsequent boot a few minutes later, with no 
> config changes in between.
> 
> * A warning in /var/log/messages that the root filesystem was full, when 
> it was at 60% capacity (and something like 2% inode capacity); the 
> problem has yet to repeat, though no files have been cleared off of that 
> filesystem.
> 
> * Random crashes of the Zope/Plone system that's running the main part 
> of the web site. While I realize that, in and of itself, this means 
> nothing about the stability of the underlying OS, in the context of all 
> of the other things going on (as well as the fact that the Zope list has 
> been unable to help figure out why it's crashing), it seems like it 
> might be further evidence of a larger problem.
> 
> Thus far, besides simply scanning log files, constantly watching "top" 
> and "ps", etc., I've not been able to do much with the box. As I said, I 
> upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the 
> firewall (there was none before I arrived...don't even get me started on 
> that). This weekend the guy who was the previous admin will be running a 
> Memtest for me and disabling hyperthreading (which there's no 
> performance justification for, and which has caused me stability issues 
> at least on Linux in the past), since the server is in Oregon and I'm in 
> the DC area. That's about the extent of what I've been able to do to 
> date, since this is a production box.
> 
> What I'd like to know from you guys is:
> 
> * Am I justified in suspecting hyperthreading as a potential cause of 
> instability?
> 
> * Does 5.4-RELEASE have any known bugs that might cause stability issues 
> like the ones I've described here? More importantly, would an upgrade to 
> 6.2-RELEASE be worthwhile (as is my instinct), in terms of being 
> generally more stable and/or having better hardware support? Would such 
> an upgrade be possible/relatively painless to perform without being 
> physically at a console, as has been the case with OpenBSD over the years?
> 
> * Given my dmesg below, do you see any specific problems?
> 
> * Do you have any other suggestions for debugging this problem?
> 
> Thanks in advance for any help you can provide. :-)
> 
> Alex Kirk

I would certainly think hardware is the place to look.

Just so you know, we still run a server on FreeBSD 4.8, and it runs very 
well. We have 4.8, 4.11, 5.2.1, 5.4, 6.1, and 6.2. Oh, and a couple of 
Linux, NetBSD, and Solaris boxen too.

I prefer not to chase versions on high-load production equipment, 
certainly not as a problem resolution strategy. For the record, I have 
never had a blind upgrade fix an unidentified problem, and if one did I 
would be very worried.

I would guess memory; at least that is where I would look first. I would 
also wonder what environment the server runs in: heat is a killer, and so 
is vibration. Loose racks and humming floors can and will cause 
connections to slip. I have fixed servers that ran for months and then 
suddenly showed odd behavior simply by powering down, removing all 
cards/RAM/cables, and reattaching everything.
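
While you wait on the Memtest, it might be worth logging filesystem 
usage on a schedule, so that the next "root filesystem full" warning can 
be checked against real numbers from the same minute. Below is a rough 
sketch, not a tested tool. The names fs_usage and format_line are made 
up for illustration, and it assumes a current Python 3 (Zope is written 
in Python, so some interpreter should already be on the box, but 
probably an older one). The percentages are computed from raw block and 
inode counts, so they may not match the "Capacity" column of df exactly.

```python
import os
import time


def fs_usage(path="/"):
    """Return block and inode usage percentages for the filesystem holding path."""
    st = os.statvfs(path)
    blocks_used = st.f_blocks - st.f_bfree
    inodes_used = st.f_files - st.f_ffree
    return {
        "path": path,
        # Guard against filesystems that report zero totals.
        "block_pct": 100.0 * blocks_used / st.f_blocks if st.f_blocks else 0.0,
        "inode_pct": 100.0 * inodes_used / st.f_files if st.f_files else 0.0,
    }


def format_line(usage, now=None):
    """Build one timestamped log line; now=None means the current time."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s %s blocks=%.1f%% inodes=%.1f%%" % (
        stamp, usage["path"], usage["block_pct"], usage["inode_pct"])


if __name__ == "__main__":
    # Print a single sample; run it from cron and append to a log file.
    print(format_line(fs_usage("/")))
```

Run from cron every few minutes and appended to a file, this gives you 
something to line up against /var/log/messages. If the log shows plenty 
of free blocks and inodes at the moment the kernel complains, that 
points away from a genuinely full disk.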

Mysterious failures, 3000 miles to the console, I don't envy you ;^)

DAve


-- 
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.

