What caused my box to die?

Adam blueeskimo at gmx.net
Thu May 29 12:49:03 PDT 2003

Well, I'm just about done rebuilding my box after a major crash and
burn. Now that things are slowing down, I'd like to get some input on
what might have caused the crash. I'm wondering if I stumbled on some
obscure bug, or maybe a known bug that hasn't ever been fixed. Let me
describe the scenario before and after the crash:

I had 3 machines on my LAN at the time. The network config is simple:

    DSL Modem --> Gateway (also my workstation) --> Switch --> Laptop and test machine

My gateway was running FreeBSD 4.7. Just before the crash, I was copying
movies via FTP from the gateway to the laptop. At the same time, I was
copying a single movie from the test machine to the gateway. I looked
over at the test machine, noticed the transfer had died, and tried to
re-establish the connection to the gateway. The LAN link was dead on
both the test machine and the laptop, which I found extremely strange
(this had never happened before).

So I went over to the gateway (which was still running) to check
whether it was even still on the Internet. I typed 'ping ftp.cdrom.com'
and pressed Enter. IMMEDIATELY after I pressed Enter, the box
completely froze up. I waited about 60 seconds to see if it would
recover, but it was no use. I hard-reset the box, and on reboot the
machine was unable to boot from /. In addition, several of my
partitions on ad0 were completely hosed.

At this point I got pissed off, powered down all the machines, and
decided to wait until the next morning to start fixing anything.
However, the next morning the test machine wouldn't even turn on. This
is the second time it's done that, so I have a feeling it's just a bad
coincidence. After unplugging the internal cables and plugging them
back in, the machine started back up and worked fine.

The only change to the gateway that I can think of is that I added 1 GB
of Kingston ValueRAM to it a few days prior. However, the RAM had been
working perfectly for a few days before the crash, and has been for a
few days since I rebuilt the machine. Is there a port that can
thoroughly test the RAM and make sure it's OK?

The fact that I had 2 large file transfers going at the time of the
problem really stands out to me, because that was the first time I had
ever done that through this gateway (the laptop is brand new). Perhaps
something went wrong with my ipf/ipnat setup?

Anyhow, if you took the time to read this entire message and have some
theories, I'm all ears. I'd like to do whatever I can to avoid this
problem in the future. I'm now running FreeBSD 4.8 on the box with a
custom kernel, and so far it seems to be working perfectly.

