Complete hang during boot at boot2 prompt
noc at hdk5.net
Wed Jul 11 17:34:04 UTC 2007
Feargal Reilly wrote:
>I have a server which went down overnight, and
>would not subsequently boot. A reboot was performed by
>facilities staff before I got to look at it so I don't know what
>was showing on the console. The reason for the outage is
>unknown, and nothing showed in /var/log/messages, other than
>routine ntpd time sync messages.
>The server in question is an Intel SR1425BK1 server running
>FreeBSD 6.2 amd64 GENERIC with a SATA RAID-1 array
>provided by an onboard LSILogic MegaRAID controller.
>When booted, it would pass the various BIOS screens without
>problem, the RAID utility would say that the array was optimal,
>and then FreeBSD would start to boot, but it couldn't get past
>the boot2 prompt.
>At this point, the server emitted a single continuous beep, and
>nothing else happened. Keyboard input did nothing, although
>Ctrl-Alt-Del still worked, and at one point a heart symbol
>appeared after I hit keys randomly for a while.
>My question is, what could have caused this failure?
>My initial guesses were either a memory failure or a really
>badly corrupted boot sector, but I'm not convinced by either
>explanation, for reasons outlined below.
>I urgently needed the data to be online again, so I yanked one
>disk out of the machine and inserted it into another host, and
>took the server back to the office.
>There, I yanked a memory module, and it booted fine, albeit
>complaining about the degraded RAID array. However, when I
>reinserted the memory, it continued to boot. I didn't have the
>foresight to try it before I fiddled with the disks, but I can't
>imagine that it had been seated incorrectly as the server had
>been up for two months without problem. Also, the BIOS tests
>passed, although I know they aren't very thorough. I'll run
>sysutils/memtest anyway, and see what that throws up.
>Meanwhile, I inserted a replacement disk and rebuilt the RAID-1
>array, and it is still booting fine, so my best guess now is a
>corrupted boot sector. The disk that I removed to insert into
>another host was ad4, which I'm guessing is the disk that it
>would have been trying to boot from in the first place. So a
>bad sector could be responsible, but it would seem to be very
>convenient, as there does not appear to be any other data
>corruption on the disk.
>Also, I've run a short SMART test, and everything is okay as far
>as it is concerned. I'm in the process of running a long test,
>but that won't finish before I leave the office. If it were a
>corrupted sector, would it be able to get to boot2?
>Any other suggestions as to what caused the failure? I know I've
>changed the conditions and may never be able to reproduce it
>(nor do I want to), but if I have failing hardware, I'd like a
>best guess as to where it is.
>Thanks for your time,
I have had memory chips walk out of the slots on several occasions.
Sometimes it's vibration, and in Hawaii we occasionally have humidity
issues that tend to cause this too.
I have learned to spray the sockets and card connections with contact
cleaner about every 6 months to avoid this problem, especially in areas
where servers are not in a cool environment.
~Al Plant - Honolulu, Hawaii - Phone: 808-284-2740
+ http://hawaiidakine.com + http://freebsdinfo.org + noc at hdk5.net +
+ http://internetohana.org - Supporting - FreeBSD 6.* - 7.* +
"All that's really worth doing is what we do for others." - Lewis Carroll