Unexpected reboot/crash on 8.2-RELEASE.

Mark Johnston markj at freebsd.org
Sun May 19 03:02:47 UTC 2013


On Sat, May 18, 2013 at 09:45:21PM -0400, kpneal at pobox.com wrote:
> I had an unexpected reboot of my Dell R610 today around 2:05-2:06pm.
> I do not know if it crashed or if it was power cycled.
> 
> This machine is running:
> FreeBSD gunsight1.neutralgood.org 8.2-RELEASE FreeBSD 8.2-RELEASE #1: Thu Dec  8 21:58:59 UTC 2011     root@:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> It's a stock 8.2-RELEASE kernel except I had to tweak it near the top of
> vfs_mountroot() to delay before attempting to mount the root filesystem.
> (Without my tweak it attempts to mount root before the USB drive is finished
> getting attached.)
> 
> The dmesg shows this at the reboot:
> mfi0: 24272 (422106527s/0x0020/info) - Patrol Read complete
> mfi0: 24273 (422172000s/0x0020/info) - Patrol Read started 
> mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete
> mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/1f0c/1028)
> mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
> mfi0: 24321 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/1f0c/1028)
> mfi0: 24322 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
> 
> Does this mean the machine did not lose power? I ask because my datacenter
> had some sort of power incident and I'm not sure if the server lost power
> or not. But if the kernel message buffer from before the incident is still
> present then the machine never lost power, correct? The datacenter's power
> incident I'm told happened somewhere around the time of the reboot so I
> have to ask.

The LSI controllers I've used keep internal event logs which are
persistent across power cycles (so long as the BBU isn't dead,
presumably). It looks like mfi(4) has been set up to dump the entire
event log during boot, so the older entries showing up in dmesg don't
by themselves tell you that the machine kept power. Log entries created
after the last reboot are displayed with a timestamp of "boot + Ns".
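
To make the two timestamp forms concrete, here is a minimal Python sketch
that splits mfi(4) event lines like the ones quoted above into "absolute"
and "since-boot" entries. The function name parse_event and the regular
expression are hypothetical, written only against the lines in this thread.
The conversion to a calendar date assumes the absolute value counts seconds
from 2000-01-01 00:00 UTC. That assumption is consistent with the dates in
this thread, but it is not confirmed against the firmware documentation.

```python
import re
from datetime import datetime, timedelta, timezone

# ASSUMPTION: absolute controller timestamps count seconds from
# 2000-01-01 00:00 UTC. This fits the dates seen in this thread only.
ASSUMED_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

# Matches e.g. "mfi0: 24318 (422192750s/0x0020/info) - ..." and
# "mfi0: 24319 (boot + 3s/0x0020/info) - ...". Hypothetical pattern,
# derived only from the log lines quoted in this message.
LINE_RE = re.compile(
    r"mfi\d+: (?P<seq>\d+) \((?P<stamp>boot \+ \d+s|\d+s)/"
)

def parse_event(line):
    """Return (sequence, kind, value) or None if the line does not match.

    kind is "since-boot" (value = seconds since firmware start) or
    "absolute" (value = datetime under the assumed epoch).
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    seq = int(m.group("seq"))
    stamp = m.group("stamp")
    if stamp.startswith("boot"):
        # Strip the leading "boot + " and the trailing "s".
        return seq, "since-boot", int(stamp[len("boot + "):-1])
    # Strip the trailing "s" and apply the assumed epoch.
    when = ASSUMED_EPOCH + timedelta(seconds=int(stamp[:-1]))
    return seq, "absolute", when

if __name__ == "__main__":
    sample = [
        "> mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete",
        "> mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started",
    ]
    for line in sample:
        print(parse_event(line))
```

Under that assumption, the last absolute entry before the "boot + 3s"
lines is the most recent event the controller recorded before it was
reinitialized. It says nothing about whether power was lost in between.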

> 
> It looks like I didn't have dumps enabled. That's ... not helpful.
> 
> The machine has been stable for:
>  2:05PM  up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00

That's a bit confusing... did you mean "had been"? This is the exact
uptime that's in status.txt below.

> 
> http://www.neutralgood.org/~kpn/dmesg.boot
> 
> Here's various stats I usually keep displayed. This is the last from
> before the reboot:
> http://www.neutralgood.org/~kpn/status.txt
> 
> I've got all the power savings features turned off in the BIOS and, like
> I said, the machine has been stable for all this time. However, one thing
> to note from a couple of days ago:
> 

These command timeouts (quoted below) are probably unrelated? As an
aside, it'd be nice if mfi(4) dumped info about the dcmd/io cmd at least
once if it times out. At the moment, it only does that if MFI_DEBUG is
enabled... does anyone have an objection to changing this from a
compile-time option to a sysctl?

Thanks,
-Mark

> May 14 00:49:13 gunsight1 -- MARK --
> May 14 01:00:45 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 35 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 65 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 95 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 125 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 155 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 185 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 215 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 245 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 275 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 305 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 335 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 365 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 395 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 425 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 455 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 485 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 515 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 545 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 575 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 605 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 635 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 665 SECONDS
> May 14 01:19:36 gunsight1 -- MARK --
> May 14 01:39:36 gunsight1 -- MARK --
> May 14 01:59:37 gunsight1 -- MARK --
> May 14 02:10:55 gunsight1 kernel: mfi0: 24089 (421826400s/0x0020/info) - Patrol Read started
> 
> -- 
> Kevin P. Neal                                http://www.pobox.com/~kpn/
> "Not even the dumbest terrorist would choose an encryption program that
>  allowed the U.S. government to hold the key." -- (Fortune magazine
>     is smarter than the US government, Oct 29 2001, page 196.)