Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)

John jwd at FreeBSD.org
Mon Apr 22 03:00:53 UTC 2013


Hi Folks,

   After updating one of our servers to the latest stable image,
it appears that commit r246437 is causing it to panic.

The commit:

http://svnweb.freebsd.org/base?view=revision&revision=246437

What one of our servers looks like:

http://people.freebsd.org/~jwd/zfsnfsserver.jpg

dmesg from the last known working revision (r246431):

http://people.freebsd.org/~jwd/r246437/dmesg.r246431.clean.txt

With commit r246437:

http://people.freebsd.org/~jwd/r246437/dmesg.r246437.log.txt

Note, most of the dmesg output is related to the ses devices. It
repeats itself multiple times before the panic.

ses39: ses0,pass20: Element descriptor: '            '
ses39: ses0,pass20: SAS Expander: 24 Physses39:  phy 0: connector 255 other 255
ses39:  phy 1: connector 255 other 255
ses39:  phy 2: connector 255 other 255
ses39:  phy 3: connector 255 other 255
ses39:  phy 4: connector 255 other 255
ses39:  phy 5: connector 255 other 255
ses39:  phy 6: connector 255 other 255

etc, etc...
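
For anyone wading through the full log, here is a minimal Python sketch
that tallies the repeated phy lines per ses device. The regular
expression is an assumption based only on the lines quoted above, and
the function name count_phy_messages is made up for illustration; it is
not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Pattern assumed from the excerpt above: "sesN:  phy M: connector X other Y".
PHY_LINE = re.compile(r"(ses\d+):\s+phy (\d+): connector (\d+) other (\d+)")


def count_phy_messages(lines):
    """Count how often each (device, phy, connector, other) tuple appears."""
    counts = Counter()
    for line in lines:
        # findall() scans the whole line, so a phy message glued onto the
        # end of another message ("... 24 Physses39:  phy 0: ...") still counts.
        for dev, phy, conn, other in PHY_LINE.findall(line):
            counts[(dev, int(phy), int(conn), int(other))] += 1
    return counts
```

Feeding it the lines of a saved dmesg file (for example via open() on
a local copy of the log) should show how many times each message
repeats before the panic.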

After just a few minutes, the system panics. A pair of images
of the screen (sorry, no serial console at this time):

Panic: http://people.freebsd.org/~jwd/r246437/20130419_160143.jpg

bt: http://people.freebsd.org/~jwd/r246437/20130419_110158.jpg

We are currently running a test to see whether the problem is related
to all our shelves being dual-attached (which allows us to use geom
multipath), i.e., we have disabled the 2nd HBA, cutting the total
number of da & ses devices in half and so not executing the code in
the commit that tracks duplicate ses devices.

Note, if we disable both HBA devices and boot the system up it
does not panic or print out the repeated messages, but of course
we have no disks :-)

I am unclear on the "connector 255 other 255" messages and have not
taken the time to look into them yet.

I would appreciate any insights folks can provide.

Many Thanks,
John

ps: We've had to greatly increase the kernel message buffer size
(to 1 MiB) to capture the complete dmesg output:

options   MSGBUF_SIZE=(32768*32)

Can we delay starting the kernel daemon until after the system
is up and /var/log/messages is available?  Just a thought...

