twa kernel panic under heavy IO

Dan Rue drue at therub.org
Mon Oct 24 09:13:44 PDT 2005


On Thu, Oct 06, 2005 at 01:41:38PM -0700, Vinod Kashyap wrote:
> > -----Original Message-----
> > From: owner-freebsd-stable at freebsd.org 
> > [mailto:owner-freebsd-stable at freebsd.org] On Behalf Of Jung-uk Kim
> > Sent: Thursday, October 06, 2005 1:30 PM
> > To: freebsd-stable at FreeBSD.org
> > Cc: Dan Rue
> > Subject: Re: twa kernel panic under heavy IO
> > 
> > On Thursday 06 October 2005 04:07 pm, Dan Rue wrote:
> > > Greetings,
> > >
> > > I am running a 3ware 9500 SATA raid card in a 12x300GB raid 50 
> > > configuration.
> > >
> > > Here is dmesg identifying the controller:
> > > 3ware device driver for 9000 series storage controllers, version:
> > > 2.50.02.012 twa0: <3ware 9000 series Storage Controller> port 
> > > 0xb800-0xb8ff mem 0xfb800000-0xfbffffff,0xfc5ffc00-0xfc5ffcff irq
> > > 24 at device 2.0 on pci2 twa0: 12 ports, Firmware FE9X 2.06.00.009, 
> > > BIOS BE9X 2.03.01.051
> > >
> > > I was getting occasional kernel panics in 5.4 doing high I/O type 
> > > things (typically an rsync operation).  I was told that twa was 
> > > updated in 5-STABLE, so yesterday I upgraded.  I've 
> 
> Going by the dmesg, you have a 9.1.5.2 driver and 9.2 firmware.  The
> driver in 5 -STABLE is from the 9.2 release.  So, you might not have
> the driver upgrade done properly.  Try using the driver and firmware
> from the same release.  If you still see problems, please contact
> 3ware support.

Sorry about that, the driver and firmware were not actually mismatched -
I had pasted my dmesg from a previous email when I was running a
different version of FreeBSD.

---

After going back and forth with 3ware web support, this issue has been
closed, but not resolved.  I tried my 3ware 9500 on FreeBSD 5.3, 5.4,
and 5-STABLE.  With each of these OS versions and its stock driver (I
never changed the driver version manually), I got hard lockups and
reboots (though, interestingly, no kernel panics).

3ware had me check and troubleshoot a number of possibilities, until
they finally decided it was a hardware problem and issued me a
replacement card.  However, in the meantime, I upgraded to FreeBSD
6.0RC1 and the machine is now working flawlessly.  I returned the
replacement card unused.  

I can only conclude that there is a significant (timing?) bug in the
twa driver in FreeBSD 5.3/5.4/5-STABLE (as opposed to an isolated
hardware problem with my setup).

For those interested, I have posted the full conversation with 3ware on
my website:
http://therub.org/9500.txt (sorry for the poor formatting)

At one point, I received the following error message just before the
machine locked up:

>Oct 12 11:36:13 leopard kernel: initiate_write_filepage: already started

I grepped for that error message in the FreeBSD kernel source and found
it in sys/ufs/ffs/ffs_softdep.c on line 3580.  What makes it really
interesting is the comment just above the point where the message is
printed:

if (pagedep->pd_state & IOSTARTED) {
        /*
         * This can only happen if there is a driver that does not
         * understand chaining. Here biodone will reissue the call
         * to strategy for the incomplete buffers.
         */
        printf("initiate_write_filepage: already started\n");
        return;
}
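
To make the guard above easier to follow, here is a minimal stand-alone
sketch of the same pattern: a state flag is set when a write is
started, and a second attempt to start it while the flag is still set
is reported and refused.  This is only a model, not the kernel code.
The struct, the function names, and the flag value are all hypothetical
stand-ins chosen for illustration, and clearing the flag on completion
is an assumption about how such a flag is normally used.

```c
#include <stdio.h>

/* Hypothetical flag value, for illustration only. */
#define IOSTARTED 0x0200

/* Hypothetical stand-in for the dependency structure in the excerpt. */
struct pagedep_model {
        int pd_state;
};

/*
 * Model of the guard: returns 1 if the write was newly started,
 * 0 if it was already in progress (the case that prints the message).
 */
static int
model_initiate_write(struct pagedep_model *pd)
{
        if (pd->pd_state & IOSTARTED) {
                printf("model_initiate_write: already started\n");
                return (0);
        }
        pd->pd_state |= IOSTARTED;
        return (1);
}

/* Model of write completion: clears the flag so a new write may start. */
static void
model_write_complete(struct pagedep_model *pd)
{
        pd->pd_state &= ~IOSTARTED;
}
```

In this model, calling model_initiate_write() twice without a
model_write_complete() in between hits the "already started" branch,
which is the situation the kernel comment attributes to a driver that
does not understand chaining.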

I know this is a 3ware issue.  I am posting this resolution here in the
hope that it helps someone else who hits this bug, and that raising it
publicly will get the attention of the 3ware FreeBSD driver
team/individual.

Dan

