twa kernel panic under heavy IO
Vinod Kashyap
vkashyap at amcc.com
Mon Oct 24 11:07:32 PDT 2005
> -----Original Message-----
> From: Dan Rue [mailto:drue at therub.org]
> Sent: Monday, October 24, 2005 9:14 AM
> To: Vinod Kashyap
> Cc: freebsd-stable at FreeBSD.org
> Subject: Re: twa kernel panic under heavy IO
>
> On Thu, Oct 06, 2005 at 01:41:38PM -0700, Vinod Kashyap wrote:
> > > -----Original Message-----
> > > From: owner-freebsd-stable at freebsd.org
> > > [mailto:owner-freebsd-stable at freebsd.org] On Behalf Of Jung-uk Kim
> > > Sent: Thursday, October 06, 2005 1:30 PM
> > > To: freebsd-stable at FreeBSD.org
> > > Cc: Dan Rue
> > > Subject: Re: twa kernel panic under heavy IO
> > >
> > > On Thursday 06 October 2005 04:07 pm, Dan Rue wrote:
> > > > Greetings,
> > > >
> > > > I am running a 3ware 9500 SATA raid card in a 12x300GB raid 50
> > > > configuration.
> > > >
> > > > Here is dmesg identifying the controller:
> > > > 3ware device driver for 9000 series storage controllers, version: 2.50.02.012
> > > > twa0: <3ware 9000 series Storage Controller> port 0xb800-0xb8ff mem 0xfb800000-0xfbffffff,0xfc5ffc00-0xfc5ffcff irq 24 at device 2.0 on pci2
> > > > twa0: 12 ports, Firmware FE9X 2.06.00.009, BIOS BE9X 2.03.01.051
> > > >
> > > > I was getting occasional kernel panics in 5.4 doing high I/O type
> > > > things (typically an rsync operation). I was told that twa was
> > > > updated in 5-STABLE, so yesterday I upgraded. I've
> >
> > Going by the dmesg, you have a 9.1.5.2 driver and 9.2 firmware. The
> > driver in 5-STABLE is from the 9.2 release. So, you might not have
> > the driver upgrade done properly. Try using the driver and firmware
> > from the same release. If you still see problems, please contact
> > 3ware support.
>
> Sorry about that, the driver and firmware were not actually
> mismatched - I had pasted my dmesg from a previous email when
> I was running a different version of FreeBSD.
>
> ---
>
> After going around with 3ware web support, this issue has
> been concluded, but not resolved. I tried my 3ware 9500 on
> FreeBSD 5.3, 5.4, and 5-STABLE. With all of these versions
> of OS and driver (I never changed the driver version
> manually), I received hard lock ups and reboots (though,
> interestingly, no kernel panics).
>
> 3ware had me check and troubleshoot a number of
> possibilities, until they finally decided it was a hardware
> problem and issued me a replacement card. However, in the
> meantime, I upgraded to FreeBSD 6.0RC1 and the machine is now
> working flawlessly. I returned the replacement card unused.
>
> I can only conclude that this means that there is a large
> (timing?) bug in the twa driver in FreeBSD 5.3/5.4/5-STABLE
> (as opposed to an isolated hardware problem with my setup).
>
> I have pasted the full conversation with 3ware on my website
> for those interested here:
> http://therub.org/9500.txt (sorry for the poor formatting)
>
> At one point, I received the following error message just
> before the machine locked up:
>
> >Oct 12 11:36:13 leopard kernel: initiate_write_filepage: already started
>
> I grepped for that error message in the FreeBSD kernel
> source, and found it in sys/ufs/ffs/ffs_softdep.c on line
> 3580. What makes it really interesting is the comment above
> where the error is thrown:
>
> if (pagedep->pd_state & IOSTARTED) {
>         /*
>          * This can only happen if there is a driver that does not
>          * understand chaining. Here biodone will reissue the call
>          * to strategy for the incomplete buffers.
>          */
>         printf("initiate_write_filepage: already started\n");
>         return;
> }
>
> I know this is a 3ware issue. I am posting this resolution
> here in the hope that it helps someone else who hits this
> bug, and that posting it publicly will get the attention of
> the 3ware FreeBSD driver team/individual.
>
The error messages you are seeing are consistent with bad hardware:
the hardware becomes unavailable, so the driver can no longer talk
to it. The "initiate_write_filepage..." message is a different
matter. Did the machine hang right after that message was printed?
I don't think it's related to the hang.
> Dan
>