Tyan TA26, LSI 320-1, and FBSD6.0 Strangeness

Ken Gunderson kgunders at teamcool.net
Thu Dec 22 01:06:41 PST 2005


On Wed, 21 Dec 2005 21:30:51 -0700
Scott Long <scottl at samsco.org> wrote:

> Ken Gunderson wrote:
> 
> > On Tue, 20 Dec 2005 23:10:18 -0700
> > Ken Gunderson <kgunders at teamcool.net> wrote:
> > 
> > 
> >>On Tue, 20 Dec 2005 16:07:52 -0700
> >>Ken Gunderson <kgunders at teamcool.net> wrote:
> >>
> >>
> >>>Hello List:
> >>>
> >>>I'm having a tough time w/a Tyan TA26, 320-1 and 6.0-RELEASE that
> >>>I'm hoping y'all may be able to shed some light on.  I create logical
> >>>drives and install FBSD just fine.  Then cvsup, buildworld,
> >>>buildkernel, installkernel.  Upon reboot the system drives (mirror) are
> >>>in degraded mode and the raid0 drive (swap) is offline.  MegaRAID is
> >>>unable to rebuild the arrays.  I've called LSI support and they're
> >>>mystified as well.
> >>
> >>[big snippage]
> >>
> >>
> >>>E) Present Status:
> >>>
> >>>Interestingly enough, I am able to FORCE Physical Drive 1 back online
> >>>and then "Check Consistency".  Presently 21% complete, so I don't know
> >>>yet whether it will choke on an error or not.
> >>
> >>Update- 
> >>
> >>The consistency check did complete w/o any errors, and after rebooting
> >>all logical drives are once again in "Optimal" state.  For the sake of
> >>completeness, here's the dmesg:
> > 
> > 
> > [more snippage]
> > 
> > Yet another follow up on my own post...
> > 
> > Update Redux:
> > 
> > 1) Using the amr driver from 7-CURRENT yields same results.  
> > 
> > 2) Did some testing playing musical hard drive slots.  IF I do NOT
> > use slot 1 (# on Tyan backplane starts w/1) and use the EXACT same RAID
> > config for the mirror using, e.g., slots 2 & 3, then all works as
> > normally expected.
> > 
> > So it would seem that Tyan and/or LSI have something Foobarred?  Or
> > that for some reason FBSD is writing directly to the disk in slot 1
> > (i.e. da0) even though it's not technically there?
> > 
> > Bizarre hardware issues.  My raison d'etre...
> > 
> 
> There is no way for FreeBSD to directly access disks attached to the
> RAID controller.  All reads and writes to the array are bounded by the
> controller, and there simply is no way to get around that.  With a
> certain amount of advanced hacking it would be possible to corrupt the
> disks with the amr_cam module, but even that is disabled with 7-CURRENT.
> What I'd actually suspect is that the backplane and/or slot connector is
> bad, so bad that simple parity detection cannot catch it.

Well, I told y'all it was BIZARRE ....  

The backplane and/or connector issue was the conclusion last time
around.  So that machine was RMA'd by Tyan.  The replacement was
reportedly double-checked by a Tyan tech prior to being shipped.  Now I'm
seeing the same with a 2nd machine.  And to make matters even more
interesting... I've subsequently confirmed it on yet a 3rd.

I've done some additional testing w/ the 7-CURRENT amr driver and one of
the mirrored HDs back in slot #1.  If I just grab amr from CVS and
build an SMP kernel, I can boot into the new kernel just fine.

If I then buildworld and reboot w/o proceeding any further, I get a
degraded array that I can't rebuild, e.g.:

$ dmesg |grep amr
amr0: <LSILogic MegaRAID 1.53> mem 0xff4f0000-0xff4fffff irq 29 at device 4.0 on pci1
amr0: delete logical drives supported by controller
amr0: <LSILogic MegaRAID SCSI 320-1> Firmware 1L37, BIOS G119, 64MB RAM
amr0: delete logical drives supported by controller
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
amrd1: <LSILogic MegaRAID logical drive> on amr0
amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
amrd2: <LSILogic MegaRAID logical drive> on amr0
amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
amrd1: I/O error - 0x1
Trying to mount root from ufs:/dev/amrd0s1a
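When repeating the buildworld-and-reboot test across different slots, it may help to flag non-optimal logical drives automatically instead of eyeballing dmesg each time. The following is a minimal Python sketch, assuming the `amrdN: <size>MB (<n> sectors) RAID <level> (<state>)` line format seen in the output above and 512-byte sectors (with that assumption the sizes check out, e.g. 135241728 sectors x 512 bytes = 66036 MiB). `LogicalDrive`, `parse_amrd`, and `not_optimal` are hypothetical helpers for illustration, not part of FreeBSD or the amr driver.

```python
import re
from typing import List, NamedTuple

# Matches amrd size/state lines like the ones in the dmesg output above, e.g.
#   "amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)"
# This pattern is an assumption based only on those sample lines.
AMRD_RE = re.compile(
    r"^(?P<dev>amrd\d+): (?P<mb>\d+)MB \((?P<sectors>\d+) sectors\) "
    r"RAID (?P<level>\d+) \((?P<state>\w+)\)"
)

SECTOR_BYTES = 512  # assumed sector size


class LogicalDrive(NamedTuple):
    dev: str
    mb: int
    sectors: int
    level: int
    state: str


def parse_amrd(dmesg_text: str) -> List[LogicalDrive]:
    """Return one LogicalDrive per amrd size/state line; other lines are skipped."""
    drives = []
    for line in dmesg_text.splitlines():
        m = AMRD_RE.match(line.strip())
        if m:
            drives.append(LogicalDrive(m["dev"], int(m["mb"]), int(m["sectors"]),
                                       int(m["level"]), m["state"]))
    return drives


def not_optimal(drives: List[LogicalDrive]) -> List[LogicalDrive]:
    """Return the drives whose reported state is anything other than 'optimal'."""
    return [d for d in drives if d.state != "optimal"]


# The amrd lines from the dmesg above, used as sample input.
SAMPLE = """\
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
amrd1: <LSILogic MegaRAID logical drive> on amr0
amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
amrd2: <LSILogic MegaRAID logical drive> on amr0
amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
amrd1: I/O error - 0x1
"""

if __name__ == "__main__":
    drives = parse_amrd(SAMPLE)
    for d in drives:
        # Cross-check the reported MB against sectors * 512 bytes / 2**20.
        assert d.sectors * SECTOR_BYTES // 2**20 == d.mb, d
    for d in not_optimal(drives):
        print(f"{d.dev}: RAID {d.level} is {d.state}")
```

Run against the sample, this prints the degraded RAID 1 (amrd0) and the offline RAID 0 (amrd1). On a live box the same functions could be fed real `dmesg` output in place of `SAMPLE`.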

So this would indicate there _might_ be something amiss w/the amr driver
that only pops up under a bit of I/O load, e.g. buildworld.  But if that
were the case, then why would it only show up when using slot 1?

The other possibility is that there is something just plain broken at the
hardware/firmware level with either the LSI card or the Tyan unit.  I'd
lean more towards the latter, since the LSI 320-1 has been on the market
for a long time now and is widely deployed, especially compared to the
Tyan TA26.  So the odds alone would seem to point more towards the
Tyan.

The good news is that LSI seems quite interested in further
investigation (wish I could say the same for Tyan).  The bad news is that
their lab is undergoing remodeling.  Or so I am told.

> Some controllers allow you to run scans on individual disks from within
> a controlled environment, like the BIOS.  I don't recall if the LSI
> cards have this feature, but if they do then they could almost certainly
> verify this.

The 320-1 does not.  Or at least not that I've found.  Maybe there's
some top-secret procedure somewhere I don't know about...  I can only
do consistency checks at the logical drive level.

-- 
Best regards,

Ken Gunderson

Q: Because it reverses the logical flow of conversation.
A: Why is putting a reply at the top of the message frowned upon?



More information about the freebsd-amd64 mailing list