Tyan TA26, LSI 320-1, and FBSD6.0 Strangeness

Scott Long scottl at samsco.org
Thu Dec 22 07:48:04 PST 2005


Ken Gunderson wrote:
> On Wed, 21 Dec 2005 21:30:51 -0700
> Scott Long <scottl at samsco.org> wrote:
> 
> 
>>Ken Gunderson wrote:
>>
>>
>>>On Tue, 20 Dec 2005 23:10:18 -0700
>>>Ken Gunderson <kgunders at teamcool.net> wrote:
>>>
>>>
>>>
>>>>On Tue, 20 Dec 2005 16:07:52 -0700
>>>>Ken Gunderson <kgunders at teamcool.net> wrote:
>>>>
>>>>
>>>>
>>>>>Hello List:
>>>>>
>>>>>I'm having a tough time w/a Tyan TA26, 320-1 and 6.0-RELEASE that
>>>>>I'm hoping y'all may be able to shed some light on.  I create logical
>>>>>drives and install FBSD just fine.  Then cvsup, buildworld,
>>>>>buildkernel, installkernel.  Upon reboot the system drives (mirror) are
>>>>>in degraded mode and the raid0 drive (swap) is offline.  MegaRAID is
>>>>>unable to rebuild the arrays.  I've called LSI support and they're
>>>>>mystified as well.
>>>>
>>>>[big snippage]
>>>>
>>>>
>>>>
>>>>>E) Present Status:
>>>>>
>>>>>Interestingly enough, I am able to FORCE Physical Drive 1 back online
>>>>>and then "Check Consistency".  Presently 21% complete so don't know if
>>>>>it will choke on an error or not yet.
>>>>
>>>>Update- 
>>>>
>>>>The consistency check did complete w/o any errors and after rebooting
>>>>all logical drives are once again in "Optimal" state.  For the sake of
>>>>completeness here's the dmesg:
>>>
>>>
>>>[more snippage]
>>>
>>>Yet another follow up on my own post...
>>>
>>>Update Redux:
>>>
>>>1) Using the amr driver from 7-CURRENT yields same results.  
>>>
>>>2) Did some testing playing musical hard drive slots.  IF I do NOT
>>>use slot 1 (# on Tyan Backplane starts w/1) and use the EXACT same raid
>>>config for the mirror using, e.g., slots 2 & 3, then all works as
>>>normally expected.
>>>
>>>So it would seem that Tyan and/or LSI have something Foobarred?  Or
>>>that for some reason FBSD is overwriting directly to disk on slot 1
>>>(i.e. da0) even though it's not technically there?
>>>
>>>Bizarre hardware issues.  My raison d'etre...
>>>
>>
>>There is no way for FreeBSD to directly access disks attached to the
>>RAID controller.  All reads and writes to the array are bounded by the
>>controller, and there simply is no way to get around that.  With a
>>certain amount of advanced hacking it would be possible to corrupt the
>>disks with the amr_cam module, but even that is disabled with 7-CURRENT.
>>What I'd actually suspect is that the backplane and/or slot connector is
>>bad, so bad that simple parity detection cannot catch it.
> 
> 
> Well, I told y'all it was BIZARRE ....  
> 
> The backplane and/or connector issue was the conclusion last time
> around.  So that machine was RMA'd by Tyan.  The replacement was
> reportedly double checked by Tyan tech prior to being shipped.  Now I'm
> seeing same with 2nd machine.  And to make matters even more
> interesting... I've subsequently confirmed on yet a 3rd.  
> 
> I've done some additional testing w/7-CURRENT amr driver w/one of the
> mirrored hd's back in slot #1.  If I just grab amr from cvs and
> build an SMP kernel I can boot into the new kernel just fine.
> 
> If I then buildworld and reboot w/o proceeding any further then I get 
> degraded array that I can't rebuild, e.g.:
> 
> $ dmesg |grep amr
> amr0: <LSILogic MegaRAID 1.53> mem 0xff4f0000-0xff4fffff irq 29 at device 4.0 on pci1
> amr0: delete logical drives supported by controller
> amr0: <LSILogic MegaRAID SCSI 320-1> Firmware 1L37, BIOS G119, 64MB RAM
> amr0: delete logical drives supported by controller
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
> amrd1: <LSILogic MegaRAID logical drive> on amr0
> amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
> amrd2: <LSILogic MegaRAID logical drive> on amr0
> amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
> amrd1: I/O error - 0x1
> Trying to mount root from ufs:/dev/amrd0s1a
> 
> So this would indicate there _might_ be something amiss w/the amr driver
> that only pops up under a bit of I/O load, e.g. buildworld. But if this
> were the case then why would it only show up when using Slot 1?
> 
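
To compare runs across slots without eyeballing each boot, the logical
drive states can be pulled out of the dmesg text.  This is only a minimal
sketch: it assumes the amrd line format is exactly what appears in the
quoted output above, and the function names (parse_amrd_states,
not_optimal) are made up for illustration, not part of any FreeBSD tool.

```python
import re

# Assumed format, taken from the quoted dmesg output:
#   amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
AMRD_LINE = re.compile(
    r"^amrd(?P<unit>\d+): (?P<size_mb>\d+)MB \((?P<sectors>\d+) sectors\) "
    r"RAID (?P<level>\d+) \((?P<state>\w+)\)$"
)


def parse_amrd_states(dmesg_text):
    """Return one dict per amrd logical drive line found in dmesg_text."""
    drives = []
    for line in dmesg_text.splitlines():
        # Tolerate mail-quoting prefixes such as "> " in pasted output.
        match = AMRD_LINE.match(line.lstrip("> ").strip())
        if match:
            drives.append({
                "unit": int(match.group("unit")),
                "size_mb": int(match.group("size_mb")),
                "raid_level": int(match.group("level")),
                "state": match.group("state"),
            })
    return drives


def not_optimal(drives):
    """Keep only the logical drives whose reported state is not 'optimal'."""
    return [d for d in drives if d["state"] != "optimal"]


# Sample lines copied from the dmesg output quoted in this thread.
SAMPLE = """\
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
amrd1: <LSILogic MegaRAID logical drive> on amr0
amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
amrd2: <LSILogic MegaRAID logical drive> on amr0
amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
amrd1: I/O error - 0x1
"""

if __name__ == "__main__":
    for drive in not_optimal(parse_amrd_states(SAMPLE)):
        print("amrd%d: RAID %d is %s"
              % (drive["unit"], drive["raid_level"], drive["state"]))
```

Saving the output of `dmesg | grep amr` after each test boot and feeding it
through something like this would give a per-slot record of when the
arrays first leave the optimal state.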

The driver in 7-CURRENT was tested under extreme I/O load for 2 weeks
before being committed to the tree.

> Other possibility is that there is something just plain broken at the
> hardware/ firmware level with either the LSI card or the Tyan unit.  I'd
> lean more towards the latter since the LSI 320-1 has been on the market
> for a long time now and widely deployed. Especially compared to the
> Tyan TA-26.  So it seems like the odds alone would point more towards
> the Tyan.
> 
> The good news is that LSI seems quite interested in further
> investigation (wish I could say the same for Tyan).  Bad news is that
> their lab is undergoing remodeling.  Or so I am told.
> 
> 
>>Some controllers allow you to run scans on individual disks from within
>>a controlled environment, like the BIOS.  I don't recall if the LSI
>>cards have this feature, but if they do then they could almost certainly
>>verify this.
> 
> 
> The 320-1 does not.  Or at least not that I've found.  Maybe there's
> some top secret procedure somewhere I don't know about...  I can only
> do consistency checks at logical drive level.
> 


Would it be at all possible to substitute a different controller card,
even a plain SCSI one, and hook the backplane up to it?
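
With a plain SCSI controller each disk would show up as its own device, so
a read-only pass over the suspect disk becomes possible.  A rough sketch of
such a pass follows.  The device path in the usage note is a guess, the
function name scan_device is invented here, and the 1 MiB block size is
simply a sector-aligned value chosen for illustration.  It only reads, so
it will not catch write-path corruption.

```python
import os


def scan_device(path, block_size=1024 * 1024, max_bytes=None, max_errors=100):
    """Read path sequentially; return (bad_offsets, bytes_covered).

    bad_offsets lists the byte offsets of blocks whose read raised an
    I/O error.  Scanning stops at end of device, at max_bytes, or once
    max_errors failed reads have been recorded.
    """
    bad_offsets = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while max_bytes is None or offset < max_bytes:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                data = os.read(fd, block_size)
            except OSError:
                # Record the failing block and skip past it.
                bad_offsets.append(offset)
                offset += block_size
                if len(bad_offsets) >= max_errors:
                    break
                continue
            if not data:
                break  # end of device or file
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets, offset
```

Run as root against the raw disk, for example scan_device("/dev/da0") if
that is where the slot 1 drive lands (an assumption on my part), and
compare the result with the same disk moved to slot 2 or 3.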

Scott


More information about the freebsd-amd64 mailing list