Tyan TA26, LSI 320-1, and FBSD6.0 Strangeness
scottl at samsco.org
Thu Dec 22 07:48:04 PST 2005
Ken Gunderson wrote:
> On Wed, 21 Dec 2005 21:30:51 -0700
> Scott Long <scottl at samsco.org> wrote:
>>Ken Gunderson wrote:
>>>On Tue, 20 Dec 2005 23:10:18 -0700
>>>Ken Gunderson <kgunders at teamcool.net> wrote:
>>>>On Tue, 20 Dec 2005 16:07:52 -0700
>>>>Ken Gunderson <kgunders at teamcool.net> wrote:
>>>>>I'm having a tough time w/a Tyan TA26, 320-1 and 6.0-RELEASE that
>>>>>I'm hoping y'all may be able to shed some light on. I create logical
>>>>>drives and install FBSD just fine. Then cvsup, buildworld,
>>>>>buildkernel, installkernel. Upon reboot the system drives (mirror) are
>>>>>in degraded mode and the raid0 drive (swap) is offline. MegaRAID is
>>>>>unable to rebuild the arrays. I've called LSI support and they're
>>>>>mystified as well.
>>>>>E) Present Status:
>>>>>Interestingly enough, I am able to FORCE Physical Drive 1 back online
>>>>>and then "Check Consistency". Presently 21% complete so don't know if
>>>>>it will choke on an error or not yet.
>>>>The consistency check did complete w/o any errors, and after rebooting
>>>>all logical drives are once again in "Optimal" state. For sake of
>>>>completeness here's the dmesg:
>>>Yet another follow up on my own post...
>>>1) Using the amr driver from 7-CURRENT yields same results.
>>>2) Did some testing playing musical hard drive slots. IF I do NOT
>>>use slot 1 (# on Tyan Backplane starts w/1) and use the EXACT same raid
>>>config for the mirror using, e.g. slots 2 & 3, then all works as
>>>So it would seem that Tyan and/or LSI have something Foobarred? Or
>>>that for some reason FBSD is overwriting directly to disk on slot 1
>>>(i.e. da0) even though it's not technically there?
>>>Bizarre hardware issues. My raison d'etre...
>>There is no way for FreeBSD to directly access disks attached to the
>>RAID controller. All reads and writes to the array are bounded by the
>>controller, and there simply is no way to get around that. With a
>>certain amount of advanced hacking it would be possible to corrupt the
>>disks with the amr_cam module, but even that is disabled with 7-CURRENT.
>>What I'd actually suspect is that the backplane and/or slot connector is
>>bad, so bad that simple parity detection cannot catch it.
> Well, I told y'all it was BIZARRE ....
> The backplane and/or connector issue was the conclusion last time
> around. So that machine was RMA'd by Tyan. The replacement was
> reportedly double checked by Tyan tech prior to being shipped. Now I'm
> seeing same with 2nd machine. And to make matters even more
> interesting... I've subsequently confirmed on yet a 3rd.
> I've done some additional testing w/7-CURRENT amr driver w/one of the
> mirrored hd's back in slot #1. If I just grab amr from cvs and
> build an SMP kernel I can boot into the new kernel just fine.
> If I then buildworld and reboot w/o proceeding any further then I get
> degraded array that I can't rebuild, e.g:
> $ dmesg |grep amr
> amr0: <LSILogic MegaRAID 1.53> mem 0xff4f0000-0xff4fffff irq 29 at device 4.0 on pci1
> amr0: delete logical drives supported by controller
> amr0: <LSILogic MegaRAID SCSI 320-1> Firmware 1L37, BIOS G119, 64MB RAM
> amr0: delete logical drives supported by controller
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
> amrd1: <LSILogic MegaRAID logical drive> on amr0
> amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
> amrd2: <LSILogic MegaRAID logical drive> on amr0
> amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
> amrd1: I/O error - 0x1
> Trying to mount root from ufs:/dev/amrd0s1a
> So this would indicate there _might_ be something amiss w/the amr driver
> that only pops up under a bit of I/O load, e.g. buildworld. But if this
> were the case then why would it only show up when using Slot 1?
The driver in 7-CURRENT was tested under extreme I/O load for 2 weeks
before being committed to the tree.
> Other possibility is that there is something just plain broken at the
> hardware/ firmware level with either the LSI card or the Tyan unit. I'd
> lean more towards the latter since the LSI 320-1 has been on the market
> for a long time now and widely deployed. Especially compared to the
> Tyan TA-26. So it seems like the odds alone would point more towards
> the Tyan.
> The good news is that LSI seems quite interested in further
> investigation (wish I could say the same for Tyan). Bad news is that
> their lab is undergoing remodeling. Or so I am told.
>>Some controllers allow you to run scans on individual disks from within
>>a controlled environment, like the BIOS. I don't recall if the LSI
>>cards have this feature, but if they do then they could almost certainly
> The 320-1 does not. Or at least not that I've found. Maybe there's
> some top secret procedure somewhere I don't know about... I can only
> do consistency checks at logical drive level.
Would it be at all possible to substitute a different controller card,
even a plain SCSI one, and hook the backplane up to it?
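
When repeating the slot-swapping tests, it may help to check the amrd
lines in dmesg automatically after each reboot. The following is only a
rough sketch in Python, not part of the amr driver or any LSI tool. It
assumes the line format shown in the dmesg output quoted above, and it
assumes 512-byte sectors for the size cross-check. The names parse_amrd
and unhealthy are made up for this illustration.

```python
import re

# Matches lines such as:
#   amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
# The pattern is inferred from the dmesg output quoted in this thread.
LINE_RE = re.compile(
    r"^(amrd\d+): (\d+)MB \((\d+) sectors\) RAID (\d+) \((\w+)\)"
)

SECTOR_BYTES = 512  # assumption: 512-byte sectors


def parse_amrd(dmesg_text):
    """Return one dict per amrd logical drive found in the text."""
    drives = []
    for line in dmesg_text.splitlines():
        # Drop any mail quoting ("> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = LINE_RE.match(line)
        if not match:
            continue
        name, size_mb, sectors, level, state = match.groups()
        drives.append({
            "name": name,
            "size_mb": int(size_mb),
            "sectors": int(sectors),
            "raid_level": int(level),
            "state": state,
            # Does the reported size agree with the sector count?
            "size_consistent":
                int(sectors) * SECTOR_BYTES // (1024 * 1024) == int(size_mb),
        })
    return drives


def unhealthy(drives):
    """Return the drives whose state is anything other than 'optimal'."""
    return [d for d in drives if d["state"] != "optimal"]


# Sample taken from the dmesg output quoted earlier in this message.
SAMPLE = """\
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
amrd1: <LSILogic MegaRAID logical drive> on amr0
amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
amrd2: <LSILogic MegaRAID logical drive> on amr0
amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
"""

drives = parse_amrd(SAMPLE)
for d in unhealthy(drives):
    print(d["name"], "RAID", d["raid_level"], "is", d["state"])
```

In practice the text would come from the output of dmesg rather than
from a pasted sample. Running the check once after installkernel and
again after buildworld would show at which step the array state changes.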