7.2-RELEASE-p4, IO errors & RAID1 failure

Alexander Motin mav at FreeBSD.org
Fri Jun 18 14:16:22 UTC 2010


Jeremy Chadwick wrote:
> On Fri, Jun 18, 2010 at 01:36:53PM +0200, Miroslav Lachman wrote:
>> Jeremy Chadwick wrote:
>>> On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote:
>> [...]
>>
>>>> The drives in the RAID exist on two separate ATA channels:
>>>> [root@meshuga /home/matt]# atacontrol list
>>>> ATA channel 0:
>>>>     Master:  ad0<WDC WD3200AAKS-00VYA0/12.01B02>  SATA revision 2.x
>>>>     Slave:   ad1<FB160C4081/HPF0>  SATA revision 1.x
>>>> ATA channel 1:
>>>>     Master:  ad2<WDC WD3200AAKS-00VYA0/12.01B02>  SATA revision 2.x
>>>>     Slave:       no device present
>>>> ATA channel 2:
>>>>     Master: acd0<HL-DT-ST DVDRAM GH22NS40/NL01>  SATA revision 1.x
>>>>     Slave:       no device present
>>>> ATA channel 3:
>>>>     Master:      no device present
>>>>     Slave:       no device present
>>>>
>>>> ad1 is a third 160G drive that I periodically back up to using cron.
>>> So your RAID-1 array consists of ad0 and ad2?  You didn't provide
>>> "atacontrol status" output so I'm going to assume that's the case.
>>>
>>> What's odd to me is that you somehow have two disks on a single ATA
>>> channel -- look closely at channel 0.  SATA has a 1:1 device-to-channel
>>> mapping, so I'm a little surprised to see there's two devices on channel
>>> 0.  To me, this indicates your system BIOS is configured to run in
>>> "Emulation" mode -- where the ATA controller pretends to be a PATA/IDE
>>> controller, thus SATA-0 and SATA-1 devices appear as primary master and
>>> primary slave, respectively.
>>>
>>> What motherboard is this?  Can you change the setting to either
>>> "Native", "Enhanced", or (even better) "AHCI"?  I've seen some systems
>>> where the Serial ATA option in the BIOS has an "Auto" option, which does
>>> totally bizarre things at times.
>>>
>>> But before changing the setting, I would recommend dealing with the disk
>>> problem first.  Changing the SATA controller operation mode will almost
>>> certainly change all of your device names (you'll have to go into
>>> single-user mode, mount filesystems by hand, fix /etc/fstab, etc.).
>> [...]
>>
>> It is "normal" on HP G5 series. I have ProLiant ML 110 G5. I tried
>> all type of settings in BIOS, but all of them shows two disks on one
>> ATA channel:
>>
>> HP ProLiant ML 110 G5
>>
>> FreeBSD 7.2-RELEASE-p4 amd64 GENERIC
>>
>> root@kiwi ~/# atacontrol list
>> ATA channel 0:
>>     Master:  ad0 <SAMSUNG HD103UJ/1AA01113> SATA revision 2.x
>>     Slave:   ad1 <SAMSUNG HD103UJ/1AA01113> SATA revision 2.x
>> ATA channel 1:
>>     Master:  ad2 <SAMSUNG HD103UJ/1AA01113> SATA revision 2.x
>>     Slave:   ad3 <SAMSUNG HD103UJ/1AA01113> SATA revision 2.x
>> ATA channel 2:
>>     Master: acd0 <HL-DT-ST DVD-RAM GH15L/FA01> SATA revision 1.x
>>     Slave:       no device present
>> ATA channel 3:
>>     Master:      no device present
>>     Slave:       no device present
>>
>>
>>
>> atapci0: <Intel ICH9 SATA300 controller> port
>> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1c10-0x1c1f,0x1c00-0x1c0f at
>> device 31.2 on pci0
>> ata0: <ATA channel 0> on atapci0
>> ata0: [ITHREAD]
>> ata1: <ATA channel 1> on atapci0
>> ata1: [ITHREAD]
>> pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
>> atapci1: <Intel ICH9 SATA300 controller> port 0x1c68-0x1c6f,0x1c5c-0x1c5f,0x1c60-0x1c67,0x1c58-0x1c5b,0x1c30-0x1c3f,0x1c20-0x1c2f
>> irq 18 at device 31.5 on pci0
>> atapci1: [ITHREAD]
>> ata2: <ATA channel 0> on atapci1
>> ata2: [ITHREAD]
>> ata3: <ATA channel 1> on atapci1
>> ata3: [ITHREAD]
>>
>>
>> pciconf -lv
>> atapci0@pci0:0:31:2:    class=0x01018a card=0x31f4103c
>> chip=0x29208086 rev=0x02 hdr=0x00
>>     vendor     = 'Intel Corporation'
>>     device     = '82801IB/IR/IH (ICH9 Family) 4 port Serial ATA
>> Storage Controller 1'
>>     class      = mass storage
>>     subclass   = ATA
>>
>> atapci1@pci0:0:31:5:    class=0x010185 card=0x31f4103c
>> chip=0x29268086 rev=0x02 hdr=0x00
>>     vendor     = 'Intel Corporation'
>>     device     = '82801IB/IR/IH (ICH9 Family) 2 port Serial ATA
>> Storage Controller 2'
>>     class      = mass storage
>>     subclass   = ATA
>>
>>
>>
>> ad0: 953869MB <SAMSUNG HD103UJ 1AA01113> at ata0-master SATA300
>> ad1: 953869MB <SAMSUNG HD103UJ 1AA01113> at ata0-slave SATA300
>> ad2: 953869MB <SAMSUNG HD103UJ 1AA01113> at ata1-master SATA300
>> ad3: 953869MB <SAMSUNG HD103UJ 1AA01113> at ata1-slave SATA300
>> da0 at umass-sim0 bus 0 target 0 lun 0
>> da0: <USB 2.0 USB Flash Drive 0.00> Removable Direct Access SCSI-2 device
>> da0: 40.000MB/s transfers
>> da0: 1928MB (3948544 512 byte sectors: 255H 63S/T 245C)
>> acd0: DVDR <HL-DT-ST DVD-RAM GH15L/FA01> at ata2-master SATA150
>>
>>
>> I am using this machine as storage for backups with ZFS RAIDZ
>> without any timeouts so I think that two disks on one channel is not
>> causing the timeouts (only little slowdown)
> 
> Wow, that's really... interesting.  :-)  What this indicates is that the
> controller is running in Native/Enhanced mode yet devices attached to
> SATA ports #0/#1 are master/slave on channel 0, and ports #2/#3 are
> master/slave on channel 1.

Apart from AHCI, all the other modes are just variations of PATA
emulation. "subclass   = ATA" means that AHCI is not enabled. PATA
emulation itself should not be a problem, but it is definitely not good
for performance or hot-swap.
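
For anyone who wants to check this on their own machine, here is a rough
sketch (not part of any FreeBSD tool) that decodes the class= field from
"pciconf -lv" output. It assumes the standard PCI class code layout:
base class 0x01 is mass storage, subclass 0x01 is IDE, and subclass 0x06
with programming interface 0x01 is AHCI. The function names and the
SAMPLE text are made up for illustration; the two class values are the
ones from the quoted output above.

```python
import re


def decode_pci_class(class_field):
    """Split a 'class=0xBBSSPP' value into (base class, subclass, prog-if)."""
    value = int(class_field, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def storage_mode(class_field):
    """Name the operating mode implied by a mass storage class code."""
    base, sub, progif = decode_pci_class(class_field)
    if base != 0x01:
        return "not mass storage"
    if sub == 0x01:
        return "IDE (PATA emulation)"
    if sub == 0x06:
        return "AHCI" if progif == 0x01 else "SATA, non-AHCI"
    return "other mass storage"


# Accept both the real "name@pci..." form and the " at " form that
# mailing list archives substitute for "@".
HEADER = re.compile(
    r"^(\w+)(?:@| at )pci[\d:]+\s+class=(0x[0-9a-fA-F]{6})", re.M
)


def controller_modes(pciconf_output):
    """Map each device name found in the text to its decoded mode."""
    return {
        name: storage_mode(cls)
        for name, cls in HEADER.findall(pciconf_output)
    }


# Illustrative input built from the class values quoted in this thread.
SAMPLE = """\
atapci0@pci0:0:31:2:    class=0x01018a card=0x31f4103c chip=0x29208086
atapci1@pci0:0:31:5:    class=0x010185 card=0x31f4103c chip=0x29268086
"""

if __name__ == "__main__":
    for name, mode in controller_modes(SAMPLE).items():
        print(name, mode)
```

Both controllers here decode to subclass 0x01, i.e. IDE. With AHCI
enabled in the BIOS the class would be expected to read 0x010601 instead.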

As already said, ata(4) has very strict timeout values. It may happen
that, due to medium errors, the drive needs too much time to complete
I/O. It is theoretically possible that the SMART test still completes
because of its higher timeout values. A better test would be to run the
MHDD tool on the disk to find and remap pre-failure sectors, if any.

-- 
Alexander Motin

