Sun Fire X2100 SATA problem [was - sun x2100 gmirror problem]

Miroslav Lachman 000.fbsd at quip.cz
Mon Jan 7 16:58:40 PST 2008


Miroslav Lachman wrote:

> andrej at antiszoc.hu wrote:
> 
>> Hi,
>>
>> We're using gmirror on our sun fire x2100 and FreeBSD 6.1-p10. Some days
>> ago I found this in the logs:
>>
>> Apr  1 02:12:05 x2100 kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error
>> (retrying request) LBA=612960533
>> Apr  1 02:12:05 x2100 kernel: ad6: FAILURE - WRITE_DMA48
>> status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=612960533
>> Apr  1 02:12:05 x2100 kernel: GEOM_MIRROR: Request failed (error=5).
>> ad6[WRITE(offset=313835792896, length=4096)]
>> Apr  1 02:12:05 x2100 kernel: GEOM_MIRROR: Device gm0: provider ad6
>> disconnected.
>>
>> Normally this would look like a disk error, but I think our
>> half-year-old disks (WD RE2) shouldn't fail after such a short time. Of
>> course they have moving parts so they MAY fail. :( Yesterday I tried to
>> reinit the SATA channel and insert the disk back into the mirror. I got this:
>>
>> Apr  3 23:00:32 x2100 kernel: GEOM_MIRROR: Device gm0: provider ad6
>> detected.
>> Apr  3 23:00:32 x2100 kernel: GEOM_MIRROR: Device gm0: rebuilding
>> provider ad6.
>> Apr  3 23:00:36 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error
>> (retrying request) LBA=245760
>> Apr  3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error
>> (retrying request) LBA=392576
>> Apr  3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error
>> (retrying request) LBA=392960
>> Apr  3 23:00:53 x2100 kernel: ad6: FAILURE - device detached
>>
>> After this, the disk disappeared from the SATA channel completely.
>>
>> The weird thing is that we used the onboard nvidia-raid and the very same
>> error occurred, but there was no report in the kernel; the machine just
>> didn't ask for an operating system. Later I found out that the disk had
>> been forgotten ~2 weeks before that reboot (the data on it was ~2 weeks
>> old). Otherwise that "forgotten/failed" disk was also half a year old and
>> had been running without a problem.
>>
>> Is there anybody who experienced something similar with SUN X2100 or any
>> other servers running FreeBSD 6 and sata?
>>
>> Regards,
>> Andras
> 
> 
> Hi,
> 
> I can confirm your problem. I have the same problem on one X2100 but not
> on the others. Currently I have 4 X2100 machines, but only one with this
> strange problem. The problem is not caused by the HDD itself: I replaced
> it with a brand new one and the same error appeared after a few days.
> Maybe there is a problem with the cables / connectors or something on the
> mainboard.
> I am well known on this list and on the local (Czech) mailing list for
> problems with SATA(n) disk drives disappearing. I had similar problems on
> ASUS boards with Intel chipsets... so from my point of view there is
> something bad with SATA in general. I never had a problem like this with
> good old ATA drives.
> 
> I have no solution for this problem. The disk is OK after a reboot for a
> few days or weeks... if there is somebody who can help with investigating
> this kind of problem, I'll be happy to cooperate.
> 
> output of dmesg, smartctl, gmirror etc.:
> http://www.quip.cz/1/freebsd/sata-hdd-problems/2007-03-07_errors_ad6.txt
> 
> Miroslav Lachman

Just for the record - my problem was fixed by replacing the SATA cable.
The machine now has an uptime of 227 days and no more disk errors.
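
For anyone who finds this thread later: ICRC is the ATA "interface CRC
error" flag, i.e. the data was corrupted on the link between controller
and drive, which fits a bad cable. Below is a minimal sketch (not
something I ran on the X2100) for counting these warnings per device in
a syslog file. The function name count_icrc_errors and the regular
expression are my own illustration, written against the message format
quoted above, and they assume each kernel message sits on a single line
as it does in the real log (the mail quoting above wrapped them).

```python
import re
from collections import Counter

# Matches ata(4) warnings in the format quoted in this thread, e.g.
# "... kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) ..."
# Assumption: the device name has the form "adN" and the message is on one line.
ICRC_RE = re.compile(r"kernel: (ad\d+): WARNING - \S+ UDMA ICRC error")


def count_icrc_errors(lines):
    """Return a Counter mapping device name (e.g. 'ad6') to ICRC warning count."""
    counts = Counter()
    for line in lines:
        match = ICRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Small demonstration using (unwrapped) lines taken from the logs above.
sample_log = [
    "Apr  3 23:00:36 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error "
    "(retrying request) LBA=245760",
    "Apr  3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error "
    "(retrying request) LBA=392576",
    "Apr  3 23:00:53 x2100 kernel: ad6: FAILURE - device detached",
]

for device, count in sorted(count_icrc_errors(sample_log).items()):
    print(f"{device}: {count} ICRC warning(s)")
```

To use it on a real machine, pass an open log file instead of the
sample list, for example count_icrc_errors(open("/var/log/messages")).
A count that keeps growing on one channel after the disk has been
swapped points at the cable or connector rather than the drive.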

Miroslav Lachman


