GEOM problems again...

Johan Ström johan at stromnet.org
Thu Jul 13 10:10:17 UTC 2006


On 10 jul 2006, at 13.59, Johan Ström wrote:

>
> On 10 jul 2006, at 11.09, Johan Ström wrote:
>
>>
>> On 21 maj 2006, at 11.16, Johan Ström wrote:
>>
>>> Hi
>>>
>>> I've had problems before with GEOM mirror and my SATA drives, and
>>> I've posted about it here before too. The solution seemed to be a
>>> change of motherboard; this reduced the crashes considerably (and
>>> the speeds achieved also improved greatly, from 10-15MB/s to
>>> 40-50MB/s).
>>> After the change I still had one or two crashes, but since then it
>>> has been running for 50-60 days or so without any problems.
>>> Then, 11 days ago, I upgraded to 6.1... and now I get these
>>> "crashes" again (the mirror crashes, that is; the system still
>>> runs fine):
>>>
>>> May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
>>> May 21 02:04:58 elfi kernel: subdisk6: detached
>>> May 21 02:04:58 elfi kernel: ad6: detached
>>> May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 disconnected.
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=11006308352, length=2048)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=164847927296, length=131072)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=256680296448, length=32768)]error = 6
>>>
>>>
>>> Some info about the controller and disks:
>>>
>>> May  9 22:46:52 elfi kernel: ata1: <ATA channel 1> on atapci0
>>> May  9 22:46:52 elfi kernel: atapci1: <nVidia nForce2 Pro SATA150 controller> port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0
>>>
>>> May  9 22:46:52 elfi kernel: ad4: 286188MB <Maxtor 7L300S0 BANC1G10> at ata2-master SATA150
>>> May  9 22:46:52 elfi kernel: ad6: 286188MB <Maxtor 7L300S0 BANC1G10> at ata3-master SATA150
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created (id=4118114647).
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 detected.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 activated.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
>>> May  9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/mirror/gm0s1a
>>>
>>> Anyone got any new clues? AFAIK the disks should be working fine
>>> (they are 6 months old and this same problem has occurred multiple
>>> times...)
>>>
>>> Hope to solve this ;)
>>>
>>> Thanks
>>> Johan
>>>
>>
>> Here we go again
>>
>> Jul  7 16:20:09 elfi kernel: ad4: FAILURE - device detached
>> Jul  7 16:20:09 elfi kernel: subdisk4: detached
>> Jul  7 16:20:09 elfi kernel: ad4: detached
>> Jul  7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
>> Jul  7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=88896847872, length=32768)]error = 6
>>
>> However, no read timeouts etc. as before, just this. 18 days
>> uptime this time (I've rebooted for other reasons since the last
>> mail). It always seems to be ad4 that is disconnecting... I'm going
>> to run some disk tests on it, but I doubt they will show anything,
>> since I've had similar problems from day one with this gmirror
>> setup (new disks), and tests at that time found no problems.
>>
>> Johan
>
> Follow-up: I ran Maxtor's own test program over the disk, full-length
> test. Not a single problem.
> After reboot the raid is rebuilding fine:
>
> GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.
>
> As usual, it seems I cannot get the controller/driver to redetect
> the disk using atacontrol etc.
>
> Johan

And now again... the RAID went degraded only 2 days after the reboot!

Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: subdisk4: detached
Jul 12 22:22:50 elfi kernel: ad4: detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=120776474624, length=32768)]error = 6
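
To keep track of which disk drops out and how often, the "FAILURE - device detached" lines can be tallied per device. The following is only a rough sketch, not part of the original report: the function name count_detaches and the regular expression are illustrative assumptions, and the sample lines are the ones pasted in this thread.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached"
# The pattern is an assumption based on the log lines quoted in this thread.
DETACH_RE = re.compile(r"kernel: (ad\d+): FAILURE - device detached")


def count_detaches(lines):
    """Return a Counter mapping disk name -> number of detach events."""
    counts = Counter()
    for line in lines:
        match = DETACH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: lines copied from the messages in this thread.
sample = [
    "May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached",
    "Jul  7 16:20:09 elfi kernel: ad4: FAILURE - device detached",
    "Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached",
    "Jul 12 22:22:50 elfi kernel: ad4: detached",  # not counted: no FAILURE
]

for disk, n in sorted(count_detaches(sample).items()):
    print(disk, n)
```

In practice the same function could be fed the lines of the system log file instead of the hard-coded sample.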

$ uname -a
FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue May  9 20:40:23 CEST 2006 johan at elfi.stromnet.org:/usr/obj/usr/src/sys/GENERIC  i386

Still no luck with atacontrol...

Is there any way to debug this further? I've tested the disk, and the
SATA cables are new. I had similar problems with the other
motherboard, too.
I don't think this is a hardware problem, but rather a software
problem that needs to be solved; this is not something one can call
stable ;)

So, any pointers on how to enable more debugging, or anything else
that could give some clues?

Johan




More information about the freebsd-stable mailing list