problems with AHCI on FreeBSD 8.2
oscarmpp at googlemail.com
Thu Feb 16 11:59:07 UTC 2012
Yesterday I backed up the important data on the pool and decided
to just break things on purpose ;)
I used dd to write over the sector that smartctl had marked as faulty and
ran a smartctl short test. I repeated the process several times
until smartctl reported no errors at all on ada3.
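Finding the right dd offset is mostly arithmetic: smartctl reports the failing LBA in logical sectors (for example in the `smartctl -l selftest` log), while dd's `seek` counts in units of `bs`. The Python sketch below only builds the command string and never runs it. The helper name `dd_overwrite_cmd` and the example LBA are hypothetical, zero-filling is an assumption (the post doesn't say what was written), and on drives with 4096-byte physical sectors behind 512-byte logical ones you would probably want to rewrite the whole physical sector. Overwriting a sector destroys its contents, which ZFS then has to rebuild from raidz parity, as the scrub did here.

```python
def dd_overwrite_cmd(device: str, lba: int, logical_sector: int = 512,
                     physical_sector: int = 512) -> str:
    """Build (but do not run) a dd command that zero-fills the physical
    sector containing `lba`.

    Hypothetical helper: `lba` is the failing LBA as reported by smartctl,
    counted in logical sectors.
    """
    if physical_sector % logical_sector != 0:
        raise ValueError("physical sector size must be a multiple of the logical size")
    # Logical sectors per physical sector: 1 on 512n drives, 8 on 512e drives.
    ratio = physical_sector // logical_sector
    # dd's seek is counted in units of bs, so address whole physical sectors.
    seek = lba // ratio
    return f"dd if=/dev/zero of={device} bs={physical_sector} count=1 seek={seek}"


if __name__ == "__main__":
    # Hypothetical LBA; substitute the value smartctl reports for your disk.
    print(dd_overwrite_cmd("/dev/ada3", 123456789))
    print(dd_overwrite_cmd("/dev/ada3", 123456789, physical_sector=4096))
```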
After that I left the pool running a scrub, and it seems to have restored
the integrity of the pool:
[root@zaibach ~]# zpool status
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  scan: scrub repaired 398K in 10h39m with 0 errors on Thu Feb 16 09:15:59 2012

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada2p1  ONLINE       0     0     0
            ada1p1  ONLINE       0     0     0
            ada3p1  ONLINE       0     0    11
            ada0p1  ONLINE       0     0     0
Funnily enough, though, I got an AHCI timeout on another drive, /dev/ada2:
Feb 16 04:08:23 zaibach kernel: ahcich2: Timeout on slot 15 port 0
Feb 16 04:08:23 zaibach kernel: ahcich2: is 00000000 cs 00040000 ss 00078000 rs 00078000 tfd c0 serr 00000000 cmd 0004d217
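One way to read the register dump: as far as I understand the FreeBSD ahci(4) driver, `cs` and `ss` are the port's Command Issue and SATA Active registers and `rs` is the driver's own set of running slots, all 32-bit masks with one bit per command slot, while the low byte of `tfd` is the ATA status byte. That field mapping is an assumption worth checking against the driver source. A minimal Python sketch that decodes the values:

```python
from __future__ import annotations

# ATA status register bits (low byte of the AHCI PxTFD register).
ATA_STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}


def decode(dump: str) -> dict[str, int]:
    """Parse 'name hexvalue' pairs, e.g. 'is 00000000 cs 00040000 ...'."""
    tokens = dump.split()
    return {tokens[i]: int(tokens[i + 1], 16) for i in range(0, len(tokens) - 1, 2)}


def slots(mask: int) -> list[int]:
    """Command-slot numbers whose bits are set in a 32-bit slot mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def status_flags(tfd: int) -> list[str]:
    """Names of the ATA status bits set in the low byte of tfd."""
    status = tfd & 0xFF
    return [name for bit, name in sorted(ATA_STATUS_BITS.items(), reverse=True)
            if status & (1 << bit)]


if __name__ == "__main__":
    # Register part of the ahcich2 timeout message quoted above.
    regs = decode("is 00000000 cs 00040000 ss 00078000 rs 00078000 "
                  "tfd c0 serr 00000000 cmd 0004d217")
    print("issued (cs): ", slots(regs["cs"]))          # -> [18]
    print("active (ss): ", slots(regs["ss"]))          # -> [15, 16, 17, 18]
    print("running (rs):", slots(regs["rs"]))          # -> [15, 16, 17, 18]
    print("status (tfd):", status_flags(regs["tfd"]))  # -> ['BSY', 'DRDY']
```

Read this way, slots 15 to 18 (including slot 15, the one that timed out) were still outstanding, slot 18 was still in the Command Issue register, and the status byte has BSY set, so the drive had not completed them. That points at the drive or its link not answering, but by itself it doesn't say which.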
At least a short smartctl test on /dev/ada2 doesn't report any errors this time.
On Thu, Feb 16, 2012 at 5:48 AM, John <john at theusgroup.com> wrote:
> Jeremy Chadwick wrote:
>> CRC errors ...
>>I have no real advice for tracking this kind of problem down. The most
>>common response is "replace cables", which isn't necessarily the root
>>cause. I have no advice or tips on how to track down interference
>>issues, or how to truly examine a disk PCB or controller PCB for the
>>latter item. "Flaky traces" on a PCB could cause this sort of thing.
>>Folks in the EE field would know more about these issues; I am not an EE.
>>
>>Since the attribute increased on both drives simultaneously (I have to
>>assume simultaneously?), it's more likely that the problem is not with
>>SATA cables or the drives but the controller on the motherboard. I'd
>>recommend replacing the motherboard. I make no guarantees this will fix
>>anything however, but it is the "common point" for both of your drives.
> This EE agrees with your advice. I would add that if replacing the motherboard
> fails to fix the problem, then replace the power supply. Even with extremely
> high-end test equipment, you likely would never be able to see the failure
> occur for at least two reasons: the most likely failure mode is inside a single IC,
> and adding probes would alter the environment enough to change the failure
> John Theus
> TheUs Group
> freebsd-stable at freebsd.org mailing list