Another case of the vanishing disk

Sun Mar 16 08:52:28 UTC 2014

I moved the power cable so it plugs into a surge protector rather than
the UPS. I still have the same problem. Every second I see new seek
error rate messages; some drives report more at a time than others,
but all 4 are doing it.
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260695
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260696
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260697
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
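Comparing repeated smartctl runs by eye is error-prone. The following is a minimal Python sketch, not part of smartmontools, that pulls the RAW_VALUE column out of each attribute row and reports only the counters that moved between two snapshots. It assumes each attribute row is on a single line, the way smartctl prints it before any mail client wraps it; the names parse_raw_values, changed, run1 and run2 are hypothetical.

```python
from __future__ import annotations

import re

# One attribute row of `smartctl -A` (or `-a`) output, printed on one line:
# ID  NAME  FLAG  VALUE WORST THRESH  TYPE  UPDATED  WHEN_FAILED  RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+"  # ID, name, flag
    r"(?:\s+\d+){3}"                                      # VALUE WORST THRESH
    r"\s+\S+\s+\S+\s+\S+"                                 # TYPE UPDATED WHEN_FAILED
    r"\s+(?P<raw>\d+)"                                    # leading integer of RAW_VALUE
)


def parse_raw_values(text: str) -> dict[str, int]:
    """Map attribute name -> raw value; non-attribute lines are skipped."""
    values = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group("name")] = int(m.group("raw"))
    return values


def changed(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return after - before for each attribute whose raw value moved."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }


# Two rows taken from the first and second runs above.
run1 = """  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260695
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160"""
run2 = """  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260696
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160"""

print(changed(parse_raw_values(run1), parse_raw_values(run2)))  # {'Seek_Error_Rate': 1}
```

In practice the two snapshots would be saved copies of `smartctl -A /dev/ada2` taken a few seconds apart; header and error-log lines simply fail to match and are ignored.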

I will be stunned if it's yet another bad power supply, but I will
have to find another one somewhere and test this again. The drives are
all still under warranty.
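Before swapping yet another power supply, it may be worth checking what the climbing Seek_Error_Rate raw value actually counts. If these are Seagate drives (the thread does not say; `smartctl -i /dev/ada2` would show the model family), a commonly reported but vendor-undocumented interpretation is that the raw value packs two counters: the upper 16 bits of a 48-bit field hold seek errors and the lower 32 bits hold total seeks. A small Python sketch of that decoding, with a hypothetical function name and the layout treated purely as an assumption:

```python
from __future__ import annotations


def decode_seagate_seek_error_rate(raw: int) -> tuple[int, int]:
    """Split a Seek_Error_Rate raw value into (seek_errors, total_seeks).

    ASSUMPTION: the commonly reported, vendor-undocumented Seagate layout,
    where bits 32-47 count seek errors and bits 0-31 count seeks.
    """
    seek_errors = (raw >> 32) & 0xFFFF
    total_seeks = raw & 0xFFFFFFFF
    return seek_errors, total_seeks


for raw in (67260695, 67260696, 67260697):  # raw values from the three runs
    errors, seeks = decode_seagate_seek_error_rate(raw)
    print(f"raw={raw}: seek errors={errors}, seeks={seeks}")
```

All three raw values are below 2**32 (4294967296), so under this interpretation the error field is zero and each +1 between runs would simply be one more completed seek during the scrub, not a new error.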

On Sun, Mar 16, 2014 at 2:42 AM, cruxpot <cruxpot at gmail.com> wrote:
> It's an active PFC PSU plugged into a UPS that isn't PFC-compatible. Maybe that is
> the problem. I will try isolating some things tomorrow after the scrub
> has completed to see if I can get the errors to stop incrementing.
>
> On Sun, Mar 16, 2014 at 2:18 AM, Erich Dollansky
> <erichsfreebsdlist at alogt.com> wrote:
>> Hi,
>>
>> On Sun, 16 Mar 2014 02:00:51 -0500
>> cruxpot <cruxpot at gmail.com> wrote:
>>
>>> Seek_Error_Rate, Hardware_ECC_Recovered, Raw_Read_Error_Rate are all
>>> increasing steadily for all four disks. Does this have something to do
>>> with the recent resilver of the disk or the ongoing scrub (16.5%
>>> completed)?
>>>
>> The seek error rate could be linked to a failing power supply. The rest
>> should be purely internal to the drive, although a failing power supply
>> could be the cause there as well.
>>
>> Can you put the drives into another machine?
>>
>> You must try to isolate the problem. It is a hardware problem at some
>> level; you have to find out where.
>>
>> Or just run a single disk on plain UFS, connected to a different plug,
>> with all the other drives disconnected.
>>
>> Erich