error destroying zfs filesystem

Alexandr Krivulya shuriku at shurik.kiev.ua
Sat Feb 16 09:55:13 UTC 2013


15.02.2013 14:57, Peter Maloney wrote:
> On 2013-02-15 13:44, Alexandr Kovalenko wrote:
>> On Fri, Feb 15, 2013 at 11:30 AM, Alexandr Krivulya
>> <shuriku at shurik.kiev.ua> wrote:
>>> Hello everyone!
>>>
>>> After upgrading my zfs-only system from 8.2 to 9.1 I have many errors
>>> related to zfs in my /var/log/messages:
>>>
>>> Feb 15 13:12:44 gw kernel: metaslab_free_dva(): bad DVA
>>> 0:264842321920Solaris: WARNING: metaslab_free_dva(): bad DVA 0:338480095232
>>> Feb 15 13:12:44 gw kernel: Solaris: WARNING: metaslab_free_dva(): bad
>>> DVA 0:277633901056Solaris: WARNING:
>>> Feb 15 13:12:45 gw kernel: metaslab_free_dva(): bad DVA
>>> 0:277263710208Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:277633606144Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278349642240Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278429099008Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278349926400Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278245378560Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:256838777344Solaris: WARNING: metaslab_free_dva(): bad DVA 0:327364684800
>>> Feb 15 13:12:45 gw kernel: Solaris: WARNING: metaslab_free_dva(): bad
>>> DVA 0:312373604864
>>>
>>> root@gw:/ # zpool status -v
>>>   pool: zmirror
>>>  state: ONLINE
>>> status: One or more devices has experienced an error resulting in data
>>>         corruption.  Applications may be affected.
>>> action: Restore the file in question if possible.  Otherwise restore the
>>>         entire pool from backup.
>>>    see: http://illumos.org/msg/ZFS-8000-8A
>>>   scan: scrub repaired 0 in 1h39m with 1 errors on Thu Feb 14 17:48:53 2013
>>> config:
>>>
>>>         NAME            STATE     READ WRITE CKSUM
>>>         zmirror         ONLINE       0     0     2
>>>           mirror-0      ONLINE       0     0     8
>>>             gpt/disk01  ONLINE       0     0     8
>>>             gpt/disk02  ONLINE       0     0     8
>>>
>>> errors: Permanent errors have been detected in the following files:
>>>
>>>         zmirror/usr:<0x0>
>>>         <0xc8>:<0x0>
>> [dd]
>>> How can I solve this issue?
>> Run smartctl -t long /dev/<your_physical_drive_here> and then check
>> whether there are any pending sectors/errors in the output of smartctl -a
>> /dev/<your_physical_drive_here> (for both drives used).
All tests seem to be fine:

root@gw:/usr/home/support # smartctl -l selftest /dev/ada0
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1446         -

root@gw:/usr/home/support # smartctl -l selftest /dev/ada1
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1630

smartctl also didn't show any problems; see the attached file.
> You could also try going into /usr and using "rm" or "truncate" on some
> files until the "Permanent errors have been detected" list is empty. This
> assumes you have already run a full scrub, which you must do for the
> removed files to clear from the list.

Now I cannot mount this filesystem to remove files:

root@gw:/usr/home/support # zfs mount zmirror/usr
cannot mount 'zmirror/usr': mountpoint or dataset is busy

The only way I see is to back up the entire pool, destroy and recreate it,
and restore from the backup.
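
A rough sketch of that backup, destroy, recreate, restore cycle follows. It
is an illustration under assumptions, not a tested procedure: the second pool
name "backup", the snapshot name "migrate", and the run/migrate_plan helpers
are hypothetical, and the vdev names are taken from the zpool status output
above. Because zmirror holds the system itself, the destroy and restore steps
would presumably have to be run from other boot media. zfs send may also fail
on the damaged zmirror/usr dataset, in which case a file-level copy of
whatever is still readable is the fallback. By default the script only prints
the commands it would run.

```shell
#!/bin/sh
# Sketch only. Prints each command unless DRY_RUN=0 is set in the environment.
POOL="${POOL:-zmirror}"      # pool name from the zpool status output
BACKUP="${BACKUP:-backup}"   # hypothetical: an empty pool dedicated to the backup
SNAP="${SNAP:-migrate}"      # hypothetical snapshot name
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_plan() {
    # 1. Take a recursive snapshot of every dataset in the pool.
    run zfs snapshot -r "$POOL@$SNAP" || return 1
    # 2. Replicate the whole pool into the backup pool.
    #    The pipeline's status is that of zfs receive, so also read the
    #    output of zfs send and verify the backup before going further.
    run sh -c "zfs send -R $POOL@$SNAP | zfs receive -F -d $BACKUP" || return 1
    # 3. Destroy the damaged pool and recreate the mirror on the same labels.
    run zpool destroy "$POOL" || return 1
    run zpool create "$POOL" mirror gpt/disk01 gpt/disk02 || return 1
    # 4. Send everything back from the backup pool.
    run sh -c "zfs send -R $BACKUP@$SNAP | zfs receive -F -d $POOL" || return 1
}

migrate_plan
```

Running it unchanged lists the five commands; setting DRY_RUN=0 executes
them, including the destructive zpool destroy, so the dry-run default is
deliberate.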
-------------- next part --------------
root@gw:/usr/home/support # smartctl -iAH /dev/ada0
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4 Serial ATA
Device Model:     WDC WD5003ABYX-01WERA1
Serial Number:    WD-WMAYP3251340
LU WWN Device Id: 5 0014ee 0032ad53b
Firmware Version: 01.01S02
User Capacity:    500 107 862 016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 16 11:49:32 2013 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   139   139   021    Pre-fail  Always       -       4033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1465
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       15
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   116   095   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0





root@gw:/usr/home/support # smartctl -iAH /dev/ada1
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4 Serial ATA
Device Model:     WDC WD5003ABYX-01WERA1
Serial Number:    WD-WMAYP3265645
LU WWN Device Id: 5 0014ee 0032c2b14
Firmware Version: 01.01S02
User Capacity:    500 107 862 016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 16 11:48:40 2013 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   142   142   021    Pre-fail  Always       -       3875
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       15
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1649
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       13
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   116   095   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
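
The attribute tables above can also be checked mechanically rather than by
eye. The following is a small sketch, assuming the ten-column "smartctl -A"
layout shown in this attachment; the function name check_smart_attrs and the
choice of watched IDs (5, 197, 198, 199) are illustrative, not something
smartctl itself provides.

```shell
#!/bin/sh
# check_smart_attrs: read `smartctl -A` output on stdin and report every
# watched attribute whose RAW_VALUE (10th column) is non-zero.
# Returns 0 when all watched raw values are zero, 1 otherwise.
check_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID. Watch reallocated sectors
        # (5), pending sectors (197), offline uncorrectable (198) and
        # UDMA CRC errors (199).
        $1 ~ /^[0-9]+$/ && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) {
            if ($10 + 0 != 0) {
                printf "WARN %s raw=%s\n", $2, $10
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

A possible invocation would be
"smartctl -A /dev/ada0 | check_smart_attrs && echo ok". A clean result only
rules out the failure modes those counters track; it says nothing about the
ZFS-level corruption reported by zpool status.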
 


More information about the freebsd-fs mailing list