zfs process hang on pool access
Andriy Gapon
avg at FreeBSD.org
Wed Jul 27 13:45:42 UTC 2011
on 27/07/2011 15:06 Steven Hartland said the following:
> I've checked the raw disk and all seems fine there, so it does look like it's
> some sort of zfs livelock.
>
> I'm trying to keep the machine available in case someone needs more information,
> but it's a production machine so I'm going to have to reboot it in the next
> few hours.
>
> Disk tests:-
>
> dd if=/dev/da1 of=/dev/null bs=10m
> 5724+1 records in
> 5724+1 records out
> 60022480896 bytes transferred in 430.479894 secs (139431555 bytes/sec)
>
>
> smartctl -a /dev/da1

Is this the only disk associated with the troubled pool?

> smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-RELEASE amd64] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family: SandForce Driven SSDs
> Device Model: Corsair CSSD-F60GB2
> Serial Number: 10446509320009990024
> Firmware Version: 1.1
> User Capacity: 60,022,480,896 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 8
> ATA Standard is: ATA-8-ACS revision 6
> Local Time is: Wed Jul 27 11:27:30 2011 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x00) Offline data collection activity
> was never started.
> Auto Offline Data Collection: Disabled.
> Self-test execution status: ( 0) The previous self-test routine completed
> without error or no self-test has ever
> been run.
> Total time to complete Offline data collection: ( 0) seconds.
> Offline data collection
> capabilities: (0x7f) SMART execute Offline immediate.
> Auto Offline data collection on/off support.
> Abort Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine recommended polling time: ( 1) minutes.
> Extended self-test routine
> recommended polling time: ( 48) minutes.
> Conveyance self-test routine
> recommended polling time: ( 2) minutes.
> SCT capabilities: (0x003d) SCT Status supported.
> SCT Error Recovery Control supported.
> SCT Feature Control supported.
> SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   119   100   050    Pre-fail  Always       -       0/238293224
>   5 Retired_Block_Count     0x0033   097   097   003    Pre-fail  Always       -       256
>   9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       5513h+00m+39.450s
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2
> 171 Program_Fail_Count      0x0000   000   000   000    Old_age   Offline      -       0
> 172 Erase_Fail_Count        0x0000   000   000   000    Old_age   Offline      -       0
> 174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       0
> 177 Wear_Range_Delta        0x0000   000   000   ---    Old_age   Offline      -       1
> 181 Program_Fail_Count      0x0000   000   000   000    Old_age   Offline      -       0
> 182 Erase_Fail_Count        0x0000   000   000   000    Old_age   Offline      -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 194 Temperature_Celsius     0x0022   022   026   000    Old_age   Always       -       22 (Min/Max 0/26)
> 195 ECC_Uncorr_Error_Count  0x001c   119   100   000    Old_age   Offline      -       0/238293224
> 196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
> 231 SSD_Life_Left           0x0013   057   057   010    Pre-fail  Always       -       0
> 233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       152704
> 234 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       90688
> 241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       90688
> 242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       3584
>
> Error SMART Error Log Read failed: Input/output error
> Smartctl: SMART Error Log Read Failed
> Error SMART Error Self-Test Log Read failed: Input/output error
> Smartctl: SMART Self Test Log Read Failed
> SMART Selective self-test log data structure revision number 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
--
Andriy Gapon