[Bug 223085] ZFS Resilver not completing - stuck at 99%
bugzilla-noreply at freebsd.org
Wed Oct 18 10:48:08 UTC 2017
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223085
Bug ID: 223085
Summary: ZFS Resilver not completing - stuck at 99%
Product: Base System
Version: 10.2-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: paul at vsl-net.com
I have a number of FreeBSD systems with large (30TB) ZFS pools.
I have had several disks fail over time and have seen problems with resilvers
either not completing, or reaching 99% within a week and then taking a further
month to complete.
I have been seeking advice in the forums.
https://forums.freebsd.org/threads/61643/#post-355088
A system that had a disk replaced some time ago is in this state:
pool: s11d34
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Sep 14 15:08:15 2017
49.4T scanned out of 49.8T at 17.7M/s, 6h13m to go
4.93T resilvered, 99.24% done
config:
NAME STATE READ WRITE CKSUM
s11d34 DEGRADED 0 0 0
raidz2-0 ONLINE 0 0 0
multipath/J11F18-1EJB8KUJ ONLINE 0 0 0
multipath/J11R01-1EJ2XT4F ONLINE 0 0 0
multipath/J11R02-1EHZE2GF ONLINE 0 0 0
multipath/J11R03-1EJ2XTMF ONLINE 0 0 0
multipath/J11R04-1EJ3NK4J ONLINE 0 0 0
raidz2-1 DEGRADED 0 0 0
multipath/J11R05-1EJ2Z8AF ONLINE 0 0 0
multipath/J11R06-1EJ2Z8NF ONLINE 0 0 0
replacing-2 OFFLINE 0 0 0
7444569586532474759 OFFLINE 0 0 0 was /dev/multipath/J11R07-1EJ03GXJ
multipath/J11F23-1EJ3AJBJ ONLINE 0 0 0 (resilvering)
multipath/J11R08-1EJ3A0HJ ONLINE 0 0 0
multipath/J11R09-1EJ32UPJ ONLINE 0 0 0
It got to 99.24% within a week but has been stuck there ever since.
I have stopped ALL access to the pool and run zpool iostat, and there is still
activity (although low, e.g. 1.2M read, 1.78M write, etc.), so it does appear to
be doing something.
The disks (6TB or 8TB HGST SAS) are attached via an LSI 9207-8e HBA, which is
connected to an LSI 6160 SAS switch that is in turn connected to a Supermicro JBOD.
The HBAs have two connectors, each connected to a different SAS switch.
The system sees each disk twice, as expected. I use gmultipath to label the
disks and set them to Active/Passive mode, and I then use the multipath name
during zpool create, e.g.
root at freebsd04:~ # gmultipath status
Name Status Components
multipath/J11R00-1EJ2XR5F OPTIMAL da0 (ACTIVE)
da11 (PASSIVE)
multipath/J11R01-1EJ2XT4F OPTIMAL da1 (ACTIVE)
da12 (PASSIVE)
multipath/J11R02-1EHZE2GF OPTIMAL da2 (ACTIVE)
da13 (PASSIVE)
zpool create -f store43 raidz2 multipath/J11R00-1EJ2XR5F multipath/J11R01-1EJ2XT4F etc.
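For completeness, the multipath labels were created along these lines (the
device names below are illustrative; Active/Passive is the gmultipath default
when no mode flag is given):

root at freebsd04:~ # gmultipath label J11R00-1EJ2XR5F /dev/da0
root at freebsd04:~ # gmultipath label J11R01-1EJ2XT4F /dev/da1

The second path to each disk (da11, da12, ...) is picked up automatically once
the on-disk metadata is tasted.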
Any advice on whether this is a bug or something wrong with my setup?
Thanks
Paul