ZFS weird device tasting loop since MFC
Kip Macy
kmacy at freebsd.org
Fri Jun 5 11:28:40 UTC 2009
Must be a weird geom interaction. I don't see this with raw disk. I'll
look at it eventually but UMA and performance are further up in the
queue.
-Kip
On Fri, Jun 5, 2009 at 1:44 AM, Ulrich Spörlein <uqs at spoerlein.net> wrote:
> On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
>> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
>> > Hi all,
>> >
>> > so I went ahead and updated my ~7.2 file server to the new ZFS goodness,
>> > and before running any further tests, I already discovered something
>> > weird and annoying.
>> >
>> > I'm using a mirror on GELI where one disk is usually *not* attached, as
>> > a poor man's backup. (I had to go that route because send/recv of
>> > snapshots frequently deadlocked the system, whereas scrubbing the mirror
>> > did not.)
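>> >
>> > For reference, the rotation is roughly the following sequence (a sketch
>> > only; the geli passphrase/keyfile handling is whatever was configured at
>> > 'geli init' time, and the device names are the ones used in this pool):
>> >
>> >   # plug in the backup disk, decrypt it, and let the mirror catch up
>> >   geli attach /dev/da0          # creates /dev/da0.eli
>> >   zpool online tank da0.eli     # ZFS resilvers the missing vdev
>> >
>> >   # once 'zpool status' shows the resilver is done, detach it again
>> >   zpool offline tank da0.eli
>> >   geli detach da0.eli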
>> >
>> > root@coyote:~# zpool status
>> >   pool: tank
>> >  state: DEGRADED
>> > status: The pool is formatted using an older on-disk format.  The pool can
>> >         still be used, but some features are unavailable.
>> > action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
>> >         pool will no longer be accessible on older software versions.
>> >  scrub: none requested
>> > config:
>> >
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         tank                      DEGRADED     0     0     0
>> >           mirror                  DEGRADED     0     0     0
>> >             ad4.eli               ONLINE       0     0     0
>> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
>> >
>> > errors: No known data errors
>> >
>> > When the pool is imported, there is constant "tasting" of all devices in
>> > the system; it also keeps the floppy drive spinning, which is really
>> > annoying. It did not do this with the old ZFS; are there any remedies?
>> >
>> > gstat(8) is displaying the following every other second, together with a
>> > spinning fd0 drive.
>> >
>> > dT: 1.010s  w: 1.000s  filter: ^...$
>> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>> >     0      0      0      0    0.0      0      0    0.0    0.0| fd0
>> >     0      8      8   1014    0.1      0      0    0.0    0.1| md0
>> >     0     32     32   4055    9.2      0      0    0.0   29.2| ad0
>> >     0     77     10   1267    7.1     63   1125    2.3   31.8| ad4
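>> >
>> > (For the record, that output presumably comes from gstat with a regex
>> > filter on the provider names, i.e. something like
>> >
>> >   gstat -f '^...$'
>> >
>> > which limits the display to the three-character device names.)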
>> >
>> > There is no activity going on (md0, in particular, only backs /tmp), yet
>> > something constantly issues reads to every device. I will now insert the
>> > second drive and see if ZFS shuts up then ...
>>
>> It did, but it also did not start resilvering the second disk:
>>
>> root@coyote:~# zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error.  An
>>         attempt was made to correct the error.  Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://www.sun.com/msg/ZFS-8000-9P
>>  scrub: none requested
>> config:
>>
>>         NAME         STATE     READ WRITE CKSUM
>>         tank         ONLINE       0     0     0
>>           mirror     ONLINE       0     0     0
>>             ad4.eli  ONLINE       0     0     0
>>             da0.eli  ONLINE       0     0    16
>>
>> errors: No known data errors
>>
>> Will now run the scrub and report back in 6-9h.
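>>
>> (For reference, that is simply:
>>
>>   zpool scrub tank
>>
>> and progress can be watched with 'zpool status tank', which shows the
>> percentage done and an estimated time remaining while the scrub runs.)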
>
> Another data point: while the floppy tasting has stopped now that the mirror
> sees all its devices again, there is another problem here:
>
> root@coyote:/# zpool online tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  ONLINE       0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool offline tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0   339     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
>
>
> So I ran 'zpool status' three times after the offline, and the second run
> reports write errors on the OFFLINE device (WTF?). When running zpool status
> in a loop, these errors constantly show up and then vanish again.
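>
> (The loop was just something along these lines; the grep is only there to
> pick out the da0.eli line:
>
>   while true; do
>       zpool status tank | grep da0.eli   # watch the OFFLINE vdev's counters
>       sleep 1
>   done
>
> and every few iterations the WRITE column for da0.eli briefly shows a
> non-zero count before dropping back to 0.)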
>
> I also get constant write requests to the remaining device, even though no
> applications are accessing it. What the hell is ZFS trying to do here?
>
> root@coyote:/# zpool iostat 1
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank         883G  48.4G      8    246  56.8K  1.53M
> tank         883G  48.4G      8    249  55.9K  1.55M
> tank         883G  48.4G      8    250  55.0K  1.54M
> tank         883G  48.4G      8    252  54.1K  1.56M
> tank         883G  48.4G      8    254  53.3K  1.57M
> tank         883G  48.4G      8    253  52.5K  1.56M
> tank         883G  48.4G      7    255  51.7K  1.57M
> ^C
>
> Again, WTF? Can someone please enlighten me here?
>
> Cheers,
> Ulrich Spörlein
> --
> http://www.dubistterrorist.de/
>
--
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.
Edmund Burke