weird bug with ZFS and SLOG
Peter Maloney
peter.maloney at brockmann-consult.de
Tue Dec 6 12:19:42 UTC 2011
On 12/05/2011 11:07 PM, Adam Stylinski wrote:
> The worst-case scenario happened to me: my dedicated SLOG dropped off the controller, which prevented me from importing my pool. I quickly upgraded to FreeBSD 9.0-RC2 after testing this scenario in a VM. It worked in the VM, but for whatever reason it is not working on my hardware. I rolled back the pool with zpool import -F share; it seems OK, the files are there, the scrub finished, and there is very little corruption. I upgraded the pool to v28 and the file systems to v5. I then ran:
> zpool remove share 15752248745115926170
>
> It returns no errors and acts as if the operation worked; it even appends it to my zpool history. However, when I run zpool status, this is what I get:
>
> [adam@nasbox ~]$ zpool status
>   pool: share
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 8h57m with 0 errors on Mon Dec 5 12:48:28 2011
> config:
>
>         NAME                    STATE     READ WRITE CKSUM
>         share                   DEGRADED     0     0     0
>           raidz1-0              ONLINE       0     0     0
>             ada4                ONLINE       0     0     0
>             ada1                ONLINE       0     0     0
>             *ada2*              ONLINE       0     0     0
>             ada3                ONLINE       0     0     0
>           raidz1-1              ONLINE       0     0     0
>             da3                 ONLINE       0     0     0
>             da0                 ONLINE       0     0     0
>             da2                 ONLINE       0     0     0
>             da1                 ONLINE       0     0     0
>           raidz1-2              ONLINE       0     0     0
>             aacd0               ONLINE       0     0     0
>             aacd1               ONLINE       0     0     0
>             aacd2               ONLINE       0     0     0
>             aacd3               ONLINE       0     0     0
>           raidz1-4              ONLINE       0     0     0
>             aacd4               ONLINE       0     0     0
>             aacd5               ONLINE       0     0     0
>             aacd6               ONLINE       0     0     0
>             aacd7               ONLINE       0     0     0
>         logs
>           15752248745115926170  UNAVAIL      0     0     0  was /dev/*ada2*
This looks like another case of not using labels. Notice that the pool
lists ada2 as a data disk, but the log device "was /dev/ada2"; the two
must have switched names. Maybe they also resilvered, and your log was
overwritten.
I did the same thing when I started on FreeBSD and ZFS... nobody warned
me either. When you reboot, the disks sometimes move around and change
numbers. In my experience they seem to be reliable with onboard SATA
ports, but with more I/O cards, removable media, expanders, etc., they
never seem to stay put for me. Only the first disk on the back expander
and the first disk on the front expander ever keep the same name, and
if I add a new disk in the back, the front ones go up by 1. When a data
disk from my pool switched places with another data disk from the same
pool, ZFS handled it automatically. But when a hot spare or some other
device switched places, it looked the same as what you see in your
zpool status: "some big number .... UNAVAIL 0 0 0 was /dev/da#"
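The following is a simplified, hypothetical Python model that makes the renumbering concrete. It is not FreeBSD code. It assumes disks are named da0, da1, ... strictly in the order they are probed, with the back expander first. Under that assumption, adding one disk in the back shifts the name of every front disk by 1. The serial numbers and function names are made up for illustration. A label stored on the disk itself (for example a glabel or GPT label) would not change in this scenario.

```python
def probe_order_names(expanders, prefix="da"):
    """Hypothetical model: number disks da0, da1, ... in probe order,
    walking each expander's disks in turn."""
    names = {}
    for disk in (d for disks in expanders for d in disks):
        names[disk] = f"{prefix}{len(names)}"
    return names


def renamed(before, after):
    """Disks present in both layouts whose device name changed."""
    return {d: (before[d], after[d])
            for d in before if d in after and before[d] != after[d]}


# Made-up serial numbers standing in for physical disks.
back = ["serial-B1", "serial-B2"]
front = ["serial-F1", "serial-F2"]

before = probe_order_names([back, front])
after = probe_order_names([back + ["serial-B3"], front])  # one new disk in the back

print(renamed(before, after))
# {'serial-F1': ('da2', 'da3'), 'serial-F2': ('da3', 'da4')}
```

In this model, the disks probed before the new one keep their names. Everything probed after it moves up by one.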
Here is a howto I wrote that explains how to convert to labels:
http://forums.freebsd.org/showthread.php?p=157004
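One rough way to spot the collision described earlier is to compare the old paths of UNAVAIL devices with the names still in use. In this case, ada2 is listed in raidz1-0 while the missing log "was /dev/ada2". The helper below is a hypothetical sketch, not a ZFS or FreeBSD tool. It assumes the plain-text layout of `zpool status` as pasted in this thread (name, state, three error counters, optional `was <path>`), without the `> ` quote markers. It also uses a hard-coded set of vdev states.

```python
import re

# A missing device line, e.g.
#   "15752248745115926170  UNAVAIL  0  0  0  was /dev/ada2"
# group 1 = GUID, group 2 = the path the device used to have.
MISSING_RE = re.compile(r"^\s*(\d+)\s+UNAVAIL\b.*\bwas\s+(\S+)")

# Vdev states that zpool status can print in the STATE column.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def find_reused_names(status_text):
    """Return (guid, old_path) for each UNAVAIL device whose old device
    name now belongs to a device that is still listed in the pool."""
    in_use = set()
    missing = []
    for line in status_text.splitlines():
        m = MISSING_RE.match(line)
        if m:
            missing.append((m.group(1), m.group(2)))
            continue
        fields = line.split()
        # Regular device lines: "<name> <STATE> <read> <write> <cksum>".
        if len(fields) >= 2 and fields[1] in STATES:
            in_use.add(fields[0])
    return [(guid, path) for guid, path in missing
            if path.rsplit("/", 1)[-1] in in_use]


SAMPLE = """\
        NAME                    STATE     READ WRITE CKSUM
        share                   DEGRADED     0     0     0
          raidz1-0              ONLINE       0     0     0
            ada4                ONLINE       0     0     0
            ada2                ONLINE       0     0     0
        logs
          15752248745115926170  UNAVAIL      0     0     0  was /dev/ada2
"""

print(find_reused_names(SAMPLE))
# [('15752248745115926170', '/dev/ada2')]
```

A non-empty result only shows that the old name has been reused. It cannot tell you whether the disk now called ada2 is the former log device.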
> errors: 3 data errors, use '-v' for a list
>
> Here is the ending output of zpool history:
>
> 2011-12-05.03:38:50 zpool upgrade -V 28 -a
> 2011-12-05.03:39:09 zpool export share
> 2011-12-05.03:39:33 zpool import -m share
> 2011-12-05.03:40:05 zpool remove share 15752248745115926170
> 2011-12-05.03:41:04 zpool remove share 15752248745115926170
> 2011-12-05.03:41:18 zpool export share
> 2011-12-05.03:41:56 zpool import -m share
> 2011-12-05.03:43:47 zpool remove share 15752248745115926170
> 2011-12-05.03:47:54 zpool remove share 15752248745115926170
> 2011-12-05.03:51:20 zpool scrub share
> 2011-12-05.16:33:01 zfs create share/vardb2
> 2011-12-05.16:33:32 zfs set compression=gzip-9 share/vardb2
> 2011-12-05.16:33:38 zfs set atime=off share/vardb2
> 2011-12-05.16:39:37 zfs destroy share/vardb
> 2011-12-05.16:39:47 zfs rename share/vardb2 share/vardb
> 2011-12-05.16:39:53 zfs set mountpoint=/var/db share/vardb
> 2011-12-05.16:47:24 zpool clear share
> 2011-12-05.16:48:41 zpool remove share 15752248745115926170
> 2011-12-05.16:53:15 zpool export -f share
> 2011-12-05.16:55:21 zpool import -m share
> 2011-12-05.16:55:52 zpool remove share 15752248745115926170
> 2011-12-05.16:56:56 zpool remove share -f 15752248745115926170
> 2011-12-05.17:04:07 zpool remove share 15752248745115926170
>
> What is going on here and how do I fix it?
>
--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney at brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------
More information about the freebsd-fs mailing list