ZFS Snapshot problems

Tue Feb 14 14:18:46 UTC 2012

Was your pool created at the current version, or upgraded?

Some pools have issues after being upgraded. Mine had a separate log
device that could not be removed after the upgrade to v28, so I
destroyed the pool and recreated it, and things have been fine since. I
don't know whether the upgrade process itself is broken, or whether the
old ZFS code in FreeBSD was buggy and left pools slightly corrupt.

And what zpool and zfs version are you running?
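
In case it helps, here is a small sketch of how I would check. The pool
name "tank" is only a placeholder (I don't know what yours is called),
and the function just prints the commands so you can look them over
first:

```sh
#!/bin/sh
# Print the commands that report pool and filesystem versions.
# "tank" is a placeholder pool name, not taken from this thread.
show_version_cmds() {
    pool="${1:-tank}"
    # On-disk version of the pool itself
    echo "zpool get version ${pool}"
    # Version of every filesystem in the pool
    echo "zfs get -r version ${pool}"
    # Versions supported by the ZFS code currently running
    echo "zpool upgrade -v"
}

show_version_cmds tank
```

If the printed commands look right, pipe them to sh to actually run
them, e.g. "show_version_cmds tank | sh".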

Is this your FreeBSD version? (FreeBSD 8.2-STABLE #2 r231394: Fri Feb 10
20:35:13 GMT 2012)
Your FreeBSD version sounds very old. I tried 8.2-STABLE from April and
it was unusably unstable with ZFS.

If you are using a recent STABLE pull, but created the pool under an
older version of FreeBSD, have you considered destroying the pool and
recreating it from your backups, using zfs send and recv?
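
Roughly what I have in mind, as a dry-run sketch only. Every name here
is an assumption on my part: "tank" for the pool to rebuild, "backup"
for a second, otherwise empty pool with enough space, "migrate" for the
snapshot name, and "<vdevs>" for whatever disk layout you really use.
The function prints the steps and does not execute anything:

```sh
#!/bin/sh
# Dry run: print (do not execute) the steps for rebuilding a pool
# via zfs send/recv. All names are placeholders, not from this thread:
#   $1 = pool to rebuild
#   $2 = a second, otherwise empty pool with enough free space
#   $3 = snapshot name
# "<vdevs>" stands for your real disk layout and must be filled in.
print_rebuild_steps() {
    pool="${1:-tank}"
    spare="${2:-backup}"
    snap="${3:-migrate}"
    # 1. Recursive snapshot of everything in the pool
    echo "zfs snapshot -r ${pool}@${snap}"
    # 2. Replicate the whole pool into the spare pool
    echo "zfs send -R ${pool}@${snap} | zfs recv -F -d ${spare}"
    # 3. Destroy and recreate the pool at the current version
    echo "zpool destroy ${pool}"
    echo "zpool create ${pool} <vdevs>"
    # 4. Replicate everything back
    echo "zfs send -R ${spare}@${snap} | zfs recv -F -d ${pool}"
}

print_rebuild_steps tank backup migrate
```

Obviously check that the copy on the second pool is complete and
readable before running the destroy step, and run the steps by hand one
at a time rather than piping this to sh.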

On 02/14/2012 11:00 AM, Matthew Seaman wrote:
> On 12/02/2012 13:56, Matthew Seaman wrote:
>> On 12/02/2012 13:10, Peter Maloney wrote:
>>>> I don't know what side effects that change has though. You can usually
>>>> assume that ZFS will just figure out the pool regardless of labels
>>>> (because it uses its own label metadata; see zdb output to see the other
>>>> id), but apparently your case is something special, getting actual
>>>> errors instead of only wrong names.
>> Yes.  This is most perplexing -- it's such a specific effect.  The gpt
>> thing may well be a red herring.  It is odd though that zdb somehow
>> discovers the gpart labels through reading zpool.cache, but zpool(1)
>> uses the gptids instead.
> Some more data about the underlying problem.
>
>   -- There is another symptom: once the snapshots get wedged, the
>      system will crash on shutdown.  I don't have a crashdump or
>      anything particularly useful, but this is what appeared in the
>      kernel log:
>
> +
> +Fatal trap 12: page fault while in kernel mode
> +cpuid = 0; apic id = 00
> +fault virtual address	= 0xa8
> +fault code		= supervisor write data, page not present
> +instruction pointer	= 0x20:0xffffffff805f9e65
> +stack pointer	        = 0x28:0xffffff800003a920
> +frame pointer	        = 0x28:0xffffff800003a930
> +code segment		= base 0x0, limit 0xfffff, type 0x1b
> +			= DPL 0, pres 1, long 1, def32 0, gran 1
> +processor eflags	= interrupt enabled, resume, IOPL = 0
> +current process		= 1 (init)
> +trap number		= 12
> +panic: page fault
> +cpuid = 0
> +KDB: stack backtrace:
> +#0 0xffffffff80624c0e at kdb_backtrace+0x5e
> +#1 0xffffffff805f1d53 at panic+0x183
> +#2 0xffffffff808df490 at trap_fatal+0x290
> +#3 0xffffffff808df7e1 at trap_pfault+0x201
> +#4 0xffffffff808dfc9f at trap+0x3df
> +#5 0xffffffff808c7284 at calltrap+0x8
> +#6 0xffffffff80f8a2e5 at zfsctl_umount_snapshots+0xa5
> +#7 0xffffffff80f9b74f at zfs_umount+0x6f
> +#8 0xffffffff8067dc1c at dounmount+0x26c
> +#9 0xffffffff80681332 at vfs_unmountall+0x42
> +#10 0xffffffff805f1b70 at boot+0x790
> +#11 0xffffffff805f1e4c at reboot+0x6c
> +#12 0xffffffff808deb44 at amd64_syscall+0x1f4
> +#13 0xffffffff808c757c at Xfast_syscall+0xfc
> +Uptime: 10d23h49m19s
> +FreeBSD 8.2-STABLE #2 r231394: Fri Feb 10 20:35:13 GMT 2012
> +CPU: Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (3166.33-MHz K8-class CPU)
> +avail memory = 8196075520 (7816 MB)
> +dcons_crom0: bus_addr 0x3d94000
> +pid 89559 (emacs) is using legacy pty devices - not logging anymore
> +instruction pointer	= 0x20:0xffffffff8060d275
> +#0 0xffffffff8063801e at kdb_backtrace+0x5e
> +#1 0xffffffff80605163 at panic+0x183
> +#2 0xffffffff808f2da0 at trap_fatal+0x290
> +#3 0xffffffff808f30f1 at trap_pfault+0x201
> +#4 0xffffffff808f35af at trap+0x3df
> +#5 0xffffffff808dab94 at calltrap+0x8
> +#6 0xffffffff80fa42e5 at zfsctl_umount_snapshots+0xa5
> +#7 0xffffffff80fb574f at zfs_umount+0x6f
> +#8 0xffffffff8069103c at dounmount+0x26c
> +#9 0xffffffff80695482 at vfs_unmountall+0x42
> +#10 0xffffffff80604f80 at boot+0x790
> +#11 0xffffffff8060525c at reboot+0x6c
> +#12 0xffffffff808f2454 at amd64_syscall+0x1f4
> +#13 0xffffffff808dae8c at Xfast_syscall+0xfc
> +Uptime: 2d10h51m47s
> +FreeBSD 8.2-STABLE #3 r231563: Mon Feb 13 01:37:39 GMT 2012
> +avail memory = 8196034560 (7816 MB)
>
>    -- I can't confirm this yet, but I've a feeling that removing the
>       *last* snapshot is significant.  Whether it's the last snapshot
>       of a particular zfs or the last snapshot in the zpool I don't know
>       yet.  Testing this is a long-winded affair as I can't afford to
>       keep rebooting this server, and I need it to back up successfully
>       most of the time.
>
>    -- The problem only seems to occur when snapshots are removed, so my
>       workaround for the time being is not to remove the snapshots I
>       create for each nightly backup.
>
> 	Cheers,
>
> 	Matthew
>

-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney at brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------