zfs: using, then destroying a snapshot sometimes panics zfs

Sun Feb 15 03:08:55 PST 2009

On 15.02.2009 at 11:39, Stefan Bethke wrote:

> On 08.02.2009 at 14:37, Stefan Bethke wrote:
>
>> Sorry I can't be more precise at the moment, but while creating a  
>> script that mirrors some zfs filesystems to another machine, I've  
>> now twice gotten weird behaviour and then a panic.
>>
>> The script iterates over a couple of zfs file systems:
>> - creates a snapshot with zfs snapshot tank/foo@mirror
>> - uses rsync to copy the contents of the snapshot with
>>   rsync /tank/foo/.zfs/snapshot/mirror/ dest:...
>> - destroys the snapshot with zfs destroy tank/foo@mirror
>>
>> While testing the script, I twice got to a point where, after the
>> snapshot was created without an error message, rsync dropped out
>> with an error message similar to "invalid file handle" on
>> /tank/foo/.zfs/snapshot.
>>
>> At that point, I could cd to /tank/foo/.zfs, but ls produced the  
>> same error message.
>>
>> I then tried to unmount the snapshot with zfs umount, and got a  
>> panic (which I also didn't manage to capture).
>>
>> Is this a generally known issue, or should I try to capture more  
>> information when this happens again?
>
>
> # cd /tank/foo/.zfs
> # ls -l
> ls: snapshot: Bad file descriptor
> total 0
> # cd snapshot
> -su: cd: snapshot: Not a directory
>
> I currently have no snapshots:
> # zfs list -t snapshot
> no datasets available
>
> However, on a different file system, I can list and cd into snapshot:
> # cd /tank/bar/.zfs
> # ls -l
> total 0
> dr-xr-xr-x  2 root  wheel  2 Feb  8 00:43 snapshot/
> # cd snapshot
>
> Trying to umount produces a panic:
> # zfs umount /jail/foo
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address	= 0xa8
> fault code		= supervisor write data, page not present
> instruction pointer	= 0x8:0xffffffff802ee565
> stack pointer	        = 0x10:0xfffffffea29c39e0
> frame pointer	        = 0x10:0xfffffffea29c39f0
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags	= interrupt enabled, resume, IOPL = 0
> current process		= 51383 (zfs)
> [thread pid 51383 tid 100298 ]
> Stopped at      _sx_xlock+0x15: lock cmpxchgq   %rsi,0x18(%rdi)
> db> bt
> Tracing pid 51383 tid 100298 td 0xffffff00a598e720
> _sx_xlock() at _sx_xlock+0x15
> zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
> zfs_umount() at zfs_umount+0xdd
> dounmount() at dounmount+0x2b4
> unmount() at unmount+0x24b
> syscall() at syscall+0x1a5
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f412fc, rsp =  
> 0x7fffffffd1a8, rbp = 0x801202300 ---
> db> call doadump
> Physical memory: 3314 MB
> Dumping 1272 MB: 1257 1241 1225 1209 1193 1177 1161 1145 1129 1113  
> 1097 1081 1065 1049 1033 1017 1001 985 969 953 937 921 905 889 873  
> 857 841 825 809 793 777 761 745 729 713 697 681 665 649 633 617 601  
> 585 569 553 537 521 505 489 473 457 441 425 409 393 377 361 345 329  
> 313 297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57 41  
> 25 9
> Dump complete
> = 0
>
> I've saved the crash dump, in case any information in it would be
> helpful.
>
> This is -current from a week ago on amd64.
>
> At the current rate, this happens every couple of days, so gathering  
> more information on the live system probably won't be a problem.

On a different machine with an identical configuration, I just got this
panic on reboot:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xa8
fault code		= supervisor write data, page not present
instruction pointer	= 0x8:0xffffffff802ee3b5
stack pointer	        = 0x10:0xfffffffe40016980
frame pointer	        = 0x10:0xfffffffe40016990
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1 (init)
[thread pid 1 tid 100002 ]
Stopped at      _sx_xlock+0x15: lock cmpxchgq   %rsi,0x18(%rdi)
db> bt
Tracing pid 1 tid 100002 td 0xffffff000141fab0
_sx_xlock() at _sx_xlock+0x15
zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
zfs_umount() at zfs_umount+0xdd
dounmount() at dounmount+0x2b4
vfs_unmountall() at vfs_unmountall+0x42
boot() at boot+0x655
reboot() at reboot+0x42
syscall() at syscall+0x1a5
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (55, FreeBSD ELF64, reboot), rip = 0x40897c, rsp =  
0x7fffffffe7b8, rbp = 0x402420 ---
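
For anyone trying to reproduce this: the loop described in the quoted
message amounts to roughly the following. This is a minimal sketch, not
the actual script. The function name mirror_fs, the rsync -a flag, the
assumption that each file system is mounted at /<name>, and the
destination argument are all placeholders for illustration.

```shell
#!/bin/sh
# Sketch only: snapshot, rsync from the snapshot directory, destroy.
# Names and options here are illustrative assumptions, not the real setup.

SNAP=mirror

mirror_fs() {
    fs=$1      # file system name, e.g. tank/foo (assumed mounted at /$fs)
    dest=$2    # rsync destination, left to the caller

    zfs snapshot "$fs@$SNAP" || return 1

    # Copy from the read-only view of the snapshot under .zfs/snapshot.
    rsync -a "/$fs/.zfs/snapshot/$SNAP/" "$dest"
    rc=$?

    # Always try to remove the snapshot, even if rsync failed.
    zfs destroy "$fs@$SNAP" || return 1
    return $rc
}
```

Calling mirror_fs tank/foo "$DEST" once per file system, with DEST set
to the rsync destination, gives the create, copy, destroy sequence from
the quoted message.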

-- 
Stefan Bethke <stb at lassitu.de>   Fon +49 151 14070811