zfs: using, then destroying a snapshot sometimes panics zfs

Stefan Bethke <stb@lassitu.de>
Sun Feb 15 02:40:00 PST 2009


Am 08.02.2009 um 14:37 schrieb Stefan Bethke:

> Sorry I can't be more precise at the moment, but while creating a  
> script that mirrors some zfs filesystems to another machine, I've  
> now twice gotten weird behaviour and then a panic.
>
> The script iterates over a couple of zfs file systems:
> - creates a snapshot with zfs snapshot tank/foo@mirror
> - uses rsync to copy the contents of the snapshot with rsync
> /tank/foo/.zfs/snapshot/mirror/ dest:...
> - destroys the snapshot with zfs destroy tank/foo@mirror
>
> During testing the script, I twice got to a point where, after the  
> snapshot was created without an error message, rsync dropped out
> with an error message similar to "invalid file handle" on
> /tank/foo/.zfs/snapshot.
>
> At that point, I could cd to /tank/foo/.zfs, but ls produced the  
> same error message.
>
> I then tried to unmount the snapshot with zfs umount, and got a  
> panic (which I also didn't manage to capture).
>
> Is this a generally known issue, or should I try to capture more  
> information when this happens again?


# cd /tank/foo/.zfs
# ls -l
ls: snapshot: Bad file descriptor
total 0
# cd snapshot
-su: cd: snapshot: Not a directory

I currently have no snapshots:
# zfs list -t snapshot
no datasets available

However, on a different file system, I can list and cd into snapshot:
# cd /tank/bar/.zfs
# ls -l
total 0
dr-xr-xr-x  2 root  wheel  2 Feb  8 00:43 snapshot/
# cd snapshot

Trying to umount produces a panic:
# zfs umount /jail/foo

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0xa8
fault code		= supervisor write data, page not present
instruction pointer	= 0x8:0xffffffff802ee565
stack pointer	        = 0x10:0xfffffffea29c39e0
frame pointer	        = 0x10:0xfffffffea29c39f0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 51383 (zfs)
[thread pid 51383 tid 100298 ]
Stopped at      _sx_xlock+0x15: lock cmpxchgq   %rsi,0x18(%rdi)
db> bt
Tracing pid 51383 tid 100298 td 0xffffff00a598e720
_sx_xlock() at _sx_xlock+0x15
zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
zfs_umount() at zfs_umount+0xdd
dounmount() at dounmount+0x2b4
unmount() at unmount+0x24b
syscall() at syscall+0x1a5
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f412fc, rsp = 0x7fffffffd1a8, rbp = 0x801202300 ---
db> call doadump
Physical memory: 3314 MB
Dumping 1272 MB: 1257 1241 1225 1209 1193 1177 1161 1145 1129 1113  
1097 1081 1065 1049 1033 1017 1001 985 969 953 937 921 905 889 873 857  
841 825 809 793 777 761 745 729 713 697 681 665 649 633 617 601 585  
569 553 537 521 505 489 473 457 441 425 409 393 377 361 345 329 313  
297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57 41 25 9
Dump complete
= 0

I've got the crashdump saved, in case there's any information in there
that might be helpful.

This is -current from a week ago on amd64.

At the current rate, this happens every couple of days, so gathering  
more information on the live system probably won't be a problem.


Stefan

-- 
Stefan Bethke <stb@lassitu.de>   Fon +49 151 14070811





