[Bug 198457] zfs acl lost after zfs send-receive. Kernel panic

Wed Jul 12 18:50:10 UTC 2017

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198457

Jose.n <acksist at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acksist at gmail.com

--- Comment #7 from Jose.n <acksist at gmail.com> ---
Hi. This issue is still present in 11.0-RELEASE. A replicated pool with
several TB of data, several volumes, and some 50 snapshots was sent to a new
pool on another system. All files in the most recent snapshot were verified on
both pools: the MD5 hashes generated with cfv matched. The comparison was run
as root, and accessing the files caused no problems.
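
For anyone who wants to repeat this kind of check without cfv, here is a
minimal sketch in Python. The function names are made up for illustration and
this does not reproduce what cfv does internally. It only compares file
contents, so it says nothing about ACLs or extended attributes, which is
consistent with the comparison passing on a pool that later panics.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file data is compared.
            if os.path.isfile(full) and not os.path.islink(full):
                hashes[os.path.relpath(full, root)] = md5_of_file(full)
    return hashes


def compare_trees(source_root, replica_root):
    """Return (missing, extra, differing) relative paths, each sorted.

    missing:   present under source_root only
    extra:     present under replica_root only
    differing: present in both, but with different content hashes
    """
    src = hash_tree(source_root)
    dst = hash_tree(replica_root)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    differing = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, differing
```

Three empty lists from compare_trees() mean the file data matches. The roots
would typically be the mounted snapshot directories on each system, which is
an assumption about the setup and not something stated above.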

Then the new pool was put into production, supplying a Samba volume for Windows
backups with robocopy (including ACLs). This was meant to replace the original
pool. The kernel always crashes shortly after the backup starts, with:
panic: solaris assert: 0 == zfs_acl_node_read(dzp, &paclp, B_FALSE), file:
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c, line: 1692
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b24477 at kdb_backtrace+0x67
#1 0xffffffff80ad97e2 at vpanic+0x182
#2 0xffffffff80ad9653 at panic+0x43
#3 0xffffffff824b520a at assfail+0x1a
#4 0xffffffff82263084 at zfs_acl_ids_create+0x1b4
#5 0xffffffff822689d0 at zfs_make_xattrdir+0x40
#6 0xffffffff82268c95 at zfs_get_xattrdir+0xc5
#7 0xffffffff8227e7e6 at zfs_lookup+0x106
#8 0xffffffff822871d1 at zfs_setextattr+0x181
#9 0xffffffff8110f03f at VOP_SETEXTATTR_APV+0x8f
#10 0xffffffff80b9c404 at extattr_set_vp+0x134
#11 0xffffffff80b9c544 at sys_extattr_set_file+0xf4
#12 0xffffffff80fa26ae at amd64_syscall+0x4ce
#13 0xffffffff80f8488b at Xfast_syscall+0xfb

I have not yet pinned down exactly which files are hit when the crash happens,
but the backtrace is always the same. I'm guessing this bug is not found more
often because most people do not put the replicas into production, and the data
seems to be copied correctly anyway. It is the metadata (the extended
attributes) that gets corrupted. So this will mostly hit people who expose and
use volumes in the received pool through Samba.
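
One possible way to narrow down the affected files is to walk the received
dataset and run a read-only probe such as getfacl on every entry, recording
the paths where the probe fails. The sketch below is untested against this
bug and the function name is hypothetical. It assumes that a damaged ACL shows
up as a non-zero exit status from the probe instead of tripping the same
assertion, which is not confirmed, so it should only be tried on a
non-production copy.

```python
import os
import subprocess


def find_probe_failures(root, probe=("getfacl",)):
    """Run `probe <path>` on root and everything below it.

    Returns a list of (path, stderr_text) for entries where the probe
    exited non-zero. The default probe, getfacl, is an assumption; any
    command that takes a path as its last argument can be substituted.
    """
    failures = []

    def run_probe(path):
        result = subprocess.run(
            list(probe) + [path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
            errors="replace",
        )
        if result.returncode != 0:
            failures.append((path, result.stderr.strip()))

    run_probe(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Directories matter here: the assertion in the backtrace fires
        # while reading the ACL of a parent directory (dzp).
        for name in dirnames + filenames:
            run_probe(os.path.join(dirpath, name))
    return failures
```

A call such as find_probe_failures("/pool/dataset"), where the path is a
placeholder, would list candidate directories and files to inspect further.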
