57 ZFS patches not merged to RELENG_7

Alexander Leidinger Alexander at Leidinger.net
Thu Dec 17 08:21:53 UTC 2009


Hi,

I identified at least 57 patches which are in 8-stable, but not in 7-stable.

Does someone know the status of those patches (listed below)? Maybe  
not all are applicable, but some of them should really get merged. I  
also have seen some things which I think are mismerges to 7-stable,  
but I do not have a list/diff of them at hand (I diffed 7-stable and  
8-stable and those things where "in the noise" of the stuff which is  
not merged yet, one of the things I find directly now is two times the  
same line with kmem_cache_create of the zio_cache in  
cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c).

If those people (ok, mostly pjd) which handle ZFS stuff normally do  
not have time to take care about this, are there people interested in  
helping me merge those things? Basically this means trying to apply  
the patches listed below on 7-stable, and testing them (or patches  
from other people, in case you are not able to merge a patch yourself)  
on a scratch box (I do not have a scratch-box with 7-stable around).  
It also means reviewing the patches regarding possible issues (I  
already identified some which need investigation, see below).  
Suggestions where those things should be coordinated in this case (on  
fs@ or stable@ or in the FreeBSD Wiki)?

Below is the list of patches which I identified. I didn't had a look  
which one depends on which one, but there are for sure dependencies.  
The format is
    URL of the patch in head
    committer which committed it - my comment about the patch
    commit log

Bye,
Alexander.

http://svn.freebsd.org/viewvc/base?view=revision&revision=185310
ganbold - very easy merge
---snip---
Remove unused variable.

Found with:     Coverity Prevent(tm)
CID: 3669,3671
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=185319
pjd - applicable to RELENG_7?
---snip---
Fix locking (file descriptor table and Giant around VFS).
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=192689
trasz - very easy merge
---snip---
Fix comment.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=193110
kmacy - easy merge
---snip---
work around snapshot shutdown race reported by Henri Hennebert
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=193128
kmacy - first probably, second and 3rd to check
---snip---
fix xdrmem_control to be safe in an if statement
fix zfs to depend on krpc
remove xdr from zfs makefile
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=193440
ps - shared vnode locks available in RELENG_7? vn_lock same syntax?
---snip---
Support shared vnode locks for write operations when the offset is
provided on filesystems that support it.  This really improves mysql
+ innodb performance on ZFS.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=194043
kmacy - sysctl API change?
---snip---
pjd has requested that I keep the tunable as zfs_prefetch_disable to  
minimize gratuitous
differences with Opensolaris' ZFS
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=195627
marcel - easy merge
---snip---
In nvpair_native_embedded_array(), meaningless pointers are zeroed.
The programmer was aware that alignment was not guaranteed in the
packed structure and used bzero() to NULL out the pointers.
However, on ia64, the compiler is quite agressive in finding ILP
and calls to bzero() are often replaced by simple assignments (i.e.
stores). Especially when the width or size in question corresponds
with a store instruction (i.e. st1, st2, st4 or st8).

The problem here is not a compiler bug. The address of the memory
to zero-out was given by '&packed->nvl_priv' and given the type of
the 'packed' pointer the compiler could assume proper alignment for
the replacement of bzero() with an 8-byte wide store to be valid.
The problem is with the programmer. The programmer knew that the
address did not have the alignment guarantees needed for a regular
assignment, but failed to inform the compiler of that fact. In
fact, the programmer told the compiler the opposite: alignment is
guaranteed.

The fix is to avoid using a pointer of type "nvlist_t *" and
instead use a "char *" pointer as the basis for calculating the
address. This tells the compiler that only 1-byte alignment can
be assumed and the compiler will either keep the bzero() call
or instead replace it with a sequence of byte-wise stores. Both
are valid.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=195785
trasz - extattr_check_cred same sytax in RELENG_7? Necessary?
---snip---
Fix permission handling for extended attributes in ZFS.  Without
this change, ZFS uses SunOS Alternate Data Streams semantics - each
EA has its own permissions, which are set at EA creation time
and - unlike SunOS - invisible to the user and impossible to change.
 From the user point of view, it's just broken: sometimes access
is granted when it shouldn't be, sometimes it's denied when
it shouldn't be.

This patch makes it behave just like UFS, i.e. depend on current
file permissions.  Also, it fixes returned error codes (ENOATTR
instead of ENOENT) and makes listextattr(2) return 0 instead
of EPERM where there is no EA directory (i.e. the file never had
any EA).
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=195822
trasz - easy merge
---snip---
Fix extattr_list_file(2) on ZFS in case the attribute directory
doesn't exist and user doesn't have write access to the file.
Without this fix, it returns bogus value instead of 0.  For some
reason this didn't manifest on my kernel compiled with -O0.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=195909
pjd - very easy merge
---snip---
We don't support ephemeral IDs in FreeBSD and without this fix ZFS can
panic when in zfs_fuid_create_cred() when userid is negative. It is
converted to unsigned value which makes IS_EPHEMERAL() macro to
incorrectly report that this is ephemeral ID. The most reasonable
solution for now is to always report that the given ID is not ephemeral.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196291
pjd - probably easy merge
---snip---
- Fix a race where /dev/zfs control device is created before ZFS is fully
   initialized. Also destroy /dev/zfs before doing other deinitializations.
- Initialization through taskq is no longer needed and there is a race
   where one of the zpool/zfs command loads zfs.ko and tries to do some work
   immediately, but /dev/zfs is not there yet.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196269
marcel - easy merge
---snip---
Fix misalignment in nvpair_native_embedded() caused by the compiler
replacing the bzero(). See also revision 195627, which fixed the
misalignment in nvpair_native_embedded_array().
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196295
pjd - the added stuff needs to be reviewed, taskqueue available & same syntax?
---snip---
Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196297
pjd - easy merge
---snip---
Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have
0 usecount.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196299
pjd - VI_UNLOCK same syntax in RELENG_7?
---snip---
- We need to recycle vnode instead of freeing znode.

Submitted by:	avg

- Add missing vnode interlock unlock.
- Remove redundant znode locking.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196301
pjd - probably easy merge
---snip---
If z_buf is NULL, we should free znode immediately.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196307
pjd - to be reviewed
---snip---
Manage asynchronous vnode release just like Solaris.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196309
pjd - vhold/VN_RELE/zfsctl_root_lookup/vop_vptocnp same syntax in RELENG_7?
---snip---
getcwd() (when __getcwd() fails) works by stating current directory, going up
(..), calling readdir and looking for previous directory inode.  In case of
.zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it
won't be visible in readdir output.

Fix this by implementing VPTOCNP for snapshot directories, so __getcwd()
doesn't fail and getcwd() doesn't have to use readdir method.

This fixes /bin/pwd from within .zfs/snapshot/<name>/.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196395
pjd - applicable to RELENG_7 (same XDR code?)?
---snip---
Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allow 'zpool import' to work again.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196456
pjd - kproc_create the same syntax on RELENG_7?
---snip---
- Give minclsyspri and maxclsyspri real values (consulted with kmacy).
- Honour 'pri' argument for thread_create().
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196457
pjd - easy merge, probably depends upon 196457
---snip---
Set priority of vdev_geom threads and zvol threads to PRIBIO.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196458
pjd - kproc_kthread_add available in RELENG_7? same syntax?
---snip---
- Hide ZFS kernel threads under zfskern process.
- Use better (shorter) threads names:
	'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
	'vdev:worker da0' -> 'vdev da0'
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196662
pjd - vn_lock/VOP_UNLOCK syntax the same in RELENG_7 (add curthread?)?
---snip---
Add missing mountpoint vnode locking.

This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when
regular user tries to mount dataset owned by him.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196703
pjd - probably easy merge, KBI change (dnode)?
---snip---
Backport the 'dirtying dbuf' panic fix from newer ZFS version.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196919
pjd - very easy merge
---snip---
bzero() on-stack argument, so mutex_init() won't misinterpret that the
lock is already initialized if we have some garbage on the stack.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196927
pjd - easy merge
---snip---
Changing provider size is not really supported by GEOM, but doing so when
provider is closed should be ok.

When administrator requests to change ZVOL size do it immediately if ZVOL
is closed or do it on last ZVOL close.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196943
pjd - MNT_* same syntax in RELENG_7?
---snip---
- Avoid holding mutex around M_WAITOK allocations.
- Add locking for mnt_opt field.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196944
pjd - easy merge
---snip---
Don't recheck ownership on update mount. This will eliminate LOR between
vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196954
pjd - easy merge
---snip---
If we have to use avl_find(), optimize a bit and use avl_insert() instead of
avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).

Fix similar case in the code that is currently commented out.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196965
pjd - easy merge
---snip---
Fix reference count leak for a case where snapshot's mount point is updated.
Such situation is not supported.

This problem was triggered by something like this:

	# zpool create tank da0
	# zfs snapshot tank at snap
	# cd /tank/.zfs/snapshot/snap  (this will mount the snapshot)
	# cd
	# mount -u nosuid /tank/.zfs/snapshot/snap  (refcount leak)
	# zpool export tank
	cannot export 'tank': pool is busy
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196980
pjd - VFS_ROOT/VN_RELE same syntax in RELENG_7?
---snip---
When we automatically mount snapshot we want to return vnode of the  
mount point
from the lookup and not covered vnode. This is one of the fixes for  
using .zfs/
over NFS.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196982
pjd - vfs_checkexp/vfs_stdcheckexp same syntax in RELENG_7?
---snip---
We don't export individual snapshots, so mnt_export field in snapshot's
mount point is NULL. That's why when we try to access snapshots over NFS
use mnt_export field from the parent file system.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=196985
pjd - easy merge
---snip---
Only log successful commands! Without this fix we log even unsuccessful
commands executed by unprivileged users. Action is not really taken, but it is
logged to pool history, which might be confusing.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197133
pjd - do we have rw-locks in RELENG_7?
---snip---
- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if
   z_dbuf field is NULL - this might happen in case of rollback or forced
   unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be
   done asynchronously via zfs_reclaim_complete().
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197151
pjd - easy merge
---snip---
Be sure not to overflow struct fid.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197152
pjd - easy merge
---snip---
Extend scope of the z_teardown_lock lock for consistency and "just in case".
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197153
pjd - easy merge
---snip---
When zfs.ko is compiled with debug, make sure that znode and vnode point at
each other.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197167
pjd - easy merge
---snip---
Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories
by just returning EOPNOTSUPP. This will allow NFS server to fall back to
regular READDIR.

Note that converting inode number to snapshot's vnode is expensive operation.
Snapshots are stored in AVL tree, but based on their names, not inode numbers,
so to convert inode to snapshot vnode we have to interate over all snalshots.

This is not a problem in OpenSolaris, because in their READDIRPLUS
implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on
d_fileno as we do.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197177
pjd - easy merge
---snip---
Support both case: when snapshot is already mounted and when it is not yet
mounted.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197201
pjd - VOP_UNLOCK/VI_LOCK/VI_UNLOCK same syntax in RELENG_7 (add curthread?)?
---snip---
- Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular
   df(1) and mount(8) output. This is a bit smilar to OpenSolaris and follows
   ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
   mount(8) and df(1) output by default.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197351
pjd - easy merge
---snip---
Purge namecache in the same place OpenSolaris does.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197458
pjd - probably easy merge
---snip---
Close race in zfs_zget(). We have to increase usecount first and then
check for VI_DOOMED flag. Before this change vnode could be reclaimed
between checking for the flag and increasing usecount.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197459
pjd - probably easy merge
---snip---
Before calling vflush(FORCECLOSE) mark file system as unmounted so the
following vnops will fail. This is very important, because without this change
vnode could be reclaimed at any point, even if we increased usecount. The only
way to ensure that vnode won't be reclaimed was to lock it, which  
would be very
hard to do in ZFS without changing a lot of code. With this change simply
increasing usecount is enough to be sure vnode won't be reclaimed from under
us. To be precise it can still be reclaimed but we won't be able to see it,
because every try to enter ZFS through VFS will result in EIO.

The only function that cannot return EIO, because it is needed for vflush() is
zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks
z_teardown_lock and never returns EIO.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197497
pjd - easy merge, implications for existing pools/data?
---snip---
Switch to fletcher4 as the default checksum algorithm. Fletcher2 was proven to
be a bit weak and OpenSolaris also switched to fletcher4.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197512
pjd - VI_UNLOCK same syntax in RELENG_8?
---snip---
- Don't depend on value returned by gfs_*_inactive(), it doesn't work
   well with forced unmounts when GFS vnodes are referenced.
- Make other preparations to GFS for forced unmounts.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197513
pjd - traverse/VN_RELE same syntax in RELENG_7?
---snip---
Use traverse() function to find and return mount point's vnode instead of
covered vnode when snapshot is already mounted.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197515
pjd - probably easy merge
---snip---
Handle cases where virtual (GFS) vnodes are referenced when doing forced
unmount. In that case we cannot depend on the proper order of invalidating
vnodes, so we have to free resources when we have a chance.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197683
delphij - easy merge
---snip---
Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old
format ZFS, as defined in the manual page.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197831
pjd - does the added variable cause a KBI change?
---snip---
Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=197861
pjd - priv_check available / VOP_ACCESS same syntax in RELENG_7?
---snip---
Allow file system owner to modify system flags if securelevel permits.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=198703
pjd - vaccess same syntax on RELENG_7?
---snip---
- zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call
   zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both
   zfs_access() and vaccess() after spliting V* flags.

This fixes "dirtying snapshot!" panic.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=199156
pjd - probably easy merge (struct change, KBI implications?)
---snip---
Avoid passing invalid mountpoint to getnewvnode().
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=199157
pjd - important, maybe security, probably easy merge
---snip---
Be careful which vattr fields are set during setattr replay.
Without this fix strange things can appear after unclean shutdown like
files with mode set to 07777.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=200124
pjd - very easy merge
---snip---
Avoid using additional variable for storing an error if we are not going
to do anything with it.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=200126
pjd - easy merge, prevents ZFS pools on ZVOLs
---snip---
Fix deadlock when ZVOLs are present and we are replacing dead component or
calling scrub when pool is in a degraded state. It will try to taste ZVOLs,
which will lead to deadlock, as ZVOL will try to acquire the same locks as
replace/scrub is holding already.

We can't simply skip provider based on their GEOM class, because ZVOL can have
providers build on top of it and we need to skip those as well.

We do it by asking for ZFS::iszvol attribute. Any ZVOL-based provider  
will give
us positive answer and we have to skip those providers.

This way we remove possibility to create ZFS pools on top of ZVOLs, but it is
not very useful anyway.

I believe deadlock is still possible in some very complex situations like when
we have MD provider on top of UFS file on top of ZVOL. When we try to replace
dead component in the pool mentioned ZVOL is based on, there might be a
deadlock when ZFS will try to taste MD provider. There is no easy way  
to detect
that, but it isn't very common.
---snip---

http://svn.freebsd.org/viewvc/base?view=revision&revision=200158
pjd - easy merge
---snip---
We have to eventually look for provider without checking guid as this is need
for attaching when there is no metadata yet.

Before r200125 the order of looking for providers was wrong. It was:
1. Find provider by name.
2. Find provider by guid.
3. Find provider by name and guid.

Where it should have been:
1. Find provider by name and guid.
2. Find provider by guid.
3. Find provider by name.
---snip---

-- 

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137





More information about the freebsd-fs mailing list