Spectra Logic's ZFS changes

Will Andrews will at firepipe.net
Tue Sep 3 16:33:25 UTC 2013


Hi,

In the past, I have posted Spectra Logic's ZFS work to my GitHub
account, primarily for Illumos developers to review.

We have a number of changes that are FreeBSD-specific, but the vast
majority (as % of diff size) are not.  For this reason, my repository
is a clone of illumos-gate, and I wrote a "transfer" script that
copies changes from FreeBSD & SpectraBSD.  The goal was to create
baselines for diffs between the three implementations, to ease
migration of specific pieces.

Some of our changes are finished.  Others need a little polish.  Still
others need more testing or feedback from others to integrate
upstream.  Many are a year or two old, mostly because pushing changes
upstream is time-consuming, and because product development &
bugfixes keep us busy.  We are happy to work with others
to integrate our changes upstream, where there is interest in doing
so.

Several of our smaller changes have already been pushed into
illumos-gate and merged down to FreeBSD head & stable/9, but there are
many more (generally larger & more complex) that have yet to be
merged.


Here is a list of some of the features & infrastructure changes:

* Asynchronous copy-on-write: Instead of synchronously resolving COW
faults, handle them with asynchronous reads in the background.  This
makes it feasible to use large recordsizes without paying much of a
performance penalty for partial writes.  This in turn yields ideal
results for RAIDZ, which works best with large recordsizes.

* Asynchronous DMU: Provide a means of performing DMU calls
asynchronously.  This was intended primarily to let ZVOLs keep
multiple I/Os outstanding, which increases I/O concurrency and gives
ZFS prefetch additional context.  As part of this
project, we also minimized dnode holds performed during DMU I/Os,
centralized chunking of DMU I/Os, and got rid of most code duplication
in dmu.c.

* DMU buffer user eviction improvements: reduce the opportunity for
lock order reversals and make the mechanism's operation easier to
verify.

* Clean up ZVOL locking; it now uses the objset/dataset user mechanism
to atomically manage ownership and object lifetime, instead of the
global spa_namespace_lock.  This mechanism is already used for
filesystems & snapshots.

* Separate ZVOL OS-independent and OS-dependent layers.  On FreeBSD,
geoms are no longer directly connected to the underlying zvol & objset
just by being created and destroyed; they now behave more like
Illumos' zvol vfsops.  Fix many GEOM-related locking & object lifetime
issues by separating control and I/O paths.

* Add generalized hooks for the SPA to manage ZVOL device nodes in the
appropriate places, instead of #ifdef'd hooks.

* FreeBSD ZFS FUID/ACL & system/user flags implementation, to support
storing/retrieving & honoring of native CIFS ACLs.

* Expansion of event notifications, particularly for pool/dataset events.

* Detect & handle device blocksize optimally, already committed to
FreeBSD/head by Justin.
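
To illustrate the asynchronous copy-on-write idea from the first item,
here is a minimal sketch in Python.  It is not the ZFS code: the
DirtyRecord class, its write() and read_done() methods, and the tiny
RECORD_SIZE are all hypothetical names invented for this example.  It
only shows the general shape of the technique: a partial write is
queued without waiting for the old record contents, and the queued
writes are merged over the old data once the background read finishes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

RECORD_SIZE = 16  # hypothetical, tiny record size for illustration only


@dataclass
class DirtyRecord:
    """Hypothetical in-memory state for one record with a COW fault."""

    # Partial writes that arrived before the old contents were read.
    pending: List[Tuple[int, bytes]] = field(default_factory=list)
    # Full new record contents, available once the merge has happened.
    resolved: Optional[bytes] = None

    def write(self, offset: int, data: bytes) -> None:
        """Accept a partial write without blocking on the old record."""
        if offset < 0 or offset + len(data) > RECORD_SIZE:
            raise ValueError("write falls outside the record")
        if self.resolved is None:
            # Old contents not available yet: just queue the write.
            self.pending.append((offset, data))
        else:
            # Old contents already merged: apply the write directly.
            buf = bytearray(self.resolved)
            buf[offset:offset + len(data)] = data
            self.resolved = bytes(buf)

    def read_done(self, old: bytes) -> None:
        """Completion callback for the background read of the old record."""
        buf = bytearray(old)
        # Replay queued writes in arrival order so later writes win.
        for offset, data in self.pending:
            buf[offset:offset + len(data)] = data
        self.pending.clear()
        self.resolved = bytes(buf)


rec = DirtyRecord()
rec.write(4, b"AAAA")              # returns immediately, nothing read yet
rec.write(6, b"BB")                # overlaps the tail of the first write
rec.read_done(b"." * RECORD_SIZE)  # background read completes
print(rec.resolved)                # b'....AABB........'
```

In a synchronous scheme, the first write() would have to wait for the
old record to be read before returning; here the read cost is taken
off the write path, which is the property that matters for large
recordsizes.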


We gave a presentation at BSDCan 2012 about the Async COW & Async DMU
work, as well as some related bits.

My GitHub URL for this work is: http://github.com/wca/illumos-gate/

'master' is a mirror of illumos/illumos-gate/master.

The most recent branches are freebsd-20130820-r254568, which contains
only the ZFS/DTrace bits from FreeBSD/stable/9 as of r254568, and
spectrabsd-20130820, which is branched from it and contains the
SpectraBSD version of the same files using the same baseline.  Both
branches were updated as of August 20th.

The diffstat summary for freebsd-20130820-r254568 vs
spectrabsd-20130820 is: 121 files changed, 10127 insertions(+), 4973
deletions(-).

I'm curious: How much divergence from Illumos are other FreeBSD ZFS
developers willing to accept, and for how long?  I have been assuming
that simply committing most of the non-FreeBSD-specific changes to
FreeBSD/head would be unacceptable in terms of impact on merging from
illumos-gate.  :-)

Thanks,
--Will.

