svn commit: r216230 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

Pawel Jakub Dawidek pjd at FreeBSD.org
Tue Dec 7 09:42:48 UTC 2010


On Mon, Dec 06, 2010 at 03:18:49PM -0500, John Baldwin wrote:
> On Monday, December 06, 2010 2:53:27 pm Pawel Jakub Dawidek wrote:
> > On Mon, Dec 06, 2010 at 08:35:36PM +0100, Ivan Voras wrote:
> > > Please persuade me on technical grounds why ashift, a property
> > > intended for address alignment, should not be set in this way. If your
> > > answer is "I don't know but you are still wrong because I say so" I
> > > will respect it and back it out but only until I/we discuss the
> > > question with upstream ZFS developers.
> > 
> > No. You persuade me why changing ashift in ZFS, which, as the comment
> > clearly states, is the "device's minimum transfer size", is better and
> > less hackish than presenting the disk with a properly configured sector
> > size. This not only can affect disks that still use 512-byte sectors,
> > but also doesn't fix the problem at all. It just works around the
> > problem in ZFS when configured on top of raw disks.
> > 
> > What about other file systems? What about other GEOM classes? GELI is
> > a great example here, as people use ZFS on top of GELI a lot. GELI
> > integrity verification works in such a way that not reporting the disk
> > sector size properly will have a huge negative performance impact.
> > ZFS' ashift won't change that.
> 
> I am mostly on your side here, but I wonder if GELI shouldn't prefer the 
> stripesize anyway?  For example, if you ran GELI on top of RAID-5 I imagine it 
> would be far more performant for it to use stripe-size logical blocks instead 
> of individual sectors for the underlying media.

Not exactly. GELI with authentication stores the checksum in the same
sector as the data, so each sector holds less than 512 bytes of data. To
still provide power-of-2 sectors without losing too much space, GELI has
to present a larger sector to the upper layers. For example, with
512-byte sectors on the underlying provider, GELI presents a 4kB sector
to the upper layers, but every 4kB GELI sector is built from nine
512-byte sectors of the underlying provider.

I'm not sure if my description is readable:) If you are interested, take
a look at the top of g_eli_integrity.c. It might be better described in
there.
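
The arithmetic behind the nine-sector figure can be sketched as follows.
This is only an illustration, not GELI code: the 32-byte checksum size
is an assumption (the size of an HMAC/SHA256 tag), and the function name
is made up for the example.

```python
# Assumption: 32 bytes of checksum per provider sector. The checksum size
# is not stated above; other authentication algorithms use other sizes.

def sectors_per_geli_sector(geli_sector, provider_sector, checksum_size):
    """Provider sectors needed to hold one GELI sector worth of data."""
    data_per_sector = provider_sector - checksum_size
    # Ceiling division: a partly filled last sector still costs a whole one.
    return -(-geli_sector // data_per_sector)

n = sectors_per_geli_sector(4096, 512, 32)
on_disk = n * 512
print(n, on_disk, 4096 / on_disk)
```

With these numbers each 512-byte sector carries 480 bytes of data, so
eight sectors are not enough for 4096 bytes and a ninth is needed.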

> The RAID-5 argument also suggests that other filesystems should probably
> prefer stripe sizes to physical sector sizes when picking block sizes, etc.

I'm not so sure. The stripe size of RAID5 tends to be too large for that.
With a 128kB ashift we would lose way too much space on smaller files
and metadata.
Stripesize is just a hint about which alignment is optimal, and it is
optional: a consumer can decide to ignore it if it cares more about
space than performance, for example. Sectorsize, on the other hand, is
not a hint, but really the smallest block a provider can handle.
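
To put a rough number on the space loss, here is a simplified sketch. It
assumes every object is rounded up to a whole number of 2^ashift-byte
blocks and ignores everything else ZFS does (compression, metadata
layout and so on); the function name is made up for the example.

```python
def allocated_bytes(size, ashift):
    """Round size up to a multiple of the 2**ashift-byte minimum block."""
    block = 1 << ashift
    return -(-size // block) * block

# A 1kB file with 512-byte, 4kB and 128kB minimum block sizes.
for ashift in (9, 12, 17):
    used = allocated_bytes(1024, ashift)
    print(ashift, 1 << ashift, used, used - 1024)
```

Under that assumption a 1kB file occupies a full 128kB with ashift 17,
compared to 4kB with ashift 12.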

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!