svn commit: r216230 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

Ivan Voras ivoras at
Mon Dec 6 20:29:23 UTC 2010

Firstly, thank you, your explanations and questions are very good and
address the problems in a good way to discuss about it. I respect that
you took the time to write this answer!

On 6 December 2010 20:53, Pawel Jakub Dawidek <pjd at> wrote:
> On Mon, Dec 06, 2010 at 08:35:36PM +0100, Ivan Voras wrote:
>> Please persuade me on technical grounds why ashift, a property
>> intended for address alignment, should not be set in this way. If your
>> answer is "I don't know but you are still wrong because I say so" I
>> will respect it and back it out but only until I/we discuss the
>> question with upstream ZFS developers.
> No. You persuade me why changing ashift in ZFS, which, as the comment
> clearly states is "device's minimum transfer size" is better and not
> hackish than presenting the disk with properly configured sector size.

You are right, it was, at least originally, defined as the device's
minimum transfer size, and it is in this context hackish to change it
to something that isn't it.

But there are two reasons that I think are important, which resulted
in changing this:

1) It is being used out of the original context in the mailing list
posts I've referenced - it was being used (and in a worse way, by
having a hacked zpool binary) for alignment like I did in the patch,
despite that the sectorsize wasn't changed.

2) The solaris "big sector" project, described in
basically blesses ZFS to work with big sectors, and states "ZFS can
automatically run on larger sector size disk after the label change
and I/O path change." Since the effect of having a larger sectorsize
will effectively be only the change in ashift, this is what I've done.

I would of course also be happy with simply having drives with
sectorsize=4096, but I've given up on this route because of two
reasons: mav was essentially against it since it will break
compatibility if suddenly changed and the posts on opensolaris mailing
lists are basically saying that it is likely that the 512/4096 kludge
will remain "forever" and it's unlikely to change.

> This can not only affect disks that still use 512 bytes sectors, but
> doesn't fix the problem at all. It just works around the problem in ZFS
> when configured on top of raw disks.

I'm not sure how to parse "can not only affect disks that still use
512 byte sectors" - if you are saying that it can affect disks with
512 byte sectors, I can think of only one case when it can have a
serious effect: if the drive (or GEOM) suddenly use 4 KiB sectorsize
on a drive which was previously formatted with ZFS as a drive with 512
byte sectors. All other combinations work because ashift is a part of
ZFS metadata - this calculation is overriden in case e.g. a zvol
created with ashift=12 is imported on a kernel/OS which only sees 512
byte sectors.

> What about other file systems? What about other GEOM classes? GELI is
> great example here, as people use ZFS on top of GELI alot. GELI
> integrity verification works in a way that not reporting disk sector
> size properly will have huge negative performance impact. ZFS' ashift
> won't change that.

Does this mean that, despite that ZFS is correctly aligned, GELI can
break this alignment policy so still, at the end, produce wrong
alignments for IOs? If so, I understand your concern, but it looks
like a problem with GELI, not ZFS. The same problem will occur if any
other file system type is created which is "properly aligned" from the
POW of the administrator.

> So you should back this change out, provide technical arguments (if they
> exist) that this is the right solution to the problem and not "hey, here
> is a patch, I think it is ok".

Thank you, I will try to remember to give more details in commits.

> BTW. ZFS is no longer open-source if you didn't notice.

I have noticed but I don't understand how it affects FreeBSD. I would
like to discuss this sometime but unless you have some urgent
interpretation of this information, I think it's best to start another
thread about it.

I think that, basically, the whole argument boils down to the question
like "while ZFS will most likely implement big sector support in this
way, it doesn't right now, so should we do it ourselves?"

I hope that this information I've written helps explain why I did the
patch. If you are still against it, I will back it out.

More information about the svn-src-all mailing list