on st_blksize value

Andriy Gapon avg at freebsd.org
Tue Mar 23 14:16:05 UTC 2010


First, what I am proposing:
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -790,11 +790,11 @@ vn_stat(vp, sb, active_cred, file_cred, td)
 	 *    to file"
 	 * Default to PAGE_SIZE after much discussion.
 	 * XXX: min(PAGE_SIZE, vp->v_bufobj.bo_bsize) may be more correct.
 	 */

-	sb->st_blksize = PAGE_SIZE;
+	sb->st_blksize = max(PAGE_SIZE, vap->va_blocksize);
 	
 	sb->st_flags = vap->va_flags;
 	if (priv_check(td, PRIV_VFS_GENERATION))
 		sb->st_gen = 0;
 	else

Explanation:
1. IMO it is not nice that we totally ignore the va_blocksize value that a
filesystem can set.  This takes away flexibility.  That va_blocksize value might
really turn out to be optimal given the filesystem implementation.
2. Since st_blksize is currently always PAGE_SIZE, it is playing safe to never
report a smaller value.  In some cases this might not be optimal (which I
personally doubt), but at least nothing should break.
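
The effect of the proposed max() can be sketched in userland.  This is only an
illustration: reported_blksize() is a hypothetical helper, not kernel code, and
the 4096-byte page size is an assumption (PAGE_SIZE is machine-dependent).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration only; the real PAGE_SIZE is per-architecture. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical userland model of the proposed assignment
 *     sb->st_blksize = max(PAGE_SIZE, vap->va_blocksize);
 * The result is never smaller than the page size, so a caller that
 * sees PAGE_SIZE today would never see a smaller value afterwards.
 */
size_t
reported_blksize(size_t page_size, size_t va_blocksize)
{
	assert(page_size > 0);
	return (va_blocksize > page_size ? va_blocksize : page_size);
}
```

Under these assumptions a filesystem that sets va_blocksize to 512 would still
report 4096, while one that sets it to 131072 would report 131072.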

One practical benefit can be with ZFS: if a filesystem has recordsize > PAGE_SIZE
(e.g. the default 128K) and it has checksums or compression enabled, then
(over-)writing in blocks smaller than recordsize requires reading the whole
record first.  And some applications do use st_blksize as a hint (just for the
record: some others use f_iosize instead, and yet others use a hardcoded value).
BTW, some torrent-like applications are a good example of applications
that overwrite chunks of existing files.
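
To make the "hint" usage concrete, here is a minimal sketch of how an
application might consume st_blksize.  The helper names (io_size_hint,
block_start, block_end) and the 4096-byte fallback are made up for this
example; offsets are assumed to be non-negative.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Fallback used when fstat() fails; an arbitrary assumption. */
#define FALLBACK_IO_SIZE 4096

/* Return the st_blksize hint for an open file, or the fallback. */
off_t
io_size_hint(int fd)
{
	struct stat sb;

	if (fstat(fd, &sb) == -1 || sb.st_blksize <= 0)
		return (FALLBACK_IO_SIZE);
	return ((off_t)sb.st_blksize);
}

/* Round a non-negative offset down to a multiple of blk. */
off_t
block_start(off_t offset, off_t blk)
{
	assert(blk > 0);
	return (offset - offset % blk);
}

/* Round a non-negative offset up to a multiple of blk. */
off_t
block_end(off_t offset, off_t blk)
{
	assert(blk > 0);
	return (block_start(offset + blk - 1, blk));
}
```

An application that overwrites the range [off, off + len) and already holds the
surrounding data could widen the write to
[block_start(off, blk), block_end(off + len, blk)).  If st_blksize reported the
128K recordsize, such writes would cover whole records.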

Additionally, here's a little bit of history that explains the PAGE_SIZE ("much
discussion") comment in vn_stat.  It seems that the comment may be misleading
nowadays.
It was introduced in r89784 and at that time it applied only to the case of
non-VREG and non-vn_isdisk vnodes.
Then, almost 3 years later, in r136966, the code for VREG vnodes and vn_isdisk
vnodes was dropped, the XXX comment was introduced, and we ended up with the
current state of affairs.

BTW, I am not sure about the XXX comment either.
Using bo_bsize may be a nice shortcut, but it would also take away some
flexibility.  Filesystems can already set bo_bsize and va_blocksize to the same
value, but there could be special cases where they need not be the same.

Thanks a lot for opinions and suggestions!

P.S. Yes, I have read the following interesting thread _completely_:
http://lists.freebsd.org/pipermail/freebsd-fs/2007-May/003155.html
And this one too:
http://freebsd.monkey.org/freebsd-fs/200810/msg00059.html
Unfortunately, the discussions didn't result in any action.

-- 
Andriy Gapon

