quick fix for slow directory shrinking in ffs
Bruce Evans
brde at optusnet.com.au
Tue May 17 22:39:20 UTC 2016
On Wed, 18 May 2016, Konstantin Belousov wrote:
> On Wed, May 18, 2016 at 05:40:07AM +1000, Bruce Evans wrote:
>> Also, ftruncate() seems to be broken. POSIX doesn't seem to require it
>> to honor O_SYNC, but POLA requires this. But there is no VOP_TRUNCATE();
>> truncation is done using VOP_SETATTR() and there is no way to pass down
>> the O_SYNC flag to it; in practice, ffs just does UFS_TRUNCATE() without
>> IO_SYNC.
>>
>> This makes a difference mainly for async mounts with my fixes to honor
>> IO_SYNC in ffs_update(). With async mounts, consistency of the file
>> system is not guaranteed but O_SYNC for a file should at least cause
>> all of the file data and most of its metdata to be written. Not syncing
>> for ftruncate() unnecessarily loses metadata writes. With !async mounts,
>> consistency of the file system is partly guaranteed and lost metadata
>> writes for ftruncate() shouldn't affect this -- they should just lose
>> the ftruncate() atomically.
>>
>> vfs could do an fsync() after VOP_SETATTR() for the O_SYNC case. This
>> reduces the race window.
>
> vattr already has the va_vaflags field. It is trivial to add flag there
> requesting O_SYNC behaviour. Of course, other updates could also
> honour VA_SYNC, but this is for later. Like this:
>
> diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> index 0a3a88a..1e42a3d 100644
> --- a/sys/kern/vfs_vnops.c
> +++ b/sys/kern/vfs_vnops.c
> @@ -1314,6 +1314,8 @@ vn_truncate(struct file *fp, off_t length, struct ucred *active_cred,
> if (error == 0) {
> VATTR_NULL(&vattr);
> vattr.va_size = length;
> + if ((fp->f_flag & O_FSYNC) != 0)
> + vattr.va_vaflags |= VA_SYNC;
> error = VOP_SETATTR(vp, &vattr, fp->f_cred);
> }
> out:
> diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
> index e82f6ee..41ec7f7 100644
> --- a/sys/sys/vnode.h
> +++ b/sys/sys/vnode.h
> @@ -286,6 +286,7 @@ struct vattr {
> */
> #define VA_UTIMES_NULL 0x01 /* utimes argument was NULL */
> #define VA_EXCLUSIVE 0x02 /* exclusive create request */
> +#define VA_SYNC 0x04 /* O_SYNC truncation */
>
> /*
> * Flags for ioflag. (high 16 bits used to ask for read-ahead and
> diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c
> index c0729f8..83df347 100644
> --- a/sys/ufs/ufs/ufs_vnops.c
> +++ b/sys/ufs/ufs/ufs_vnops.c
> @@ -625,7 +625,8 @@ ufs_setattr(ap)
> */
> return (0);
> }
> - if ((error = UFS_TRUNCATE(vp, vap->va_size, IO_NORMAL,
> + if ((error = UFS_TRUNCATE(vp, vap->va_size, IO_NORMAL |
> + ((vap->va_vaflags & VA_SYNC) != 0 ? IO_SYNC : 0),
> cred)) != 0)
> return (error);
> }
Looks good.
O_SYNC is actually spelled O_FSYNC in FreeBSD. You spelled it correctly
in the above, but about 2 places in the kernel use the POSIX spelling.
It is confusing enough to also have the spellings FFSYNC and IO_SYNC
for this flag in different layers. FFSYNC is for fcntl and must equal
O_FSYNC since the layers are not clearly separated. IO_SYNC is for
a clearly separated layer and is supposed to be translated to, but it
has the same value as O_FSYNC so O_FSYNC might work accidentally when
not translated.
This should probably also be done for truncations with O_TRUNC at open
time. There are a couple of these in vfs_syscalls.c. O_TRUNC is used
much more than ftruncate() so the extra overhead from this would be
more noticeable. I think the implementation is not very good. If
open() with O_TRUNC or truncate with O_FSYNC or fsync() fails, then
the file contents might be garbage. So it would be better to do
large truncations mostly async and only sync at the end. *fs_truncate()
could operate like that, but I think it takes the IO_SYNC flag as a
directive to do the whole operation synchronously. A non-sync truncation
followed by fsync() is likely to work better for ffs and just work for
all fs's.
Bruce
More information about the freebsd-fs
mailing list