Re: posix_fallocate(2) performance

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Tue, 05 Aug 2025 21:53:40 UTC
On Tue, Aug 5, 2025 at 10:21 AM Roman Bogorodskiy <novel@freebsd.org> wrote:
>
> Hi,
>
> I recently received a problem report that creating fully allocated
> volumes in libvirt takes a lot of time and resources. A volume
> here is represented by a single file on a filesystem.
>
> To allocate volumes, libvirt tries various methods, with
> posix_fallocate(2) being the first one to try, and manual writing of
> zeros being the last option.
>
> I noticed that posix_fallocate(2) is indeed fairly slow. I've
> implemented a test application for a quick benchmark (attached; in case
> attachment doesn't go through, it's also available on github [1]).
>
> Results I get on FreeBSD 14.3-RELEASE amd64:
>
> $ time ./fallocate test.raw-2 posix_fallocate
> safezero_posix_fallocate()
> ./fallocate test.raw-2 posix_fallocate  0,00s user 12,77s system 9% cpu 2:12,21 total
> $ time ./fallocate test.raw-1 slow
> safezero_slow()
> ./fallocate test.raw-1 slow  0,02s user 10,32s system 8% cpu 2:03,66 total
> $
>
> Test files are stored on this partition:
>
> /dev/ada0p2 on / (ufs, NFS exported, local, soft-updates, journaled soft-updates)
>
> In this run posix_fallocate(2) used even more resources and ran longer
> than writing zeros manually. The numbers differ slightly from run to
> run, but generally the two methods give similar results.
If you look at vop_stdallocate(), you'll see it basically just writes zeros
(with some stuff for handling partial blocks).

Until someone writes a UFS specific ufs_allocate(), that's all you
are going to get.

Btw, it is basically impossible to do preallocation on ZFS.
(Maybe the application should be "faked into believing it did this"?)

rick

>
> I've tried the same app on Ubuntu Linux:
>
> $ time ./fallocate test.raw-2 posix_fallocate
> safezero_posix_fallocate()
>
> real    0m0.023s
> user    0m0.000s
> sys     0m0.015s
> $ time ./fallocate test.raw-1 slow
> safezero_slow()
>
> real    0m47.262s
> user    0m0.017s
> sys     0m18.650s
> $
>
> Partition here is:
>
> /dev/sda2 on / type ext4 (rw,relatime,lazytime)
>
> Here, posix_fallocate() is way faster than manual zeroing.
>
> This makes me wonder:
>
>  - Is posix_fallocate(2) supposed to have performance similar to
>    manual writing of zeros?
>  - Are there recommendations for UFS tuning to make this type of
>    operation faster?
>  - Are there better ways to fully allocate a file than calling
>    posix_fallocate(2)?
>
> Thanks,
> Roman
>
>
> 1: https://gist.githubusercontent.com/novel/d5b1fdb54256a9f16f1ee454dd984aaa/raw/22017a3c23658230fdb689967ba54a79d9c9a00b/fallocate.c
>