Re: Sparse file support in FreeBSD NFSv4.2 server

In reply to: Lionel Cons : "Re: Sparse file support in FreeBSD NFSv4.2 server"
From: Alan Somers <asomers_at_freebsd.org>
Date: Tue, 13 May 2025 16:32:48 UTC
On Tue, May 13, 2025 at 9:29 AM Lionel Cons <lionelcons1972@gmail.com> wrote:

> On Tue, 13 May 2025 at 10:53, Rob Norris <robn@despairlabs.com> wrote:
> >
> > On Tue, 13 May 2025, at 6:35 PM, Aurélien Couderc wrote:
> > > On Tue, May 13, 2025 at 8:51 AM Rob Norris <robn@despairlabs.com> wrote:
> > > > Without having looked at it, I can see a way to do it by creating
> > > > some object-specific operation to "write" but have it accounted to a
> > > > dataset's "reservation", rather than "used". Easy to say, difficult to do.
> > > > I suspect the hardest part is figuring out the best way to keep a set of
> > > > reserved ranges on each object.
> > >
> > > What about just writing 0x00 bytes in case of F_ALLOCSP?
> >
> > By default, with block compression enabled, OpenZFS will detect all-zero
> > blocks and write a hole instead, which doesn't use any space.
>
> A "sparse file hole"? That would BREAK databases and lots of
> scientific software.
> There is a distinct difference between a "sparse file hole", which
> represents a range of "no data here", and a data range, which might
> contain lots of 0x00 bytes but represents VALID DATA. That is a
> difference!
>

Rob is correct.  ZFS really does work that way.  And it's fine with most
databases, because syscalls like read(2) will still return zeroes.


>
> For example, for our data runs we create a sparse file of 310 PiB
> (pebibytes), 10 PiB per day of the current month. If we collect
> data, the data will be written, filling part of the hole. But if
> no data were collected, that part of the file will remain a hole,
> indicating "no data (collected)".
>

If you need to preserve a distinction between "no data" and "all zeroes",
then you must set ZFS's compression property to "off".  However, that's
not an absolute guarantee.  For example, if you do an unaligned write of
zeroes to a region that's a hole, ZFS may synthesize extra zeroes in order
to pad the write out to a whole record.  Afterwards, SEEK_HOLE/SEEK_DATA
would indicate that a region you never wrote is actually a dense region
full of zeroes.  So if you absolutely need to know with perfect
accuracy the distinction between holes and zero-regions, you'll have to
track that information outside of the file system.  You'll save space doing
that, too, since you'll be able to enable ZFS compression.


>
> > But even if it did, that can't guarantee enough space to be able to
> > overwrite it, because the previous data may exist in snapshots or clones.
> > There's also metadata and indirect blocks and other stuff that we need
> > space for too.
>
> I think Aurélien was thinking about "sparse file" as the use case here,
> not "deduplication" or "safe erasure of data".
>
> > Reservations already have all the stuff needed to track the extra
> > commitment, which is why I think if it were possible, that's the way to do
> > it.
>
> F_ALLOCSP *is* a reservation
>

Rob meant ZFS's "reservation" property specifically.  As in, "The ZFS
reservation property could theoretically be used to implement an F_ALLOCSP
reservation".  Theoretically, but I doubt that anybody will ever implement
it.  There are just too many special cases to handle, as others have
already described in this thread.

-Alan