Re: SEEK_HOLE at EOF

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Thu, 04 Apr 2024 20:56:31 UTC

On Thu, Apr 4, 2024 at 11:15 AM Alan Somers <asomers@freebsd.org> wrote:
>
> tldr; there are two problems:
> 1) tmpfs handles SEEK_HOLE differently than other file systems
> 2) everything else handles SEEK_HOLE at EOF poorly, IMHO
>
> Details:
>
> According to lseek(2), SEEK_HOLE should return the start of the next
> hole greater than or equal to the supplied offset.  Also, each file
> has a zero-sized virtual hole at the very end of the file.  So I would
> expect that calling SEEK_HOLE at EOF would return the file's size.
> However, the man page also says that SEEK_HOLE will return ENXIO when
> the offset points to EOF.  Those two statements seem contradictory to
> me.  The first behavior seems more logical.  I would expect SEEK_HOLE
> to work the same way both at EOF and at any other file offset.
>
> What does the spec say?
>
> There is no POSIX standard for this.  It was invented by Solaris.
> Illumos's man page does not clearly say what should happen at EOF.
> Linux's man page is clear: it lists ENXIO for the case where "whence
> is SEEK_DATA or SEEK_HOLE, and offset is beyond the end of the file".
> That would seem to indicate behavior 1: SEEK_HOLE should return the
> file's size at EOF.  Only beyond EOF should it return ENXIO.
Well, there is the Austin Group stuff (never ratified by POSIX as I
understand it).

Here's what it says about SEEK_HOLE and offset:
If whence is SEEK_HOLE, the file offset shall be set to the smallest
location of a byte within a hole and not less than offset, except that
if offset falls within the last hole, then the file offset may be set
to the file size instead. It shall be an error if offset is greater
or equal to the size of the file.

I'd suggest we follow this, since it is the closest to a standard that there is.
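
The wording quoted above can be sketched as a toy Python model. This is
only an illustration of the rule, not kernel code: the function name
seek_hole_austin and the (start, end) hole list are hypothetical, and
the model always takes the plain reading (it never uses the optional
"may be set to the file size" allowance).

```python
import errno
import os


def seek_hole_austin(size, holes, offset):
    """Model lseek(fd, offset, SEEK_HOLE) under the quoted wording.

    size   -- file size in bytes
    holes  -- hypothetical list of (start, end) hole ranges, end exclusive,
              non-overlapping and all within [0, size)
    offset -- non-negative offset passed to lseek
    """
    # "It shall be an error if offset is greater or equal to the size
    # of the file."  ENXIO is the errno discussed in this thread.
    if offset >= size:
        raise OSError(errno.ENXIO, os.strerror(errno.ENXIO))

    for start, end in sorted(holes):
        if offset < end:
            # Inside this hole: offset itself is a byte within a hole.
            # Before this hole: the hole's first byte is the answer.
            # (If this is the last hole, the wording would also allow
            # returning size; this model does not take that option.)
            return max(offset, start)

    # No real hole at or after offset: fall back to the zero-sized
    # virtual hole at the end of the file.
    return size
```

With a 100-byte file and one hole at bytes 10..19, this model returns 10
for offset 0, 15 for offset 15, 100 for offset 20, and raises ENXIO for
offset 100 (i.e. at EOF).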

rick
>
> But what do other implementations do?
>
> Contrary to its man page, Linux behaves mostly like FreeBSD. SEEK_HOLE
> returns ENXIO at EOF on most file systems.  I tested a number of file
> systems on both FreeBSD and Linux.  Most of them return ENXIO.  The
> only two outliers are FreeBSD's tmpfs and Linux's NFS client.
>
>         FreeBSD   Linux
> ======= ========= =========
> UFS     ENXIO
> ZFS     ENXIO
> tmpfs   file size ENXIO
> msdosfs ENXIO     ENXIO
> ext2fs  ENXIO     ENXIO
> xfs               ENXIO
> tarfs   ENXIO
> nfs     ENXIO     file size
>
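
A row of a table like the one above could be reproduced with a small
probe along these lines. This is a minimal sketch, not the test that
was actually run for the table; probe_seek_hole_at_eof is a
hypothetical helper, and it assumes a Python build that exposes
os.SEEK_HOLE.

```python
import errno
import os
import tempfile


def probe_seek_hole_at_eof(directory, size=4096):
    """Call lseek(fd, size, SEEK_HOLE) on a fresh file in `directory`.

    Returns "file size" if the call succeeds and returns the file size,
    the errno name (for example "ENXIO") if it fails, or "unsupported"
    if this Python build has no os.SEEK_HOLE.
    """
    seek_hole = getattr(os, "SEEK_HOLE", None)
    if seek_hole is None:
        return "unsupported"

    # The temporary file lives on whatever file system backs `directory`.
    with tempfile.TemporaryFile(dir=directory) as f:
        f.write(b"x" * size)   # fully written data, no explicit holes
        f.flush()
        try:
            pos = os.lseek(f.fileno(), size, seek_hole)
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
        return "file size" if pos == size else "offset %d" % pos
```

Calling probe_seek_hole_at_eof("/some/mountpoint") once per mounted
file system gives one table cell each; the result depends on the file
system and OS, so no particular output is assumed here.
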
> So what should we change?  Clearly, it's bad for tmpfs to be
> inconsistent.  My preference would be for everything to behave like
> tmpfs, but it's currently losing the popularity contest.  Anybody else
> have thoughts?
>
> -Alan
>