SEEK_HOLE at EOF

From: Alan Somers <asomers_at_freebsd.org>
Date: Thu, 04 Apr 2024 18:14:45 UTC
tl;dr: there are two problems:
1) tmpfs handles SEEK_HOLE differently than other file systems
2) everything else handles SEEK_HOLE at EOF poorly, IMHO

Details:

According to lseek(2), SEEK_HOLE should return the start of the next
hole greater than or equal to the supplied offset.  Also, each file
has a zero-sized virtual hole at the very end of the file.  So I would
expect that calling SEEK_HOLE at EOF would return the file's size.
However, the man page also says that SEEK_HOLE will return ENXIO when
the offset points to EOF.  Those two statements seem contradictory to
me.  The first behavior seems more logical.  I would expect SEEK_HOLE
to work the same way both at EOF and at any other file offset.

What does the spec say?

There is no POSIX standard for this.  It was invented by Solaris, and
Illumos's man page does not clearly say what should happen at EOF.
Linux's man page is clear: it lists ENXIO for the case where "whence
is SEEK_DATA or SEEK_HOLE, and offset is beyond the end of the file".
That would seem to indicate the first behavior: SEEK_HOLE should
return the file's size at EOF.  Only beyond EOF should it return ENXIO.

But what do other implementations do?

Contrary to its man page, Linux behaves mostly like FreeBSD. SEEK_HOLE
returns ENXIO at EOF on most file systems.  I tested a number of file
systems on both FreeBSD and Linux.  Most of them return ENXIO.  The
only two outliers are FreeBSD's tmpfs and Linux's NFS client.

FS      FreeBSD   Linux
======= ========= =========
UFS     ENXIO
ZFS     ENXIO
tmpfs   file size ENXIO
msdosfs ENXIO     ENXIO
ext2fs  ENXIO     ENXIO
xfs               ENXIO
tarfs   ENXIO
nfs     ENXIO     file size

So what should we change?  Clearly, it's bad for tmpfs to be
inconsistent.  My preference would be for everything to behave like
tmpfs, but it's currently losing the popularity contest.  Anybody else
have thoughts?

-Alan