Re: SEEK_HOLE at EOF

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Thu, 04 Apr 2024 21:38:43 UTC

On Thu, Apr 4, 2024 at 1:59 PM Alan Somers <asomers@freebsd.org> wrote:
>
> On Thu, Apr 4, 2024 at 2:56 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Thu, Apr 4, 2024 at 11:15 AM Alan Somers <asomers@freebsd.org> wrote:
> > >
> > > tl;dr: there are two problems:
> > > 1) tmpfs handles SEEK_HOLE differently than other file systems
> > > 2) everything else handles SEEK_HOLE at EOF poorly, IMHO
> > >
> > > Details:
> > >
> > > According to lseek(2), SEEK_HOLE should return the start of the next
> > > hole greater than or equal to the supplied offset.  Also, each file
> > > has a zero-sized virtual hole at the very end of the file.  So I would
> > > expect that calling SEEK_HOLE at EOF would return the file's size.
> > > However, the man page also says that SEEK_HOLE will return ENXIO when
> > > the offset points to EOF.  Those two statements seem contradictory to
> > > me.  The first behavior seems more logical.  I would expect SEEK_HOLE
> > > to work the same way both at EOF and at any other file offset.
> > >
> > > What does the spec say?
> > >
> > > There is no POSIX standard for this.  It was invented by Solaris.
> > > Illumos's man page does not clearly say what should happen at EOF.
> > > Linux's man page is clear: ENXIO is returned when "whence is
> > > SEEK_DATA or SEEK_HOLE, and offset is beyond the end of the file".
> > > That would seem to indicate behavior 1: SEEK_HOLE should return the
> > > file's size at EOF.  Only beyond EOF should it return ENXIO.
> > Well, there is the Austin Group stuff (never ratified by POSIX as I
> > understand it).
> >
> > Here's what it says about SEEK_HOLE and offset:
> > If whence is SEEK_HOLE, the file offset shall be set to the smallest
> > location of a byte within a hole and not less than offset, except that
> > if offset falls within the last hole, then the file offset may be set
> > to the file size instead. It shall be an error if offset is greater
> > or equal to the size of the file.
> >
> > I'd suggest we follow this, since it is the closest to a standard that there is.
>
> That sounds like behavior 2: return ENXIO at EOF.  For reference, do
> you have a link to that somewhere?
0000415: add SEEK_HOLE, SEEK_DATA to lseek - Austin Group Defect Tracker
(austingroupbugs.net)
If this doesn't give you a link (gmail never shows the raw URL for me),
just google "SEEK_HOLE austin group".
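
For anyone who wants to check which of the two behaviors a given file
system shows, here is a minimal sketch in C. The helper name
seek_hole_at_eof() is made up for illustration and is not part of any
existing code. It opens a file, calls lseek() with SEEK_HOLE at exactly
the file size, and reports whether the call returned the file size
("behavior 1") or failed with ENXIO ("behavior 2"). The _GNU_SOURCE
define is only there on the assumption that someone may also build it
on Linux.

```c
#define _GNU_SOURCE	/* exposes SEEK_HOLE on Linux; assumed harmless elsewhere */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe lseek(fd, size, SEEK_HOLE) at EOF.
 * Returns 0 if the file size came back ("behavior 1"),
 * ENXIO if the call failed with ENXIO ("behavior 2"),
 * and -1 for anything else (open/fstat failure, other errno, other offset).
 */
int
seek_hole_at_eof(const char *path)
{
	struct stat sb;
	off_t off;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	if (fstat(fd, &sb) != 0) {
		close(fd);
		return (-1);
	}

	errno = 0;
	off = lseek(fd, sb.st_size, SEEK_HOLE);
	if (off == sb.st_size)
		ret = 0;		/* virtual hole at EOF was reported */
	else if (off == (off_t)-1 && errno == ENXIO)
		ret = ENXIO;		/* offset at EOF treated as an error */
	else
		ret = -1;		/* unexpected result */

	close(fd);
	return (ret);
}
```

Calling this on the same file stored on different file systems should
show whether they agree at EOF. It says nothing about offsets beyond
EOF, which would need a separate probe.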

rick