Inconsistency between lseek(SEEK_HOLE) and lseek(SEEK_DATA)

Maxim Sobolev sobomax at FreeBSD.org
Tue Feb 2 05:17:03 UTC 2016


WRT the:

> There is no 'hole-only' files on UFS, the last byte in the UFS file must
> be populated, either by allocated fragment if the last byte is in the
> direct blocks range, or by the full block if in the indirect range.

Indeed, UFS resists putting a hole at the end of a file, yet it is
possible to arrange a hole-only situation by first truncating an empty file
to some size that is greater than the target hole size, so that you get
a hole of the desired size followed by a bit of data, and then truncating
the resulting file back to the offset where the data starts:

-----
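    /* Create the file, or empty it if it already exists. */
    fd = open(fname, O_WRONLY | O_CREAT | O_TRUNC, DEFFILEMODE);
    if (fd == -1) {
        exit(1);
    }
    /* Extend to 128K; UFS populates the last byte with real data. */
    if (ftruncate(fd, 1024 * 128) < 0) {
        exit(1);
    }
    /* Find the offset where that data starts ... */
    data = lseek(fd, 0, SEEK_DATA);
    if (data < 0) {
        exit(1);
    }
    /* ... and cut the file back to it, leaving only the hole. */
    if (ftruncate(fd, data) < 0) {
        exit(1);
    }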
-----
[sobomax at rtpdev ~/projects/freebsd11/usr.bin/lsholes]$ ./lsholes /tmp/temp.MgoPPo
Type Start   End  Size
HOLE     0 98303 98304

Total HOLE: 98304 (100.00%)
Total DATA: 0 (0.00%)
[sobomax at rtpdev ~/projects/freebsd11/usr.bin/lsholes]$ ls -l /tmp/temp.MgoPPo
-rw-r--r--  1 sobomax  wheel  98304 Feb  1 21:06 /tmp/temp.MgoPPo
-----
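
For reference, a listing like the one above can be produced by alternating
SEEK_DATA and SEEK_HOLE from offset 0. What follows is only a minimal sketch
of that loop, not the actual lsholes source; the names region_totals,
print_region and scan_regions are made up for illustration, and the code
assumes the file is not modified while it is being scanned.

```c
#define _GNU_SOURCE	/* glibc needs this for SEEK_DATA/SEEK_HOLE; unused on FreeBSD */
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper type: byte totals for holes and data. */
struct region_totals {
	off_t hole_bytes;
	off_t data_bytes;
};

/* Print one region as "TYPE start end size", with an inclusive end offset. */
static void
print_region(const char *type, off_t start, off_t end)
{
	printf("%s %jd %jd %jd\n", type, (intmax_t)start,
	    (intmax_t)(end - 1), (intmax_t)(end - start));
}

/*
 * Walk the file from offset 0 and account for every byte as hole or data.
 * The virtual hole at EOF is not counted.  Returns 0 on success and -1 on
 * error.  Note that the file offset of fd is changed by the lseek() calls.
 */
int
scan_regions(int fd, struct region_totals *out)
{
	off_t size, pos, data, hole;

	assert(out != NULL);
	out->hole_bytes = 0;
	out->data_bytes = 0;

	size = lseek(fd, 0, SEEK_END);
	if (size < 0)
		return (-1);

	pos = 0;
	while (pos < size) {
		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0) {
			if (errno != ENXIO)
				return (-1);
			/* No data past pos: the rest of the file is a hole. */
			data = size;
		}
		if (data > pos) {
			print_region("HOLE", pos, data);
			out->hole_bytes += data - pos;
		}
		if (data >= size)
			break;

		hole = lseek(fd, data, SEEK_HOLE);
		if (hole <= data)
			return (-1);	/* error, or no forward progress */
		print_region("DATA", data, hole);
		out->data_bytes += hole - data;
		pos = hole;
	}
	return (0);
}
```

A caller would open the file read-only, pass the descriptor to
scan_regions() and print the two totals. On a file system without hole
support the whole file is expected to show up as a single DATA region.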

I don't know if operating on that file would result in some data
corruption, but I also seem to have no issues creating hole-only files on ZFS
using my fallocate(2) syscall.

-Max

On Mon, Feb 1, 2016 at 12:14 PM, Maxim Sobolev <sobomax at freebsd.org> wrote:

> Yeah, I've noticed that text now. It looks a lot like the sentence has
> been copied around and some part of it got lost in transition. In any case
> here is a small manpage patch to make the "virtual hole" more pronounced and
> also explain how it affects the return value of the syscall.
>
> https://reviews.freebsd.org/D5162
>
> On Mon, Feb 1, 2016 at 11:40 AM, Konstantin Belousov <kostikbel at gmail.com>
> wrote:
>
>> On Mon, Feb 01, 2016 at 11:22:18AM -0800, Maxim Sobolev wrote:
>> > Well, it still seems to be quite obscure. At the very least, the lseek(2)
>> > manual page needs to reflect that. Right now it says:
>> >
>> > ERRORS
>> > [...]
>> >      [ENXIO]            For SEEK_DATA, there are no more data regions past
>> >                         the supplied offset.  For SEEK_HOLE, there are no
>> >                         more holes past the supplied offset.
>> >
>> > Which is not true: SEEK_HOLE would return st_size when there are no
>> > more holes past the supplied offset, not ENXIO. It is also interesting that
>> > somehow an empty file is a special case as well. Both SEEK_HOLE and SEEK_DATA
>> > return -1 on those. Anybody who programs to that document would probably
>> > get as confused as I was.
>> >
>> > However, having said that, our cousin Linux behaves the same - i.e. returns
>> > EOF+1 on SEEK_HOLE and -1 on SEEK_DATA, and does the same for empty files,
>> > so at least we are consistent with that.
>>
>> Actually, since you referred to the man page for lseek(2), which seems to
>> be copied from the Solaris man page:
>> ...
>> The existence of a hole at the end of every data region allows for easy
>> programming and implies that a virtual hole exists at the end of the
>> file.
>> ...
>>
>> And the text you quoted does not imply that the call must return ENXIO
>> at EOF for a hole.  It only allows the call to do so, but the other language
>> makes this unreasonable.
>>
>> Note that it is the Solaris implementation of SEEK_HOLE and SEEK_DATA, not
>> the Linux one, that is the reference for the behavior.  We got it with
>> the ZFS import.  Our UFS implementation, and whatever Linux does, are only
>> reimplementations without clean documentation, and were done by observing
>> ZFS behaviour.
>>
>>
>
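
The return values debated in the thread above are easy to check on a given
system. The following is a small sketch for doing so; seek_probe and
probe_seek are hypothetical names, not part of any API. Going by what the
thread reports for FreeBSD and Linux, one would expect SEEK_HOLE at offset 0
of a fully written file to return st_size (the virtual hole), SEEK_DATA at
EOF to fail with ENXIO, and both to fail with ENXIO on an empty file. Other
systems or file systems may behave differently.

```c
#define _GNU_SOURCE	/* glibc needs this for SEEK_DATA/SEEK_HOLE; unused on FreeBSD */
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper type: the outcome of one lseek() call. */
struct seek_probe {
	off_t result;	/* value returned by lseek(), -1 on failure */
	int error;	/* errno if the call failed, 0 otherwise */
};

/* Issue one lseek() and capture both the return value and errno. */
struct seek_probe
probe_seek(int fd, off_t offset, int whence)
{
	struct seek_probe p;

	errno = 0;
	p.result = lseek(fd, offset, whence);
	p.error = (p.result < 0) ? errno : 0;
	return (p);
}

/* Print one probe result in a human-readable form. */
void
report_probe(const char *label, const struct seek_probe *p)
{
	assert(label != NULL && p != NULL);
	if (p->result < 0)
		printf("%s: -1 (%s)\n", label, strerror(p->error));
	else
		printf("%s: %jd\n", label, (intmax_t)p->result);
}
```

Calling probe_seek(fd, 0, SEEK_HOLE) and probe_seek(fd, size, SEEK_DATA) on
an empty file and on a small fully written one, and passing the results to
report_probe(), shows which of the behaviours a particular system implements.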

