standards/179248: A return value of telldir(3) only seekable for once
Jilles Tjoelker
jilles at stack.nl
Tue Jun 11 21:40:07 UTC 2013
The following reply was made to PR standards/179248; it has been noted by GNATS.
From: Jilles Tjoelker <jilles at stack.nl>
To: Akinori MUSHA <knu at FreeBSD.org>
Cc: freebsd-gnats-submit at FreeBSD.org
Subject: Re: standards/179248: A return value of telldir(3) only seekable for once
Date: Tue, 11 Jun 2013 23:29:53 +0200
On Mon, Jun 03, 2013 at 07:14:46AM +0000, Akinori MUSHA wrote:
> >Number: 179248
> >Category: standards
> >Synopsis: A return value of telldir(3) only seekable for once
> [snip]
> >Description:
> Our implementation of telldir(3)/seekdir(3) is not POSIX compliant in
> that a value obtained from telldir(3) is invalidated after calling
> seekdir(3) and then readdir(3).
> IEEE Std 1003.1, 2008/2013 says that only a call of rewinddir(3) may
> invalidate the location values returned by telldir(3):
> If the value of loc was not obtained from an earlier call to
> telldir(), or if a call to rewinddir() occurred between the call
> to telldir() and the call to seekdir(), the results of subsequent
> calls to readdir() are unspecified.
I think the problem is that telldir()/seekdir() want to return to the
same directory entry within the block, instead of to the beginning of
the block, while the required bits for the entry within the block are
not available in telldir()'s return value.
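A minimal sketch of a check for the reported behaviour, using only the
standard dirent calls. The function name sketch_check_reseek() is made up
for this illustration; it reuses one telldir() value for two seekdir()
calls and compares the entries that follow.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if seeking twice to the same telldir()
 * value yields the same entry both times, 0 if the entries differ, and
 * -1 on error or if the directory has too few entries to run the check.
 */
int
sketch_check_reseek(const char *path)
{
    char first[1024], second[1024];
    struct dirent *de;
    DIR *dirp;
    long loc;
    int result = -1;

    if ((dirp = opendir(path)) == NULL)
        return (-1);
    if (readdir(dirp) == NULL)          /* step past one entry */
        goto out;
    loc = telldir(dirp);                /* position to come back to */

    seekdir(dirp, loc);                 /* first use of loc */
    if ((de = readdir(dirp)) == NULL)
        goto out;
    snprintf(first, sizeof(first), "%s", de->d_name);

    seekdir(dirp, loc);                 /* second use of the same loc */
    if ((de = readdir(dirp)) == NULL)
        goto out;
    snprintf(second, sizeof(second), "%s", de->d_name);

    result = (strcmp(first, second) == 0);
out:
    closedir(dirp);
    return (result);
}
```

A return value of 0 would indicate the problem described in the PR: the
second seekdir() no longer leads back to the same entry.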
Some other platforms provide kernel support for this operation. The
struct dirent has a field d_off which is the file offset of the next
entry. It looks like the ino64 patches from Gleb Kurtsou add this
functionality to the FreeBSD kernel. With this, telldir() returns the
d_off value in the last dirent returned by readdir() (or 0) and
seekdir() simply calls lseek().
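Roughly, and assuming a stripped-down stand-in for DIR (struct sketch_dir
and both function names below are hypothetical, not the libc ones), the
d_off-based scheme could look like this:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified directory stream; not the real DIR layout. */
struct sketch_dir {
    int    fd;        /* open directory descriptor */
    off_t  last_off;  /* d_off of the last entry handed out, or 0 */
    size_t buf_len;   /* bytes of valid data in the read buffer */
    size_t buf_pos;   /* next unread byte in the read buffer */
};

/* telldir(): just report the cookie of the last entry returned. */
long
sketch_telldir(const struct sketch_dir *d)
{
    /* Note: this narrows if off_t is wider than long. */
    return ((long)d->last_off);
}

/* seekdir(): reposition the descriptor and drop buffered entries. */
void
sketch_seekdir(struct sketch_dir *d, long loc)
{
    if (lseek(d->fd, (off_t)loc, SEEK_SET) == -1)
        return;                     /* leave the stream untouched */
    d->last_off = (off_t)loc;
    d->buf_len = d->buf_pos = 0;    /* force a refill on next read */
}
```

No per-DIR table of positions is needed here; all state lives in the
cookie itself.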
With a d_off-based telldir(), a telldir()/seekdir() sequence may move
the directory position "backwards" by a few entries even if the
directory has not been modified, because UFS truncates the offset to a
block boundary. This may require a network filesystem to deny requests
for a single directory entry at a time.
Alternatively, UFS may replace the truncated bits with the number of
directory entries to skip. This takes advantage of d_off being more like
a "cookie" than a true file offset.
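As an illustration of replacing the truncated bits with a skip count, here
is a small sketch. The 512-byte block size and all sketch_* names are
assumptions made for this example, not taken from the UFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed directory block size; offsets are truncated to this boundary. */
#define SKETCH_DIRBLKSIZ 512u

/* Combine a block-aligned offset with a count of entries to skip. */
uint64_t
sketch_make_cookie(uint64_t block_off, unsigned skip)
{
    assert(block_off % SKETCH_DIRBLKSIZ == 0);  /* low bits must be free */
    assert(skip < SKETCH_DIRBLKSIZ);            /* count must fit in them */
    return (block_off | skip);
}

/* Offset of the directory block the cookie points into. */
uint64_t
sketch_cookie_block(uint64_t cookie)
{
    return (cookie & ~(uint64_t)(SKETCH_DIRBLKSIZ - 1));
}

/* Number of entries to skip after seeking to that block. */
unsigned
sketch_cookie_skip(uint64_t cookie)
{
    return ((unsigned)(cookie & (SKETCH_DIRBLKSIZ - 1)));
}
```

The resulting value is no longer a byte offset, but a consumer that only
passes it back to the kernel does not need it to be one.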
The kernel may have a similar "out of bits" problem when an application
with 32-bit long calls getdirentries(2) on an NFSv3 directory which
returns 64-bit cookies, and also with unionfs and mount -o union.
In the case of unionfs, the kernel appears to use some sort of state in
the unionfs vnode and assumes that the directory cookies are otherwise
unique enough. This likely causes problems if lseek() is used with a
non-zero offset/cookie.
In the case of mount -o union, the kernel "solves" the problem by
irreversibly modifying the open file description to refer to the lower
layer after the upper layer's entries have been read; the only way to
deal with this in userland is to read the entire directory on a
duplicate open file description (created with openat(fd, ".", ...)) on
opendir() and rewinddir() (bug: rewinddir() does not do this, violating
POSIX's requirement that rewinddir() pick up changes made to the
directory).
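A rough sketch of that userland workaround, assuming a hypothetical helper
sketch_snapshot() (not the actual libc code): it opens a second file
description for the same directory and reads every name up front, so the
original descriptor is never advanced.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read all entry names of the directory open on
 * dirfd through a separate open file description.  On success stores a
 * malloc'ed array of malloc'ed names in *out and returns the count;
 * returns -1 on error.
 */
int
sketch_snapshot(int dirfd, char ***out)
{
    char **names = NULL, **tmp;
    struct dirent *de;
    DIR *d;
    int fd2, n = 0, cap = 0;

    /* New description for the same directory; dirfd keeps its offset. */
    if ((fd2 = openat(dirfd, ".", O_RDONLY | O_DIRECTORY)) == -1)
        return (-1);
    if ((d = fdopendir(fd2)) == NULL) {
        close(fd2);
        return (-1);
    }
    while ((de = readdir(d)) != NULL) {
        if (n == cap) {
            cap = cap ? cap * 2 : 16;
            tmp = realloc(names, (size_t)cap * sizeof(*names));
            if (tmp == NULL)
                goto fail;
            names = tmp;
        }
        if ((names[n] = strdup(de->d_name)) == NULL)
            goto fail;
        n++;
    }
    closedir(d);                    /* also closes fd2 */
    *out = names;
    return (n);
fail:
    while (n > 0)
        free(names[--n]);
    free(names);
    closedir(d);
    return (-1);
}
```

To pick up changes as POSIX requires, rewinddir() would have to repeat
this snapshot rather than reuse the old one.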
> >Fix:
> I don't have a quick fix for this, as it may need a revamp of how the
> location thing is defined.
> NetBSD seems to have a different implementation which doesn't have
> this problem.
> However, I'm not sure if theirs is flawless esp. wrt memory
> management.
NetBSD stores the (block, entry) pairs uniquely and for the life of the
DIR object (perhaps discarding them upon rewinddir() as well). This
means no memory is "leaked" per se but memory consumption on a DIR that
has many telldir() calls is proportional to the number of entries in the
directory.
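To make the memory behaviour concrete, here is a sketch of such a table.
It illustrates the idea only and is not NetBSD's code; the sketch_* names
and the linear search are assumptions chosen for brevity.

```c
#include <stdlib.h>

/* One remembered position: a block offset and an entry index in it. */
struct sketch_pos {
    long block;
    long entry;
};

/* Per-DIR table of positions handed out by telldir(). */
struct sketch_postab {
    struct sketch_pos *v;
    size_t n, cap;
};

/*
 * Return the token for (block, entry), adding it only if it is not
 * already present.  Returns -1 if memory cannot be allocated.
 */
long
sketch_pos_intern(struct sketch_postab *t, long block, long entry)
{
    struct sketch_pos *nv;
    size_t i, ncap;

    for (i = 0; i < t->n; i++)      /* linear search, for brevity */
        if (t->v[i].block == block && t->v[i].entry == entry)
            return ((long)i);
    if (t->n == t->cap) {
        ncap = t->cap ? t->cap * 2 : 8;
        nv = realloc(t->v, ncap * sizeof(*nv));
        if (nv == NULL)
            return (-1);
        t->v = nv;
        t->cap = ncap;
    }
    t->v[t->n].block = block;
    t->v[t->n].entry = entry;
    return ((long)t->n++);
}

/* Drop all positions, e.g. on closedir() or possibly rewinddir(). */
void
sketch_pos_reset(struct sketch_postab *t)
{
    free(t->v);
    t->v = NULL;
    t->n = t->cap = 0;
}
```

Because each pair is stored once, repeated telldir() calls at the same
position do not grow the table.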
Also, a "proper" solution is possible if you are willing to accept that
it does not work for all filesystems. Most filesystems leave some of the
bits zero (particularly if there are 64 of them) which can then be used
to store the entry number. However, a malloc-based solution is then
still necessary for filesystems that do need all the bits, or for very
large directories.
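A sketch of the spare-bits idea with an explicit fallback signal. The
48/15 bit split and the sketch_* names are assumptions for illustration; a
real implementation would pick the split to suit the filesystems it has to
support.

```c
#include <stdint.h>

/* Assumed layout: low 48 bits offset, next 15 bits entry number. */
#define SKETCH_OFF_BITS   48
#define SKETCH_ENTRY_BITS 15

/*
 * Try to pack (off, entry) into one non-negative 64-bit value.
 * Returns 1 on success, 0 if the offset or entry number needs bits
 * that are not spare, in which case the caller must fall back to a
 * malloc-based table.
 */
int
sketch_pack_loc(uint64_t off, uint32_t entry, int64_t *loc)
{
    if ((off >> SKETCH_OFF_BITS) != 0 ||
        (entry >> SKETCH_ENTRY_BITS) != 0)
        return (0);
    *loc = (int64_t)(((uint64_t)entry << SKETCH_OFF_BITS) | off);
    return (1);
}

/* Split a packed value back into its offset and entry number. */
void
sketch_unpack_loc(int64_t loc, uint64_t *off, uint32_t *entry)
{
    *off = (uint64_t)loc & ((UINT64_C(1) << SKETCH_OFF_BITS) - 1);
    *entry = (uint32_t)((uint64_t)loc >> SKETCH_OFF_BITS);
}
```

With a 32-bit long there are far fewer spare bits, so the fallback path
would be taken much more often there.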
--
Jilles Tjoelker