standards/179248: A return value of telldir(3) only seekable for once

Jilles Tjoelker jilles at stack.nl
Tue Jun 11 21:40:07 UTC 2013


The following reply was made to PR standards/179248; it has been noted by GNATS.

From: Jilles Tjoelker <jilles at stack.nl>
To: Akinori MUSHA <knu at FreeBSD.org>
Cc: freebsd-gnats-submit at FreeBSD.org
Subject: Re: standards/179248: A return value of telldir(3) only seekable for
 once
Date: Tue, 11 Jun 2013 23:29:53 +0200

 On Mon, Jun 03, 2013 at 07:14:46AM +0000, Akinori MUSHA wrote:
 > >Number:         179248
 > >Category:       standards
 > >Synopsis:       A return value of telldir(3) only seekable for once
 > [snip]
 > >Description:
 > Our implementation of telldir(3)/seekdir(3) is not POSIX compliant in
 > that a value obtained from telldir(3) is invalidated after calling
 > seekdir(3) and then readdir(3).
 
 > IEEE Std 1003.1, 2008/2013 says that only a call of rewinddir(3) may
 > invalidate the location values returned by telldir(3):
 
 >     If the value of loc was not obtained from an earlier call to
 >     telldir(), or if a call to rewinddir() occurred between the call
 >     to telldir() and the call to seekdir(), the results of subsequent
 >     calls to readdir() are unspecified.
 
 I think the problem is that telldir()/seekdir() want to return to the
 same directory entry within the block, instead of to the beginning of
 the block, while the required bits for the entry within the block are
 not available in telldir()'s return value.
 
 Some other platforms provide kernel support for this operation. The
 struct dirent has a field d_off which is the file offset of the next
 entry. It looks like the ino64 patches from Gleb Kurtsou add this
 functionality to the FreeBSD kernel. With this, telldir() returns the
 d_off value in the last dirent returned by readdir() (or 0) and
 seekdir() simply calls lseek().
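
 As an illustration of that scheme, here is a toy in-memory model. It is
 only a sketch: the toy_* names and the struct layouts are invented, the
 array stands in for the kernel's directory, and assigning to the off
 field stands in for lseek(). The point is that the location is just the
 d_off of the last entry returned, so it can be reused any number of
 times.

```c
#include <stddef.h>

/* Toy entry: d_off is the offset (cookie) of the NEXT entry. */
struct toy_dirent {
    long d_off;
    const char *d_name;
};

/* Toy directory stream: off models the kernel's file offset. */
struct toy_dir {
    const struct toy_dirent *ents;
    size_t n;
    long off;
};

/* Return the entry that starts at the current offset, or NULL. */
const struct toy_dirent *toy_readdir(struct toy_dir *d)
{
    long start = 0; /* the first entry starts at offset 0 */

    for (size_t i = 0; i < d->n; i++) {
        if (start == d->off) {
            d->off = d->ents[i].d_off;
            return &d->ents[i];
        }
        start = d->ents[i].d_off;
    }
    return NULL;
}

/* The location is the d_off of the last entry returned (or 0). */
long toy_telldir(const struct toy_dir *d)
{
    return d->off;
}

/* Stands in for lseek(fd, loc, SEEK_SET); nothing is invalidated. */
void toy_seekdir(struct toy_dir *d, long loc)
{
    d->off = loc;
}
```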
 
 With this approach, a telldir()/seekdir() sequence may move the
 directory position "backwards" a few entries even if the directory has
 not been modified, because UFS truncates the offset to a block boundary.
 This may require a network filesystem to deny requests for a single
 directory entry at a time. Alternatively, UFS may replace the truncated
 bits with the number of directory entries to skip. This takes advantage
 of d_off being more like a "cookie" than a true file offset.
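
 The skip-count idea is plain bit arithmetic. A minimal sketch, assuming
 a 512-byte directory block (the SKETCH_DIRBLKSIZ constant and the
 cookie_* names are hypothetical and not taken from the UFS sources):

```c
#include <stdint.h>

#define SKETCH_DIRBLKSIZ 512u /* assumed directory block size */

/*
 * Combine a block-aligned offset with the number of entries to skip
 * inside that block. Returns 0 on success, -1 if the offset is not
 * block-aligned or the skip count does not fit in the low bits.
 */
int cookie_pack(uint64_t block_off, unsigned skip, uint64_t *out)
{
    if ((block_off & (SKETCH_DIRBLKSIZ - 1)) != 0 ||
        skip >= SKETCH_DIRBLKSIZ)
        return -1;
    *out = block_off | skip;
    return 0;
}

/* Offset of the block to read: the low bits are masked off again. */
uint64_t cookie_block(uint64_t cookie)
{
    return cookie & ~(uint64_t)(SKETCH_DIRBLKSIZ - 1);
}

/* Number of entries to skip after reading that block. */
unsigned cookie_skip(uint64_t cookie)
{
    return (unsigned)(cookie & (SKETCH_DIRBLKSIZ - 1));
}
```

 A reader of such a cookie would seek to cookie_block() and then discard
 cookie_skip() entries before returning the next one.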
 
 The kernel may have a similar "out of bits" problem when an application
 with 32-bit long calls getdirentries(2) on an NFSv3 directory which
 returns 64-bit cookies, and also with unionfs and mount -o union.
 
 In the case of unionfs, the kernel appears to use some sort of state in
 the unionfs vnode and assumes that the directory cookies are otherwise
 unique enough. This likely causes problems if lseek() is used with a
 non-zero offset/cookie.
 
 In the case of mount -o union, the kernel "solves" the problem by
 irreversibly modifying the open file description to refer to the lower
 layer after the upper layer's entries have been read. The only way to
 deal with this in userland is to read the entire directory on a
 duplicate open file description (created with openat(fd, ".", ...)) on
 opendir() and rewinddir(). There is a bug here: rewinddir() does not do
 this, violating POSIX's requirement that rewinddir() pick up changes
 made to the directory.
 
 > >Fix:
 > I don't have a quick fix for this, as it may need a revamp of how the
 > location thing is defined.
 
 > NetBSD seems to have a different implementation which doesn't have
 > this problem.
 > However, I'm not sure if theirs is flawless esp. wrt memory
 > management.
 
 NetBSD stores the (block, entry) pairs uniquely and for the life of the
 DIR object (perhaps discarding them upon rewinddir() as well). This
 means no memory is "leaked" per se but memory consumption on a DIR that
 has many telldir() calls is proportional to the number of entries in the
 directory.
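
 To make the memory behaviour concrete, here is a minimal sketch of a
 table that stores each (block, entry) pair once and hands out its index
 as the location. This is not NetBSD's code; the pos_* names are
 hypothetical and the linear search is only for brevity.

```c
#include <stdlib.h>

struct pos {
    long block; /* offset of the directory block */
    long entry; /* position of the entry within that block */
};

struct pos_table {
    struct pos *v;
    size_t n, cap;
};

/*
 * Return the index for (block, entry), adding the pair only if it is
 * not already present. Returns -1 if memory cannot be allocated.
 */
long pos_intern(struct pos_table *t, long block, long entry)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->v[i].block == block && t->v[i].entry == entry)
            return (long)i;
    if (t->n == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 8;
        struct pos *nv = realloc(t->v, ncap * sizeof(*nv));
        if (nv == NULL)
            return -1;
        t->v = nv;
        t->cap = ncap;
    }
    t->v[t->n].block = block;
    t->v[t->n].entry = entry;
    return (long)t->n++;
}

/* Look up a location; entries stay valid until pos_reset(). */
const struct pos *pos_lookup(const struct pos_table *t, long loc)
{
    return (loc >= 0 && (size_t)loc < t->n) ? &t->v[loc] : NULL;
}

/* Discard everything, e.g. on closedir() or perhaps rewinddir(). */
void pos_reset(struct pos_table *t)
{
    free(t->v);
    t->v = NULL;
    t->n = t->cap = 0;
}
```

 Since a pair is never stored twice, the table cannot grow beyond one
 slot per distinct directory position, however often telldir() is
 called.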
 
 Also, a "proper" solution is possible if you are willing to accept that
 it does not work for all filesystems. Most filesystems leave some of the
 bits zero (particularly if there are 64 of them) which can then be used
 to store the entry number. However, a malloc-based solution is then
 still necessary for filesystems that do need all the bits or for very
 large directories.
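
 A sketch of the bit-stealing variant, assuming a 64-bit location and an
 arbitrary 48/16 split between the filesystem's cookie and the entry
 number (the split and the loc_* names are assumptions for illustration
 only). A failure return is the signal to fall back to an allocated
 table.

```c
#include <stdint.h>

#define SKETCH_ENTRY_BITS 16 /* assumed bits reserved for the entry number */
#define SKETCH_OFF_BITS   (64 - SKETCH_ENTRY_BITS)

/*
 * Pack the filesystem's cookie and an entry number into one value.
 * Returns 0 on success, or -1 if the cookie uses the high bits or the
 * entry number is too large, in which case the caller has to fall
 * back to another scheme.
 */
int loc_pack(uint64_t fs_cookie, uint32_t entry, uint64_t *out)
{
    if ((fs_cookie >> SKETCH_OFF_BITS) != 0 ||
        (entry >> SKETCH_ENTRY_BITS) != 0)
        return -1;
    *out = ((uint64_t)entry << SKETCH_OFF_BITS) | fs_cookie;
    return 0;
}

/* Split a packed value back into its two parts. */
void loc_unpack(uint64_t loc, uint64_t *fs_cookie, uint32_t *entry)
{
    *fs_cookie = loc & ((UINT64_C(1) << SKETCH_OFF_BITS) - 1);
    *entry = (uint32_t)(loc >> SKETCH_OFF_BITS);
}
```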
 
 -- 
 Jilles Tjoelker

