seekdir/readdir patch .. Call for Review.

Julian Elischer julian at freebsd.org
Mon May 4 14:53:02 UTC 2015


On 5/3/15 10:33 PM, Jilles Tjoelker wrote:
> On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov wrote:
>> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote:
>>> if you are interested in readdir(3), seekdir(3) and telldir(3) then
>>> you should look at
>>>     https://reviews.freebsd.org/D2410
>>> this patches around a problem in seekdir() that breaks Samba.
>>> Seekdir(3) will not work as expected when files prior to the point of
>>> interest in directory have been deleted since the directory was opened.
>>> Windows clients using Samba cause both these things to happen, causing
>>> the next readdir(3) after the bad seekdir(3) to skip some entries and
>>> return the wrong file.
>>> Samba only needs to step back a single directory entry in the case
>>> where it reads an entry and then discovers it can't fit it into the
>>> buffer it is sending to the windows client. It turns out we can
>>> reliably cater to Samba's requirement because the "last returned
>>> element" is always still in memory, so with a little care, we can
>>> set our filepointer back to it safely. (once)
>>> seekdir and readdir (and telldir()) need a complete rewrite along with
>>> getdirentries() but that is more than a small edit like this.
>> Can you explain your expectations from the whole readdir() vs. parallel
>> directory modifications interaction ?  From what I understood so far,
>> there is unlocked modification of the container and parallel iterator
>> over the same container.  IMO, in such situation, whatever tweaks you
>> apply to the iterator, it is still cannot be made reliable.
>> Before making single-purpose changes to the libc readdir and seekdir
>> code, or to the kernel code, it would be useful to state exact behaviour
>> of the dirent machinery we want to see. No, 'make samba works in my
>> situation' does not sound good enough.
> Consider the subsequence of entries that existed at opendir() time and
> were not removed until now. This subsequence is clearly defined and does
> not have concurrency problems. The order of this subsequence must remain
> unchanged and seekdir() must be correct with respect to this
> subsequence.
>
> Additionally, two other kinds of entries may be returned. New entries
> may be inserted anywhere in between the entries of the subsequence, and
> removed entries may be returned as if they were still part of the
> subsequence (so that not every readdir() needs a system call).
>
> A simple implementation for UFS-style directories is to store the offset
> in the directory (all bits of it, not masking off the lower 9 bits).
> This needs d_off or similar in struct dirent. The kernel getdirentries()
> then needs a similar loop as the old libc seekdir() to go from the start
> of the 512-byte directory block to the desired entry (since an entry may
> not exist at the stored offset within the directory block).
>
> This means that a UFS-style directory cannot be compacted (existing
> entries moved from higher to lower offsets to fill holes) while it is
> open for reading. An NFS exported directory is always open for reading.
>
> This also means that duplicate entries can only be returned if that
> particular filename was deleted and created again.
>
> Without kernel support, it is hard to get telldir/seekdir completely
> reliable. The current libc implementation is wrong since the "holes"
> within the block just disappear and change the offsets of the following
> entries; the kernel cannot fix this using entries with d_fileno = 0
> since it cannot know, in the general case, how long the deleted entry
> was in the filesystem-independent dirent format. My previous idea of
> storing one d_fileno during telldir() is wrong since it will fail if
> that entry is deleted.
>
> If you do not care about memory usage (which probably is already
> excessive with the current libc implementation), you could store at
> telldir() time the offset of the current block returned by
> getdirentries() and the d_fileno of all entries already returned in the
> current block.
>
> The D2410 patch can conceptually work for what Samba needs, stepping
> back one directory entry. I will comment on it.
>
how long do I have to wait until I can commit  this and was kib's 
comment a
"do not commit"?



More information about the freebsd-current mailing list