seekdir/readdir patch .. Call for Review.

Julian Elischer julian at freebsd.org
Tue May 5 02:36:45 UTC 2015


On 5/5/15 8:42 AM, Rick Macklem wrote:
> Julian Elischer wrote:
>> On 5/3/15 10:33 PM, Jilles Tjoelker wrote:
>>> On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov
>>> wrote:
>>>> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote:
>>>>> if you are interested in readdir(3), seekdir(3) and telldir(3)
>>>>> then
>>>>> you should look at
>>>>>      https://reviews.freebsd.org/D2410
>>>>> this patches around a problem in seekdir() that breaks Samba.
>>>>> Seekdir(3) will not work as expected when files prior to the
>>>>> point of
>>>>> interest in directory have been deleted since the directory was
>>>>> opened.
>>>>> Windows clients using Samba cause both these things to happen,
>>>>> causing
>>>>> the next readdir(3) after the bad seekdir(3) to skip some entries
>>>>> and
>>>>> return the wrong file.
>>>>> Samba only needs to step back a single directory entry in the
>>>>> case
>>>>> where it reads an entry and then discovers it can't fit it into
>>>>> the
>>>>> buffer it is sending to the windows client. It turns out we can
>>>>> reliably cater to Samba's requirement because the "last returned
>>>>> element" is always still in memory, so with a little care, we can
>>>>> set our filepointer back to it safely. (once)
>>>>> seekdir and readdir (and telldir()) need a complete rewrite along
>>>>> with
>>>>> getdirentries() but that is more than a small edit like this.
>>>> Can you explain your expectations from the whole readdir() vs.
>>>> parallel
>>>> directory modifications interaction ?  From what I understood so
>>>> far,
>>>> there is unlocked modification of the container and parallel
>>>> iterator
>>>> over the same container.  IMO, in such situation, whatever tweaks
>>>> you
>>>> apply to the iterator, it is still cannot be made reliable.
>>>> Before making single-purpose changes to the libc readdir and
>>>> seekdir
>>>> code, or to the kernel code, it would be useful to state exact
>>>> behaviour
>>>> of the dirent machinery we want to see. No, 'make samba works in
>>>> my
>>>> situation' does not sound good enough.
>>> Consider the subsequence of entries that existed at opendir() time
>>> and
>>> were not removed until now. This subsequence is clearly defined and
>>> does
>>> not have concurrency problems. The order of this subsequence must
>>> remain
>>> unchanged and seekdir() must be correct with respect to this
>>> subsequence.
>>>
>>> Additionally, two other kinds of entries may be returned. New
>>> entries
>>> may be inserted anywhere in between the entries of the subsequence,
>>> and
>>> removed entries may be returned as if they were still part of the
>>> subsequence (so that not every readdir() needs a system call).
>>>
>>> A simple implementation for UFS-style directories is to store the
>>> offset
>>> in the directory (all bits of it, not masking off the lower 9
>>> bits).
>>> This needs d_off or similar in struct dirent. The kernel
>>> getdirentries()
>>> then needs a similar loop as the old libc seekdir() to go from the
>>> start
>>> of the 512-byte directory block to the desired entry (since an
>>> entry may
>>> not exist at the stored offset within the directory block).
>>>
>>> This means that a UFS-style directory cannot be compacted (existing
>>> entries moved from higher to lower offsets to fill holes) while it
>>> is
>>> open for reading. An NFS exported directory is always open for
>>> reading.
>>>
>>> This also means that duplicate entries can only be returned if that
>>> particular filename was deleted and created again.
>>>
>>> Without kernel support, it is hard to get telldir/seekdir
>>> completely
>>> reliable. The current libc implementation is wrong since the
>>> "holes"
>>> within the block just disappear and change the offsets of the
>>> following
>>> entries; the kernel cannot fix this using entries with d_fileno = 0
>>> since it cannot know, in the general case, how long the deleted
>>> entry
>>> was in the filesystem-independent dirent format. My previous idea
>>> of
>>> storing one d_fileno during telldir() is wrong since it will fail
>>> if
>>> that entry is deleted.
>>>
>>> If you do not care about memory usage (which probably is already
>>> excessive with the current libc implementation), you could store at
>>> telldir() time the offset of the current block returned by
>>> getdirentries() and the d_fileno of all entries already returned in
>>> the
>>> current block.
>>>
>>> The D2410 patch can conceptually work for what Samba needs,
>>> stepping
>>> back one directory entry. I will comment on it.
>>>
>> how long do I have to wait until I can commit  this and was kib's
>> comment a
>> "do not commit"?
>>
> What about the bug Jilles reports in D2410?
> - He said you might fix the problem by having telldir move the entry
>    to the head of the list when it has a hit. However, this means that
>    an "old" entry gets modified. Is it possible for this "modified"
>    entry to be a match and get modified again, and so on...?
>
> I will comment on the patch once you decide how to deal with Jilles
> bug.
I don't think is is a "bug" as such..
it wasn't a case I was looking to fix and it is just as it was before.
I'd rephrase it as: "Jilles asks that we also fix the case where the 
previous telldir response is
a recycling of an old  telldir response."

In actual fact this scenario will almost never happen.
because the previous time the telldir call returned that location,
the 'fixtelldir'  function would have later been called on the result, 
which
would have been modified to point to the start of the NEXT buffer,
and as such would not get matched on the next telldir to the same
place,  as that would be looking for the first location after the end
of the previous buffer. OR it does match because we seeked back to
that location, o which case we will get the same buffer alredy fixed.
The two addresses are logically equivalent,
but you can only know that after you have loaded the next buffer.

i.e. the first time telldir is called on location X it returns
     seek=123, location= 4096 (one past end of buffer)
  then a read happens and it gets converted to:
     seek= 124 location=0  (beginning of next buffer..
then some more reads happen or so and we
    seek back to 124,0
and do a new telldir, in which case we get a 'pre-fixed' value of
      124,0 not 123,4096.
  so while fixtelldir is not called, it doesn't matter.

If we seek back FURTHER than 124,0 then we are into the "unreliable
unfixed  zone".
  Assuming that by some luck we don't get confused (there were no
deletes) and we then work forwards back to 123,4096 and do a telldir 
again,
we will NOT match the old one, which was changed to be 124,0
so we will allocate a new one at 123,4096,
which in turn will get 'fixed' to be 124,0 So it worked.  the only 'less
than optimal' thing here is that we now have two entries saying 124,0.
But the case is so unusual and in a totally unsupported mode of
operation, that as far as I know no-one uses, that adding code
to merge the two entries is just not worth it. It still returns the
correct values, just wastes some memory. Anyone who wants to
do a DOS wasting these 16 bytes of memory, has to do it via writing
code to do it. so.. why would one DOS oneself?

we COULD mode the item up front but it'd never get matched unless
there had not been a matching read following the first telldir which
is really unlikely.

> rick
>
>> _______________________________________________
>> freebsd-current at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to
>> "freebsd-current-unsubscribe at freebsd.org"
>>
>



More information about the freebsd-current mailing list