seekdir/readdir patch .. Call for Review.

Rick Macklem rmacklem at uoguelph.ca
Tue May 5 11:04:47 UTC 2015


Julian Elischer wrote:
> On 5/5/15 8:42 AM, Rick Macklem wrote:
> > Julian Elischer wrote:
> >> On 5/3/15 10:33 PM, Jilles Tjoelker wrote:
> >>> On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov
> >>> wrote:
> >>>> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote:
> >>>>> if you are interested in readdir(3), seekdir(3) and telldir(3)
> >>>>> then
> >>>>> you should look at
> >>>>>      https://reviews.freebsd.org/D2410
> >>>>> this patches around a problem in seekdir() that breaks Samba.
> >>>>> Seekdir(3) will not work as expected when files prior to the
> >>>>> point of
> >>>>> interest in directory have been deleted since the directory was
> >>>>> opened.
> >>>>> Windows clients using Samba cause both these things to happen,
> >>>>> causing
> >>>>> the next readdir(3) after the bad seekdir(3) to skip some
> >>>>> entries
> >>>>> and
> >>>>> return the wrong file.
> >>>>> Samba only needs to step back a single directory entry in the
> >>>>> case
> >>>>> where it reads an entry and then discovers it can't fit it into
> >>>>> the
> >>>>> buffer it is sending to the windows client. It turns out we can
> >>>>> reliably cater to Samba's requirement because the "last
> >>>>> returned
> >>>>> element" is always still in memory, so with a little care, we
> >>>>> can
> >>>>> set our filepointer back to it safely. (once)
> >>>>> seekdir and readdir (and telldir()) need a complete rewrite
> >>>>> along
> >>>>> with
> >>>>> getdirentries() but that is more than a small edit like this.
> >>>> Can you explain your expectations from the whole readdir() vs.
> >>>> parallel
> >>>> directory modifications interaction ?  From what I understood so
> >>>> far,
> >>>> there is unlocked modification of the container and parallel
> >>>> iterator
> >>>> over the same container.  IMO, in such situation, whatever
> >>>> tweaks
> >>>> you
> >>>> apply to the iterator, it is still cannot be made reliable.
> >>>> Before making single-purpose changes to the libc readdir and
> >>>> seekdir
> >>>> code, or to the kernel code, it would be useful to state exact
> >>>> behaviour
> >>>> of the dirent machinery we want to see. No, 'make samba works in
> >>>> my
> >>>> situation' does not sound good enough.
> >>> Consider the subsequence of entries that existed at opendir()
> >>> time
> >>> and
> >>> were not removed until now. This subsequence is clearly defined
> >>> and
> >>> does
> >>> not have concurrency problems. The order of this subsequence must
> >>> remain
> >>> unchanged and seekdir() must be correct with respect to this
> >>> subsequence.
> >>>
> >>> Additionally, two other kinds of entries may be returned. New
> >>> entries
> >>> may be inserted anywhere in between the entries of the
> >>> subsequence,
> >>> and
> >>> removed entries may be returned as if they were still part of the
> >>> subsequence (so that not every readdir() needs a system call).
> >>>
> >>> A simple implementation for UFS-style directories is to store the
> >>> offset
> >>> in the directory (all bits of it, not masking off the lower 9
> >>> bits).
> >>> This needs d_off or similar in struct dirent. The kernel
> >>> getdirentries()
> >>> then needs a similar loop as the old libc seekdir() to go from
> >>> the
> >>> start
> >>> of the 512-byte directory block to the desired entry (since an
> >>> entry may
> >>> not exist at the stored offset within the directory block).
> >>>
> >>> This means that a UFS-style directory cannot be compacted
> >>> (existing
> >>> entries moved from higher to lower offsets to fill holes) while
> >>> it
> >>> is
> >>> open for reading. An NFS exported directory is always open for
> >>> reading.
> >>>
> >>> This also means that duplicate entries can only be returned if
> >>> that
> >>> particular filename was deleted and created again.
> >>>
> >>> Without kernel support, it is hard to get telldir/seekdir
> >>> completely
> >>> reliable. The current libc implementation is wrong since the
> >>> "holes"
> >>> within the block just disappear and change the offsets of the
> >>> following
> >>> entries; the kernel cannot fix this using entries with d_fileno =
> >>> 0
> >>> since it cannot know, in the general case, how long the deleted
> >>> entry
> >>> was in the filesystem-independent dirent format. My previous idea
> >>> of
> >>> storing one d_fileno during telldir() is wrong since it will fail
> >>> if
> >>> that entry is deleted.
> >>>
> >>> If you do not care about memory usage (which probably is already
> >>> excessive with the current libc implementation), you could store
> >>> at
> >>> telldir() time the offset of the current block returned by
> >>> getdirentries() and the d_fileno of all entries already returned
> >>> in
> >>> the
> >>> current block.
> >>>
> >>> The D2410 patch can conceptually work for what Samba needs,
> >>> stepping
> >>> back one directory entry. I will comment on it.
> >>>
> >> how long do I have to wait until I can commit  this and was kib's
> >> comment a
> >> "do not commit"?
> >>
> > What about the bug Jilles reports in D2410?
> > - He said you might fix the problem by having telldir move the
> > entry
> >    to the head of the list when it has a hit. However, this means
> >    that
> >    an "old" entry gets modified. Is it possible for this "modified"
> >    entry to be a match and get modified again, and so on...?
> >
> > I will comment on the patch once you decide how to deal with Jilles
> > bug.
> I don't think is is a "bug" as such..
> it wasn't a case I was looking to fix and it is just as it was
> before.
> I'd rephrase it as: "Jilles asks that we also fix the case where the
> previous telldir response is
> a recycling of an old  telldir response."
> 
For the case where it hits a matching entry, the entry at the head of
the list is some older entry.
Your fixtelldir() changes that older entry. Is that what you want?

> In actual fact this scenario will almost never happen.
> because the previous time the telldir call returned that location,
> the 'fixtelldir'  function would have later been called on the
> result,
> which
> would have been modified to point to the start of the NEXT buffer,
> and as such would not get matched on the next telldir to the same
> place,  as that would be looking for the first location after the end
> of the previous buffer. OR it does match because we seeked back to
> that location, o which case we will get the same buffer alredy fixed.
If at this point, it fixes it again by replacing the values with the
same ones, then I guess it may be ok.
I'll have to take another look at this.

I may remove myself from the review, since I'll admit I don't understand
the implications of this patch well enough. I'm looking for weird cases
where it would cause a regression, because to me, if it doesn't break
anything not already broken, I don't have a problem with it. I will
comment in phabricator soon (maybe to-night).

rick
ps: This should probably go in the phabricator comments if it isn't already
    there.
> The two addresses are logically equivalent,
> but you can only know that after you have loaded the next buffer.
> 
> i.e. the first time telldir is called on location X it returns
>      seek=123, location= 4096 (one past end of buffer)
>   then a read happens and it gets converted to:
>      seek= 124 location=0  (beginning of next buffer..
> then some more reads happen or so and we
>     seek back to 124,0
> and do a new telldir, in which case we get a 'pre-fixed' value of
>       124,0 not 123,4096.
>   so while fixtelldir is not called, it doesn't matter.
> 
> If we seek back FURTHER than 124,0 then we are into the "unreliable
> unfixed  zone".
>   Assuming that by some luck we don't get confused (there were no
> deletes) and we then work forwards back to 123,4096 and do a telldir
> again,
> we will NOT match the old one, which was changed to be 124,0
> so we will allocate a new one at 123,4096,
> which in turn will get 'fixed' to be 124,0 So it worked.  the only
> 'less
> than optimal' thing here is that we now have two entries saying
> 124,0.
> But the case is so unusual and in a totally unsupported mode of
> operation, that as far as I know no-one uses, that adding code
> to merge the two entries is just not worth it. It still returns the
> correct values, just wastes some memory. Anyone who wants to
> do a DOS wasting these 16 bytes of memory, has to do it via writing
> code to do it. so.. why would one DOS oneself?
> 
> we COULD mode the item up front but it'd never get matched unless
> there had not been a matching read following the first telldir which
> is really unlikely.
> 
> > rick
> >
> >> _______________________________________________
> >> freebsd-current at freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> >> To unsubscribe, send any mail to
> >> "freebsd-current-unsubscribe at freebsd.org"
> >>
> >
> 
> 


More information about the freebsd-current mailing list