readdir/telldir/seekdir problem (i think)

Julian Elischer julian at freebsd.org
Tue Apr 28 03:09:38 UTC 2015


On 4/28/15 4:03 AM, Rick Macklem wrote:
> Julian Elischer wrote:
>> On 4/25/15 4:28 AM, John Baldwin wrote:
>>> On Saturday, April 25, 2015 02:36:24 AM Julian Elischer wrote:
>>>> On 4/25/15 1:30 AM, Julian Elischer wrote:
>>>>> On 4/24/15 10:59 PM, John Baldwin wrote:
>>>>>> Index: head/lib/libc/gen/telldir.c
>>>>>> ===================================================================
>>>>>> --- head/lib/libc/gen/telldir.c (revision 281929)
>>>>>> +++ head/lib/libc/gen/telldir.c (working copy)
>>>>>> @@ -101,8 +101,10 @@
>>>>>>                    return;
>>>>>>            if (lp->loc_loc == dirp->dd_loc && lp->loc_seek ==
>>>>>> dirp->dd_seek)
>>>>>>                    return;
>>>>>> -       (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek,
>>>>>> SEEK_SET);
>>>>>> -       dirp->dd_seek = lp->loc_seek;
>>>>>> +       if (lp->loc_seek != dirp->dd_seek) {
>>>>>> +               (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek,
>>>>>> SEEK_SET);
>>>>>> +               dirp->dd_seek = lp->loc_seek;
>>>>>> +       }
>>>>> yes I did that yesterday but it still fails when you transition
>>>>> blocks.. (badly).
>>>>>
>>>>> I also tried bigger blocks.. also fails (eventually)
>>>>>
>>>>> I did find a way to make it work...  you had to seek back
>>>>> to the first block you deleted on each set..
>>>>> then work forward from there again..  unfortunately since
>>>>> I'm trying to make a microsoft program not fail (via samba)
>>>>> I have no control over how it does things and seekdir doesn't
>>>>> know what was deleted anyway... (so the fix is fine for  the
>>>>> test program but not for real life)
>>>>>
>>>>> I think I can make the BSD one act like the linux one by changing
>>>>> the lseek being done to use the offset (loc) plus the buffer seek
>>>>> address of the target, instead of just going for the buffer base
>>>>> and
>>>>> stepping forward through the entries..
>>>>>
>>>>> maybe tomorrow.
>>>>>
>>>> The following conditional code makes ours behave the same as the
>>>> linux
>>>> one.
>>>> it breaks several 'rules' but works where ours is clean but
>>>> fails..
>>>> as Rick said..  "maybe that's what we should do too."
>>>>
>>>>
>>>> this is at the end of seekdir()
>>>>
>>>>
>>>> The new code does what linux does.. and shouldn't work.. but does
>>>>                // at least in the limited conditions I need it to.
>>>>                // We'll probably need to do this at work...:
>>>>
>>>>
>>>> The original code is what we have now, but gets mightily confused
>>>> sometimes.
>>>>           // This is clean(er) but fails in specific
>>>>           situations(when
>>>> doing commands
>>>>           // from Microft windows, via samba).
>>>>
>>>>
>>>> root at vps1:/tmp # diff -u dir.c.orig dir.c
>>>> --- dir.c.orig    2015-04-24 11:29:36.855317000 -0700
>>>> +++ dir.c    2015-04-24 11:15:49.058500000 -0700
>>>> @@ -1105,6 +1105,13 @@
>>>>             dirp->dd_loc = lp->loc_loc;
>>>>             return;
>>>>         }
>>>> +#ifdef GLIBC_SEEK
>>>> +    (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek + lp->loc_loc,
>>>> SEEK_SET);
>>>> +    dirp->dd_seek = lp->loc_seek + lp->loc_loc;
>>>> +    dirp->dd_loc = 0;
>>>> +    lp->loc_seek = dirp->dd_seek;
>>>> +    lp->loc_loc = 0;
>>>> +#else
>>>>         (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
>>>>         dirp->dd_seek = lp->loc_seek;
>>>>         dirp->dd_loc = 0;
>>>> @@ -1114,6 +1121,7 @@
>>>>             if (dp == NULL)
>>>>                 break;
>>>>         }
>>>> +#endif
>>>>     }
>>> Yes, this isn't at all safe.  There's no guarantee whatsoever that
>>> the offset on the directory fd that isn't something returned by
>>> getdirentries has any meaning.  In particular, the size of the
>>> directory entry in a random filesystem might be a different size
>>> than the structure returned by getdirentries (since it converts
>>> things into a FS-independent format).
>>>
>>> This might work for UFS by accident, but this is probably why ZFS
>>> doesn't work.
>>>
>>> However, this might be properly fixed by the thing that ino64 is
>>> doing where each directory entry returned by getdirentries gives
>>> you a seek offset that you _can_ directly seek to (as opposed to
>>> seeking to the start of the block and then walking forward N
>>> entries until you get an inter-block entry that is the same).
>> I just made the stunning discovery that our seekdir/readdir/telldir
>> code in libc works with
>> FreeBSD 8.0.
>> so maybe the problem is that the kernel changed it's behaviour, and
>> no-one thought to fix libc..
>>
>> (at least it works on one of our 8.0 base appliances.. I'll do more
>> testing tomorrow.. it's past midnight.)

actually that was a mistake.. it fails on 8.0 as much as it fails on 
10.x and 11.
the patch above also fixes it on 8.0.  (UFS and ZFS)


> I suspect that pre-r252438 systems work better for UFS than r252438
> or later. That patch changed ufs_readdir() so that it no longer returned
> the on-disk directory structure. (Among other things, it added code that
> skipped over d_ino == 0 entries.)
yes but it was broken even before that.
basically here's the difference between what Linux (and mu patched 
code) does and what we do..

in Linux they seek directly to the base address of the block PLUS the 
offset of the entry being seeked.
The filesystem somehow correctly interprets this as teh start of the 
correct entry.
I guess it must return the correct information in the dirent structure 
in the first place.

in the unmodified BSD version, we seek to the start of the 'block' 
(actually the start of the buffer we asked for, which may or may not 
(probably not) be any special address in the directory (except at 0) 
and then we step forward records adding up the offsets (lengths) until 
we get pas thte address we are looking for.  Since we do not return 
empty records, as soon as we get to the place where there ued to be an 
empty record, we get out of sync with the kernel. It appears that even 
in 8, deleted records didn't get to userland. The answer would be to 
pad out earlier records or add pad records at the front of the buffer, 
but that doesn't seem to be happenning.  (it's what USED to be done 
but I suspect that 254438 was not the only place that was broken.)





>
> As such, r252438 and later systems have UFS where the "logical" offset
> of a directory entry returned by getdirentries() isn't the same as the
> "physical" offset for it in the on-disk directory.
>
> Having said the above, I have two somewhat inconsistent thoughts:
> 1 - As jhb has explained, the libc functions aren't safe for telldir()/seekdir()
>      when entries are added/deleted. It just happens that UFS might work
>      ok (and is more likely to work ok when "logical offset" == "physical offset").
> 2 - I'm not sure r252438 was a good idea (at least the part that skips invalid
>      d_ino == 0 entries) because I don't think making "logical offset" != "physical offset"
>      is a good idea, if there isn't a good reason to need to do so.
>
> I think it is hard to argue that r252438 broke the libc functions. It just
> happens that cases that aren't guaranteed to work happens to work without r252438.
>
> I also think that the use of d_off (or d_cookie, if you prefer that name), which
> would be the "physical offset" of the next directory entry is the best bet
> for fixing this, in general. (By in general, I mean for all file systems.)
> But this will require a new getdirentries(2) syscall and libc functions that
> know how to use it.
>
> rick
>
>>
>> _______________________________________________
>> freebsd-current at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to
>> "freebsd-current-unsubscribe at freebsd.org"
>>
>



More information about the freebsd-current mailing list