RFC: patch to make d_fileno 64bits

Sat Nov 22 15:34:34 UTC 2014

On Fri, Nov 21, 2014 at 06:45:52PM -0500, Rick Macklem wrote:
> Kostik wrote:
> > On Thu, Nov 20, 2014 at 10:19:14PM -0500, Rick Macklem wrote:
> > > The attached patch covers the basics of a way to
> > > convert the d_fileno field of "struct dirent" to
> > > 64bits. This patch is incomplete and won't even
> > > build, but I thought I'd post it in case anyone
> > > wanted to take a look and comment on the approach
> > > it uses.
> > > 
> > > - renames the old/current one "struct dirent32"
> > > - changes d_fileno to 64bits and adds a 64bit
> > >   d_off field for the offset of the underlying
> > >   file system
> > > - defines a new VOP_READDIR() that will return
> > >   the new "struct dirent" that is used as the
> > >   default one for a new getdirentries(2).
> > > - the old/current getdirentries(2) uses the old
> > >   VOP_READDIR32() by default.
> > > 
> > > For the case of a file system that supports both
> > > the new and old VOP_READDIR(), they are used by
> > > the corresponding new and old getdirentries(2)
> > > syscalls.
> > > 
> > > For a file system that only supports one of
> > > the VOP_READDIR()s, the "struct dirent32"
> > > is copied to "struct dirent" (or vice versa).
> > > 
> > > At this point, all file systems would support
> > > the old VOP_READDIR() and I think the new
> > > VOP_READDIR() can easily be added for NFS
> > > and ZFS. (OpenBSD already has UFS code for
> > > essentially a new struct dirent and hopefully
> > > that code could be ported easily, too.)
> > > 
> > > Anyhow, any comments on this approach? rick
> > 
> > I do not think we need to have in-kernel compatibility shims.
> > The work, big but relatively trivial, is to convert filesystems to
> > use the new ino_t, even if the on-disk structures still use 32bit
> > inode number.
> > 
> What about old binaries that do getdirentries(2) and expect the old
> structure with 32bit d_fileno or the linux compatibility stuff?
> I suspect that there are some old statically linked binaries out there
> that do/expect the old getdirentries.
No, let me restate my position.  There are two places for backward
compatibility: one is the in-kernel binary interface, and the other is
applications, i.e. KBI and ABI.

My opinion is that we must provide strict backward ABI compatibility
to even have the right to be called a useful OS.  In particular, the
syscalls like the current getdirentries (156 and 196), which provide
32-bit inode numbers, must be kept with their current binary contract.
The userspace issues do not end there, but this is not the currently
discussed item.

On the other hand, KBI compat for filesystems which work right now
with 32bit inode numbers should not be provided. I.e., no
VOP_READDIR_32INO(); all filesystems must be converted at once.

For syscalls 156 and 196 (and some more), a converter must be written
in vfs_syscalls.c which translates the new dirents into old dirents,
on a best-effort basis.
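
A rough sketch of what such a converter could look like, as plain
userland C rather than kernel code. Everything named sketch_* is
hypothetical; the new layout is only a guess based on the fields
mentioned in this thread (64-bit d_fileno plus a 64-bit d_off), and
returning EOVERFLOW for an inode number that does not fit in 32 bits is
just one possible "best effort" policy, not a decided one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXNAMLEN 255

/* Hypothetical new-style entry: 64-bit d_fileno and a 64-bit d_off. */
struct sketch_dirent {
	uint64_t d_fileno;	/* file number of entry */
	uint64_t d_off;		/* offset in the underlying file system */
	uint16_t d_reclen;	/* length of this record */
	uint8_t  d_type;	/* file type */
	uint8_t  d_namlen;	/* length of string in d_name */
	char     d_name[SKETCH_MAXNAMLEN + 1];
};

/* Old-style entry with a 32-bit d_fileno and no d_off. */
struct sketch_dirent32 {
	uint32_t d_fileno;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[SKETCH_MAXNAMLEN + 1];
};

/*
 * Translate one new entry into one old entry.
 * Returns 0 on success, or EOVERFLOW when d_fileno cannot be
 * represented in 32 bits (assumed policy, see above).
 */
static int
sketch_dirent_to_dirent32(const struct sketch_dirent *src,
    struct sketch_dirent32 *dst)
{
	size_t reclen;

	assert(src != NULL && dst != NULL);

	if (src->d_fileno > UINT32_MAX)
		return (EOVERFLOW);

	memset(dst, 0, sizeof(*dst));
	dst->d_fileno = (uint32_t)src->d_fileno;
	dst->d_type = src->d_type;
	dst->d_namlen = src->d_namlen;
	memcpy(dst->d_name, src->d_name, src->d_namlen);
	dst->d_name[src->d_namlen] = '\0';

	/* Fixed header plus name and NUL, rounded up to 4 bytes. */
	reclen = offsetof(struct sketch_dirent32, d_name) +
	    (((size_t)src->d_namlen + 1 + 3) & ~(size_t)3);
	dst->d_reclen = (uint16_t)reclen;

	/* d_off has no counterpart in the old layout and is dropped. */
	return (0);
}
```

A caller would walk the buffer of new entries, convert each one into
the old syscall's output buffer, and decide per entry whether an
EOVERFLOW result should fail the whole call or skip the entry.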

> 
> Having said that, most apps will use readdir(3). Do we need to somehow
> allow old binaries to work with a newer libc? (If so, that's going to be
> really nasty.) I had assumed that old libc code would do old
> getdirentries(2) and, as such, having a working old and new getdirentries(2)
> would handle old binaries?
> 
> I was trying to avoid data copying for the case of an old getdirentries(2)
> by having file systems provide VOP_READDIR() calls for both old and new
> structures.
> It is certainly possible to have all file systems only produce the new
> "struct dirent" and then just do data copying/conversion to the old one.
> 
> Btw, I think the new getdirentries(2) will need additional arguments,
> since the offset for the underlying file system needs to be provided
> along with the "logical offset", which is the byte offset within the
> directory being returned as "struct dirent"s.
> 
> > The really problematic part of this change is the usermode ABI breakage.
> > The struct dirent is only the start of the whole issue. ino_t is
> > embedded into more structures which are part of the contract, e.g.
> > struct stat.  We have to provide new syscalls which accept or return
> > the affected structures.
> > 
> > And then, there are libraries which embed ino_t into their ABI.
> > Immediate example is fts(3) in libc. Look at the FTSENT.fts_ino. Even
> > after the base system is fixed by properly providing the compat shims
> > and symbol versions for the affected libraries, we get the same
> > problem with the binaries not from base.
> > 
> > Summary of the issue with ino_t is that it is not too hard to fix the
> > kernel, compared with the ABI issues which must be solved in
> > usermode.
> > 
> > 
> Yes, I was just going to look at d_fileno as a starting point.
> (For whatever reason d_fileno isn't defined as ino_t?)
> 
> I was specifically avoiding any use of "ino_t" and saw it as something
> that needed to eventually change to 64 bits at the very end.
> I was aware of Gleb Kurtsou's work, but didn't realize it lived
> in projects/ino64. He had mentioned that he was busy, but
> would try to find time to update the patch.
> I will look at projects/ino64 and it sounds like Kirk
> would like to figure it all out in projects/ino64 and
> eventually do a "super patch" to head. This sounds fine
> to me, if we can pull it off.
> 
> rick