[rfc] 64-bit inode numbers

Garance A Drosehn gad at FreeBSD.org
Fri Jun 24 21:07:20 UTC 2011


On 6/23/11 6:26 PM, Kostik Belousov wrote:
> On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote:
>    
>> Consider the thread "Increasing the size of dev_t and ino_t" from
>> freebsd-arch in 2002:
>>
>> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html
>>
>> In particular, this message by Robert Watson:
>>
>> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch
>>
>> I just participated in an online conference for OpenAFS, and while it
>> isn't exactly taking the world by storm, I keep thinking it would be
>> useful if FreeBSD could map individual AFS volumes to unique dev_t
>> identifiers.  And given the way AFS is implemented (as a global FS
>> with many cells all reachable at the same time), and given the way most
>> sites deploy AFS (with thousands or tens-of-thousands of individual
>> AFS volumes *per site*), that adds up to a lot of values for dev_t.
>>
>> The upcoming release of OpenAFS should include a working and pretty
>> stable AFS client for FreeBSD, so having a larger dev_t would have
>> a more immediate application than it did back in 2002.
>>      
> Am I right that the issue is the uniqueness of the dev_t for each
> AFS volume, as reported by stat(2) ?
>
> Shouldn't the AFS client synthesize the dev_t for each new volume
> mounted ? It seems that the current 32bit dev_t would be enough,
> since I do not expect to see hundreds of thousands of mounts
> on a single system.
>
> Please note that we do not guarantee dev_t stability across reboots
> even for real devices.
>    
The AFS cell at RPI has approximately 40,000 AFS volumes, and each
volume should have its own dev_t (IMO).  That's just counting the
collection of AFS volumes which are on RPI file servers, and any
user sitting on one computer could access AFS volumes which are
made available by other sites (aka "AFS cells").  Most RPI users
would only have access to maybe 1/4 of those volumes which exist
at RPI, but we do know that individual users have run 'find' over
the entire RPI cell looking for whatever they're looking for.  I
once did a run of 'md5deep' on the entire RPI cell, thanks to a
symlink which I didn't realize was in my home directory!

So one person can easily trigger the access of 10,000 AFS volumes
on one computer using one command.  That might sound terrifying if
you imagine it as being 10,000 NFS mounts, but accessing AFS volumes
isn't the same amount of work as auto-mounting NFS filesystems.
So ignore whatever problems you might expect to see with 10,000
filesystems mounted on one computer.  Just realize that it is very
easy for a single user to access tens of thousands of AFS volumes
from one computer, and it would be "most correct" (programming-wise)
if all of those AFS volumes were to get a unique value for dev_t.
And of course it's even easier for a remote-access system to access
tens-of-thousands of AFS volumes, since it would have a few dozen
users logged in at the same time.
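
To illustrate why uniqueness matters: programs generally treat the pair
(st_dev, st_ino) returned by stat(2) as the identity of a file.  If two AFS
volumes reported the same dev_t, files in different volumes that happened
to share an inode number would look identical to such a check.  The helper
below is only a sketch of that common check; same_file() is a name made up
for this example and is not taken from any particular utility.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report whether two paths refer to the same file.
 * Returns 1 if (st_dev, st_ino) match, 0 if they differ, and -1 if
 * either stat(2) call fails.
 */
static int
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return (-1);
	/* Equal inode numbers only mean "same file" on the same device. */
	return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
}
```

A caller would use it as same_file(path1, path2); the result is only as
trustworthy as the st_dev values the filesystem reports.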

Obviously most computers never access even 30,000 AFS volumes before
they (as the AFS client) reboot, but I'm wondering how much overhead
there is in trying to make sure that many different volumes are
mapped to unique dev_t numbers.
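
As a rough way to think about that overhead, here is a minimal sketch of
a table that hands out a distinct 32-bit value the first time a volume is
seen and returns the same value afterwards.  Everything in it is an
assumption made for illustration: the volmap names, the fixed table size,
the hash constants, and the numeric cell identifier (real AFS cells are
identified by name).  It is not how the OpenAFS client does it, and it
ignores locking and eviction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VOLMAP_SLOTS 65536u	/* hypothetical capacity; must be a power of two */

struct volmap_entry {
	uint32_t cell;		/* hypothetical numeric cell identifier */
	uint32_t volume;	/* AFS volume ID */
	uint32_t dev;		/* synthesized device value; 0 marks an empty slot */
};

struct volmap {
	struct volmap_entry slot[VOLMAP_SLOTS];
	uint32_t next_dev;	/* next unused synthesized value */
};

static void
volmap_init(struct volmap *m)
{
	memset(m, 0, sizeof(*m));
	m->next_dev = 1;	/* 0 is reserved for "empty" */
}

static uint32_t
volmap_hash(uint32_t cell, uint32_t volume)
{
	/* Simple multiplicative mix; unsigned arithmetic wraps by definition. */
	return ((cell * 2654435761u) ^ (volume * 40503u));
}

/*
 * Return the synthesized value for (cell, volume), assigning a new one on
 * first use.  Returns 0 if the table is full.
 */
static uint32_t
volmap_get(struct volmap *m, uint32_t cell, uint32_t volume)
{
	uint32_t i, probes;

	assert(m != NULL);
	i = volmap_hash(cell, volume) & (VOLMAP_SLOTS - 1);
	for (probes = 0; probes < VOLMAP_SLOTS; probes++) {
		struct volmap_entry *e = &m->slot[i];

		if (e->dev == 0) {
			/* First access to this volume: claim the slot. */
			e->cell = cell;
			e->volume = volume;
			e->dev = m->next_dev++;
			return (e->dev);
		}
		if (e->cell == cell && e->volume == volume)
			return (e->dev);
		i = (i + 1) & (VOLMAP_SLOTS - 1);	/* linear probing */
	}
	return (0);
}
```

With 12-byte entries the table above occupies 768 KiB and has room for the
roughly 40,000 volumes in a cell the size of RPI's; the per-access cost is
one hash and, usually, a short probe.  Values assigned this way are not
stable across reboots.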

Please realize that I would not mind if people feel that there is no
need to increase the size of dev_t at this time, and that we should
wait until we see more of a demand for increasing it.  But given the
project to increase the size of inode numbers, I thought this was a
good time to also ask about dev_t.  I ask about it every few years :-)

-- 
Garance Alistair Drosehn            =   gad at gilead.netel.rpi.edu
Senior Systems Programmer           or  gad at freebsd.org
Rensselaer Polytechnic Institute    or  drosih at rpi.edu



More information about the freebsd-fs mailing list