Millions of small files: best filesystem / best options

Bakul Shah bakul at bitblocks.com
Tue May 29 07:59:52 UTC 2012


On Tue, 29 May 2012 17:35:18 +1000 Bruce Evans <brde at optusnet.com.au>  wrote:
> 
> But I expect using a file system would be so slow for lots of really
> small files that I wouldn't try it.  Caching is already poor for
> 4K-files, and a factor of 20 loss won't improve it.  If you don't want
> to use a database, maybe you can use tar.[gz] files.  These at least
> reduce the wastage (but still waste about twice as much as msdosfs with
> 512 byte blocks), unless they are compressed.  I think there are ways
> to treat tar files as file systems and to avoid reading the whole file
> to find files in it (zip format is better for this).
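
To illustrate the zip point: the central directory lets a
single member be pulled out without scanning the rest of the
archive. A minimal Python sketch (archive and member names
are made up):

    import zipfile

    # zipfile reads only the central directory up front, so
    # fetching one member does not touch the rest of the
    # archive.
    with zipfile.ZipFile("smallfiles.zip") as zf:
        data = zf.read("records/000123.dat")
    print(len(data))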

As someone else pointed out, the right thing for Alessio may
be to just use fusefs-sqlfs, or maybe even roll his own!
Metadata can be generated on the fly. If performance is an
issue he can slurp the whole file into memory and use
write-through for any updates. A million 200-byte files
would take less than 512MB.
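
Roughly what rolling his own could look like: one row per
file in sqlite3 (table, column and path names below are made
up), slurp everything into memory once, and write through to
the database on updates:

    import sqlite3

    conn = sqlite3.connect("files.db")
    conn.execute("CREATE TABLE IF NOT EXISTS files "
                 "(path TEXT PRIMARY KEY, data BLOB)")

    # Slurp everything in once; ~1M x 200-byte entries fits
    # comfortably under 512MB of RAM.
    cache = {path: bytes(data) for path, data in
             conn.execute("SELECT path, data FROM files")}

    def put(path, data):
        # Write-through: update the in-memory copy and the
        # database together.
        cache[path] = data
        with conn:
            conn.execute("INSERT OR REPLACE INTO files "
                         "(path, data) VALUES (?, ?)",
                         (path, data))

    def get(path):
        return cache[path]

    put("records/000123.dat", b"x" * 200)
    print(len(get("records/000123.dat")))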

Another alternative: 9pfuse (from plan9ports). There is even
an sqfs written in 339 lines of Python on github that would
bolt right onto 9pfuse! He can use it as a template to build
exactly what he wants. There is also tarfs etc. in
plan9ports, but it is read-only.

