Some versioned storage program?

Xin LI delphij at delphij.net
Sat Mar 22 00:57:56 UTC 2008


Giorgos Keramidas wrote:
> On Fri, 21 Mar 2008 16:10:22 -0700, Xin LI <delphij at delphij.net> wrote:
>> Hi, folks,
>>
>> I'm looking for some versioned storage program that can fulfill the
>> following requirements:
>>
>>  - Open source/Free Software that can run on FreeBSD, or not far
>>    (i.e. on other POSIX OS)
>>  - Support of atomic commit/rollback.
>>  - Fast checkin time (at least when added/changed files are explicitly
>>    specified).
>>  - Fast update time (i.e. something like 'cvsup -s' that makes it
>>    possible to trust a bookkeeping file rather than stat'ing every file)
>>  - Scalable for a large number of files, directories and revisions. Say,
>>    it is not acceptable for it to store a zillion revisions as
>>    individual files within one directory.
>>  - Ideally it can support some sort of "hook" functions upon commit so
>>    that changes can be notified in some way such as e-mail.
>>  - Ideally it can support fast export of a snapshot for HEAD and
>>    "nearby" revision like HEAD - 1, etc.
>>
>> I think what I need is some SCM software like subversion or hg, but I do
>> not know if there is some superior stuff that matches these requirements
>> better.  Any other suggestions?
> 
> Before you start using Hg, Git or Subversion it may be worth
> experimenting a bit with them.  My apologies if you _have_ already and
> the previous sentence sounds patronising.  All I'm saying is that they
> all have a fair share of good, not so good, or even bad aspects.  So it
> would be nice to have tried them all a bit and pick the one that seems
> like the best fit for the job at hand :)

Ah...  OK, I think I should have mentioned the purpose of the system.  It
is meant to be used in a CMS, where the storage program will serve as an
auxiliary backend in which rendered pages are stored.

> To provide a few starting points:
> 
> - Subversion, Git and Hg, all run on FreeBSD
> - They support 'changesets' as the basic model of storing commits
> - Commit speed varies a bit.  For locally stored 'workspaces', Git and
>   Hg seem to be more or less equally fast, with Subversion being a close
>   second
> - Update times tend to vary a bit too.  Hg and Git will blow Subversion
>   away on locally stored repositories, but they might suck a bit on NFS
>   workspaces
> - Storing individual revisions as 'a zillion directory entries in a
>   single tree' seems to point at Subversion.  Have you already tried it,
>   and found that it doesn't scale for your sort of work?  It is used by
>   many large-ish projects, so it would be surprising but not unrealistic
>   to have scalability issues after a few million commits
> - Hooks _are_ supported by Subversion, Git and Hg (others too)
> - Checkout speed (and `export' speed) is pretty fast in Git and Hg.
>   Subversion is a bit slower, but still usable.  Changeset support is a
>   nice feature, because it doesn't matter if your `export' run takes 1.5
>   minutes instead of 20 seconds.  When a given changeset is exported in
>   any of svn/git/hg you _never_ get a mix of file revisions from
>   changesets ${FOO} ... ${FOO+j} for some arbitrarily random value of
>   'j', because 'j+k' commits happened in the meantime.
> 
> Before you _do_ embark on the journey of using a VCS for storing a bunch
> of files, it would be nice to stop for a moment and consider if you need
> one.  If you do, there _are_ options, and they are definitely not
> limited to the three systems mentioned so far.

Thanks for all this information.  I have tried svn and hg, but neither
is a perfect fit: both have strengths and drawbacks.  I even used
cvsup/cvs in a small system when I was at university, and it served
well for many years; however, I don't think it would work well for
larger systems.

For now I am more inclined to use hg (unless I go with a home-grown
system: this is not exactly source code version management, so a lot
of complexity could simply be skipped).  However, I still need to find
out how well 'pull+update' works on large repositories.  The largest hg
repository I am aware of right now is less than 1GB, and the repository
I would use will probably be larger than that and keep growing over
time.
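
To make the home-grown option a bit more concrete, here is a minimal
sketch in Python of a store with atomic commit/rollback and sharded
revision files, so that no single directory ends up holding a zillion
entries.  Everything in it is an assumption made for illustration: the
on-disk layout, the HEAD file, and the names rev_path, write_atomic,
commit, rollback and export are hypothetical and do not come from svn,
hg or any other existing tool.  It also assumes a single writer and
that pages are plain text.

```python
import json
import os
import tempfile


def rev_path(root, rev):
    """Shard revision files three digits per level (at most 1000 per directory)."""
    digits = f"{rev:09d}"
    return os.path.join(root, digits[0:3], digits[3:6], digits[6:9] + ".json")


def read_head(root):
    """Return the current revision number, or 0 for an empty store."""
    try:
        with open(os.path.join(root, "HEAD")) as fh:
            return int(fh.read())
    except FileNotFoundError:
        return 0


def write_atomic(path, text):
    """Write to a temporary file in the same directory, then rename over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX filesystems


def commit(root, changed):
    """Record changed pages (path -> text) as a new revision; return its number."""
    rev = read_head(root) + 1
    target = rev_path(root, rev)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    write_atomic(target, json.dumps(changed, sort_keys=True))
    # The revision only becomes visible once HEAD is moved.
    write_atomic(os.path.join(root, "HEAD"), str(rev))
    return rev


def rollback(root):
    """Move HEAD back by one revision (the old revision file is left in place)."""
    head = read_head(root)
    if head > 0:
        write_atomic(os.path.join(root, "HEAD"), str(head - 1))


def export(root, rev=None):
    """Rebuild the snapshot at rev (default HEAD) by replaying revisions."""
    rev = read_head(root) if rev is None else rev
    snapshot = {}
    for r in range(1, rev + 1):
        with open(rev_path(root, r)) as fh:
            snapshot.update(json.load(fh))
    return snapshot
```

Replaying every revision on export is obviously too slow for a large
history; a real system would keep periodic full snapshots, and would
need a way to record deleted pages as well.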

I think it's a good point that speed will vary depending on whether the
repository is stored locally (git and hg) or remotely (svn).  The speed
of synchronization between several hosts will be important as well.
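
For the synchronization part, the 'cvsup -s' style of trusting a
bookkeeping file could look roughly like the Python sketch below.  It
assumes each host keeps a manifest mapping every path to some content
identifier (a hash or a revision number); the function name plan_sync
and the manifest format are hypothetical, not taken from cvsup or any
VCS.  The point is only that the work is decided by comparing two
manifests, without stat'ing any file in the tree.

```python
def plan_sync(local, remote):
    """Compare two manifests (path -> content id) and plan the transfer.

    Returns (to_fetch, to_delete): paths whose content id differs or is
    missing locally, and paths that no longer exist on the remote side.
    """
    to_fetch = sorted(p for p, cid in remote.items() if local.get(p) != cid)
    to_delete = sorted(p for p in local if p not in remote)
    return to_fetch, to_delete


if __name__ == "__main__":
    local = {"index.html": "r10", "about.html": "r3", "old.html": "r1"}
    remote = {"index.html": "r12", "about.html": "r3", "news.html": "r11"}
    fetch, delete = plan_sync(local, remote)
    print("fetch:", fetch)
    print("delete:", delete)
```

The obvious drawback is that the result is only as good as the
manifest: a file changed behind the tool's back goes unnoticed.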

Cheers,
-- 
Xin LI <delphij at delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!


More information about the freebsd-hackers mailing list