Using Subversion for binary distribution?

Fri May 25 19:58:41 UTC 2007

On 2007-05-25 12:52, Brian Candler <B.Candler at pobox.com> wrote:
>On Fri, May 25, 2007 at 01:43:42PM +0300, Giorgos Keramidas wrote:
>> Using Subversion or a more distributed system like Git and Mercurial,
>> can work in the way you are describing, but you would have to be _very_
>> careful about file ownership (so that you don't accidentally leak files
>> owned by root to other accounts, for example), and permissions (so that
>> you don't suddenly let everyone read /etc/master.passwd, or something
>> equally or more evil).
> 
> That's a very good point. There's svn:executable, but that's only a small
> part of the equation. Perhaps mtree-like settings could be put into a user
> property, and a post-checkout script could enforce them.
> 
> The problem is, I believe, analogous to what happens when you do the initial
> system build and create the tarballs: you have to ensure that all the files
> in the tarballs are owned by the correct accounts and have the correct mode
> (not root:root and 0755, or whatever they got when the compiler built them).
> 
> If you put this information in an appropriate place for the client, it can
> apply it locally after it has done an svn checkout.

Indeed. This crossed my mind too.  An mtree script which can run every
time something is "committed", "updated" or resynced in any way from a
master copy was my initial thought of working around the permissions and
ownership issues.
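
On FreeBSD, mtree(8) already does both halves of this job: 'mtree -c'
records a specification of a tree, and 'mtree -U' later forces a tree
back into line with a specification. A minimal sketch, assuming the spec
is generated on the master copy and shipped to each client somehow; the
function names are made up for illustration, and only owner, group and
mode are recorded:

```sh
#!/bin/sh
# Sketch only: record ownership/modes once on the master copy, then
# re-apply them after every checkout, update or pull.

# record_perms ROOT SPEC: write an mtree(8) spec with the type, owner
# name, group name and mode (not size or mtime) of everything under ROOT.
record_perms() {
    mtree -c -k uname,gname,mode -p "$1" > "$2"
}

# apply_perms ROOT SPEC: make ROOT match SPEC again.
#   -U  fix owner, group and mode of existing files to match the spec
#       (and create missing directories)
#   -e  do not complain about files on disk that are not in the spec
apply_perms() {
    mtree -U -e -f "$2" -p "$1"
}
```

For example, 'record_perms /repos/freebsd/releng7/base /var/db/base.mtree'
on the master, then 'apply_perms / /var/db/base.mtree' on a client after
each sync (both paths are hypothetical). This still runs after the files
have been written, so it does not close the window of time mentioned
below.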

Then again, a full 'checkout' can take a lot of time, and this may
create a "window of time" during which some files have very wrong
permissions and ownership.  I'm not sure if this is good enough for all
possible cases.

>> Subversion support for making 'local' changes to a checked out workspace
>> and keeping them local is simply unavailable.  The checked out tree
>> would be "polluted" with .svn/ subdirectories with all the metadata of
>> the Subversion workspace too (that's where permissions will be tricky to
>> get right).
>> 
>> The disk space requirements of a Subversion checkout are also very big.
>> At least twice the size of the checked out files, and then some more.
> 
> Having .svn directories all over the place is not a worry to me, and in fact
> that's what gives you most of the practical advantages: svn diff and svn
> revert depend on this, as well as merging of non-conflicting changes.
> 
>     rm /bin/ls
>     svn revert /bin/ls
>     # happy days :-)
> 
> Permissions on the .svn directories need to be gotten right, but I think
> that simply setting them to root:root 0700 would be safe.
> 
> You'd have a hard job finding a disk below 160GB these days, so the size
> utilisation doesn't bother me either.
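
The .svn directories deserve that care, because each one keeps pristine
copies of the checked-out files (the .svn/text-base files), so a
world-readable .svn directory next to /etc/master.passwd would leak its
contents. A rough sketch of locking them down, using a hypothetical
helper name and numeric IDs (group 0 is 'wheel' on FreeBSD):

```sh
#!/bin/sh
# Sketch: make every .svn directory under ROOT private to root.
# Numeric IDs avoid depending on group names, which differ between systems.

# lock_svn_dirs ROOT
lock_svn_dirs() {
    # -prune: once a .svn directory is found, don't descend into it.
    find "$1" -type d -name .svn -prune -exec chmod 0700 {} +
    # Only root can give files to root, so skip the chown otherwise.
    if [ "$(id -u)" -eq 0 ]; then
        find "$1" -type d -name .svn -prune -exec chown -R 0:0 {} +
    fi
}
```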

If disk space is not a concern, there are better tools for managing
'local' per-workspace and per-checkout changes.  Git and Mercurial are
two of them, and I like their support for "pushing" and "pulling"
changes from a master repository a lot better.

But before you read below, a word of caution...

    What follows is unverified brainstorming about managing
    blobs of files with a distributed SCM like Mercurial
    (this is the one I'm most familiar with, so I'll let other
    people speak for Git as they see fit).

    I haven't put this to production use anywhere, and I don't
    really recommend using an SCM to manage installed files, but
    since you seem to like the idea, the following are a few
    random thoughts intermixed with a mini 'tutorial' about
    Mercurial/Hg.

Their merge support is also excellent, so you can keep local changes
like the '/etc/hosts' contents and other per-system changes tracked
properly as the master copy is updated.

Mercurial's 'hooks', for example, are nice for running "update"
scripts which take care of the permission problems, similar to the one
you mentioned above.

Mercurial and Git are a very big departure from the centralized way of
working with Subversion, but it is precisely this 'distributedness' that
makes them ideal for keeping local per-system changes whenever these
local changes make sense.

For example, if you want to keep a central copy of the FreeBSD base
system in a 'master' machine called 'buildhost', you can create a
Mercurial workspace with the binary files of a FreeBSD base system at:

    buildhost:/repos/freebsd/releng7/base

To populate a new disk with the files of the releng7/base workspace,
essentially "installing" a new copy of FreeBSD on the system, you can
attach the new disk to your laptop (e.g. through a USB disk connection),
and use something like:

    laptop# fdisk -BI /dev/da0
    laptop# bsdlabel -w -B /dev/da0s1
    laptop# newfs /dev/da0s1a

and then you can 'clone' the base installation from 'buildhost', through
an SSH tunneled clone operation:

    laptop# mount /dev/da0s1a /mnt
    laptop# cd /mnt
    laptop# hg clone ssh://buildhost//repos/freebsd/releng7/base .

The next step would be to edit the /mnt files locally, while the disk is
still connected to your laptop (e.g. to fix `/etc/fstab' and other files
which do need local changes).

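You can then install hook scripts in /mnt/.hg/hooks and set them up in
/mnt/.hg/hgrc to run every time a group of changes is pulled, every time
a commit is done in the /mnt workspace, and every time 'hg update' is
used to update local files ('hg clone' does not copy hooks or their
settings from the source repository, so this has to be done for every
new clone):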

    laptop# cd /mnt
    laptop# cat .hg/hgrc
    [paths]
    default = ssh://buildhost//repos/freebsd/releng7/base

    [hooks]
    changegroup = /bin/sh .hg/hooks/fixperms.sh
    commit      = /bin/sh .hg/hooks/fixperms.sh
    update      = /bin/sh .hg/hooks/fixperms.sh

    laptop#
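
The hgrc above refers to .hg/hooks/fixperms.sh without showing it. A
minimal sketch follows; the manifest file .hg/hooks/perms.list and its
'mode owner:group path' line format are assumptions made up for this
example. Mercurial runs shell hooks from the repository root, which is
why the relative paths in the hgrc work. On FreeBSD, an mtree spec and
'mtree -U -e -f spec -p .' could do the same job.

```sh
#!/bin/sh
# .hg/hooks/fixperms.sh: re-apply modes and ownership that Mercurial
# does not track (it only records file contents and the executable bit).
#
# ASSUMPTION: perms.list uses a made-up format, one entry per line:
#     <octal mode> <owner:group, or - to leave ownership alone> <path>
# with paths relative to the repository root, e.g.
#     0600 root:wheel etc/master.passwd
# Blank lines and lines starting with '#' are ignored.

# fixperms ROOT MANIFEST
fixperms() {
    root=$1
    manifest=$2
    if [ ! -r "$manifest" ]; then
        echo "fixperms: no manifest at $manifest, nothing to do" >&2
        return 0
    fi
    rc=0
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while read -r mode owner relpath || [ -n "$mode" ]; do
        case $mode in
            ''|'#'*) continue ;;
        esac
        if [ -z "$relpath" ]; then
            echo "fixperms: malformed line: $mode $owner" >&2
            rc=1
            continue
        fi
        target=$root/$relpath
        # Skip symlinks: chmod would change the file they point to.
        if [ -L "$target" ]; then
            continue
        fi
        if [ ! -e "$target" ]; then
            echo "fixperms: $relpath: not found" >&2
            continue
        fi
        chmod "$mode" "$target" || rc=1
        if [ "$owner" != "-" ]; then
            chown "$owner" "$target" || rc=1
        fi
    done < "$manifest"
    return $rc
}

# Mercurial runs hook commands from the repository root.
fixperms . .hg/hooks/perms.list
```

The changegroup hook fires after a pull but before any working files
change, so the update hook is the one doing the real work; the other two
are harmless. Like any post-update fix-up, this still leaves a short
window between Mercurial writing a file and the hook restoring its mode.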

It is also a good idea to set up a per-host workspace on buildhost for
every managed host, e.g. using workspace paths like:

    ssh://buildhost//repos/freebsd/hosts/kobe

and setting the 'default-push' path of .hg/hgrc to point to the 'backup'
clone of the host 'kobe':

    laptop# cat .hg/hgrc
    [paths]
    default = ssh://buildhost//repos/freebsd/releng7/base
    default-push = ssh://buildhost//repos/freebsd/hosts/kobe

Then you can unmount the new disk, move it to its destination machine
and let the "distributedness" take over.

Every time something changes in the 'master' copy of the base
installation, you can "pull" the new changes:

    laptop$ ssh kobe
    kobe$ sudo -i
    Password: ******
    kobe# cd /
    kobe# hg incoming

This will show you the changes you would have 'pulled' without modifying
anything, for example:

    kobe# hg incoming
    comparing with ssh://buildhost//repos/freebsd/releng7/base
    searching for changes
    no changes found
    kobe#

You can "pull" new changes with "hg pull":

    kobe# hg pull
    comparing with ssh://buildhost//repos/freebsd/releng7/base
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 29 changesets with 47 changes to 26 files (-1 heads)
    (run 'hg update' to get a working copy)
    kobe#

If both the master copy and the local host have new commits, the pull
creates a new "head" with the incoming changes, but it does *not* touch
any of the checked-out files until you run "hg update" (or "hg merge").

Local changes which have been 'committed' in the managed host's file
system will not be lost when you pull.

You can 'merge' the remote updates using 3-way merge tools (e.g. kdiff3
or even plain good ol' vim), you can revert the merge, you can roll back
changes to a known-good state, and so on.

When managing a large set of files as a 'branch', everything that can be
done with Subversion can also be done with Mercurial, and you also get
the benefits of a fully distributed system, including (but not
expressly limited to) the following:

  + Blazingly fast local operation

  + Minimal dependency on the network (you never have to go over the
    network for looking at history and making commits/changes, unless
    you really want to)

  + Speed.  It's amazing how horrendously slow some operations can be
    when you have to go over the network for *everything* except perhaps
    one operation, like "svn diff".

  + Fully usable local history, diffs

  + Merge tracking that is *far* superior to what CVS or Subversion have
    provided so far

  + The ability to pull/push identical changes to multiple hosts

  + Tunnelling of SCM operations through SSH, HTTP, HTTPS

  + Full support for arbitrarily complex local changes, completely
    independently of remote hosts (i.e. no host will be affected by
    local changes, unless the changes are specifically "pushed" to it,
    or "pulled" while working on the host itself)

  + Most importantly, changes can be tested *locally* on the host which
    they should affect and only on that host!

    If they don't work, rollback is easy and nobody's central Subversion
    repository is bloated by the changes, as they have never hit anybody's tree.

    Combined with the extremely easy clonability of "Hg-managed" blobs
    of files, and a test host someplace out of production, you can guess
    how easy testing of changes which are highly experimental can be ;-)

IMHO, if you _have_ to use an SCM to manage the FreeBSD base system, and
you are ok with the idea of running hook scripts to fix permission and
ownership of the checked out files (as you seem to be), don't use
Subversion...  use a fully distributed tool.  It's inherently better for
almost any sort of job which requires 'merging' of local changes with a
remote master copy.

FWIW, more information about Mercurial (which I very briefly advocated
for above) and Git can be found online, at their sites:

    Mercurial:  http://www.selenic.com/mercurial/

    Git:        http://git.or.cz/