Parallel Builds

Thu Oct 19 09:24:57 UTC 2006

Hello,

Since multi-core processors are becoming popular (or, more
egocentrically, since I've acquired one), I've become interested in
parallel compilation. Unfortunately, it seems that parallel builds of
any kind are completely unsupported by the ports framework at the
moment. My experimentation with parallel builds has led to a lot of
build failures: many ports fail when compiled with gmake -j2 instead of
plain gmake, and two portupgrade instances running in parallel often
step on each other's toes.

As using n processor cores instead of just one can give close to an
n-fold speed increase for builds that parallelize well, particularly
when compiling C++ code, I'm interested in investigating what it would
take to add some degree of parallelism to the ports.

I'm sure I'm not the only person who has thought about this. Maybe
there is already an effort to allow for parallelism in port builds. I
would therefore like anyone working on this to speak up, or, more
generally, to start a discussion on how this could be implemented. I'll
start by giving my own thoughts. I currently see three ways to add
parallelism to the ports:

 o Mark the ports that allow parallel building by adding a new flag
   that can be used in port Makefiles, e.g. PARALLEL_BUILDING=yes.
   With such a port, the build target would call, say
   "gmake -j${PARALLEL_NUM}" instead of just "gmake". PARALLEL_NUM
   would be set in /etc/make.conf. If it is undefined, the build
   target would fall back to the old behavior. I'll call this
   "micro-parallelism" for now.

   Advantages: 
   + The modification to the ports framework would be relatively small,
     since the build tool (make/gmake/whatever) takes care of the
     difficult bits like locking.
   + Real build speed increase, particularly with large ports where it
     matters most (these usually have non-linear compilation
     dependencies, so they can be parallelized).

   Disadvantages:
   - Each port would have to be marked with PARALLEL_BUILDING=yes
     individually. This means more work for the maintainers, and
     introducing this feature will take time. (On the other hand, there
     are only a few ports that are both large and popular, e.g. KDE;
     adding this feature just for those would already be a big win.)
   - For a lot of software it is not obvious whether it supports
     parallel building, and it may have a low (but non-zero) probability
     of compilation failure when built in parallel. This could lead to
     ports being marked with PARALLEL_BUILDING=yes in error, and thus
     to users encountering build failures. (Or maybe that's not much of
     a problem: after reports of build failures come in, the
     PARALLEL_BUILDING=yes flag could be removed again, and users who
     depend on the build always succeeding could simply not use
     parallel builds. Another idea would be to use
     PARALLEL_BUILDING=maybe if the port maintainer is unsure, which
     would allow conservative users to use parallel building only for
     ports that are known to compile in parallel.)
   - The build speed advantage for ports whose build can't be
     parallelized well is small (I believe that stage 1 of the gcc build
     would be an example of this). Also, small ports, which spend a
     large proportion of their time in the configure script, would not
     see much of a speed-up.

 o Have the ports framework support building several ports in
   parallel. This could mean either that "make -j2 install" works in
   a port directory (so the builds of a port's dependencies would happen
   in parallel), or that it's possible to run more than one port build
   at the same time. As above, the amount of parallelism would be
   configurable with a variable in /etc/make.conf, and there would be a
   fallback to the old behavior. I'll call this "macro-parallelism".

   Advantages:
   + No change needed to the individual ports (probably).
   + Assuming a correct implementation, no increased probability for
     build failures.
   + Build speed-up for software consisting of several packages, e.g.
     KDE, or when installing a new system.

   Disadvantages:
   - Probably difficult to implement. Locking, build failures and
     interruptions would have to be taken care of. Maybe it's not
     actually possible to do this with our make(1) (I haven't
     properly investigated this yet).
   - No speed gain when updating single large ports, e.g. gcc. (To be
     fair, it must be said that some of the large ports, e.g.
     OpenOffice.org, don't support micro-parallelism either. Macro-
     parallelism would at least allow the otherwise unused CPUs to
     do something sometimes.)

 o Leave the ports framework as it is, and implement support for
   parallel building in an add-on tool such as portupgrade. The tool
   would support automatic parallelism ("portupgrade -a" would
   automatically build ports in parallel where possible), or allow
   several user-started instances to run at the same time. I'll call
   this "tool-based macro-parallelism".

   Advantages:
   + No change needed to the ports at all (at least theoretically, in
     practice minor changes might make the development of the build
     tool much easier).
   + Assuming a correct implementation, no increased probability for
     build failures.
   + Build speed-up for software consisting of several packages, e.g.
     KDE, or when installing a new system.

   Disadvantages:
   - Moderately difficult to implement. Locking, build failures and
     interruptions would have to be taken care of. However, I don't see
     any problems that can't be solved.
   - No speed gain when updating single large ports, e.g. gcc. (To be
     fair, it must be said that some of the large ports, e.g.
     OpenOffice.org, don't support micro-parallelism either. Macro-
     parallelism would at least allow the otherwise unused CPUs to
     do something sometimes.)
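
To make the tool-based variant more concrete, here is a minimal Python
sketch of the scheduling core such a tool might have. It starts a port
build as soon as all of the port's dependencies have been built, runs
at most max_jobs builds at once, and skips ports whose dependencies
failed. The function name, the dependency map and the build callback
are all hypothetical: a real tool would read the dependencies from the
ports tree, run make(1) in each port directory instead of calling a
Python function, and would still need locking for builds that touch
shared files.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Set


def parallel_build(deps: Dict[str, Set[str]],
                   build: Callable[[str], bool],
                   max_jobs: int) -> Dict[str, str]:
    """Build every port in `deps`, running at most `max_jobs` builds at once.

    `deps` maps each port to the set of ports it depends on (every
    dependency must itself be a key). `build` returns True on success.
    Returns port -> "ok", "failed" or "skipped" (a dependency did not build).
    """
    status: Dict[str, str] = {}
    remaining = {port: set(d) for port, d in deps.items()}
    running: Dict[cf.Future, str] = {}
    with cf.ThreadPoolExecutor(max_workers=max_jobs) as pool:
        while remaining or running:
            # Skip ports that depend on a failed or skipped port; repeat
            # until nothing changes so that skips propagate transitively.
            changed = True
            while changed:
                changed = False
                for port in list(remaining):
                    if any(status.get(d) in ("failed", "skipped")
                           for d in remaining[port]):
                        status[port] = "skipped"
                        del remaining[port]
                        changed = True
            # Start ports whose dependencies are all built, up to max_jobs.
            ready = sorted(p for p, d in remaining.items()
                           if all(status.get(x) == "ok" for x in d))
            for port in ready[:max_jobs - len(running)]:
                running[pool.submit(build, port)] = port
                del remaining[port]
            if not running:
                break  # nothing can start: leftover ports form a cycle
            done, _ = cf.wait(running, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                port = running.pop(fut)
                try:
                    ok = fut.result()
                except Exception:  # a crashing build counts as a failure
                    ok = False
                status[port] = "ok" if ok else "failed"
    for port in remaining:  # only non-empty after a dependency cycle
        status[port] = "skipped"
    return status


if __name__ == "__main__":
    import time

    def fake_build(port: str) -> bool:
        # Stand-in for running "make install" in the port's directory.
        time.sleep(0.1)
        return port != "broken"

    graph = {"qt": set(), "kdelibs": {"qt"}, "kdebase": {"kdelibs"},
             "koffice": {"kdelibs"}, "broken": set(),
             "needs-broken": {"broken"}}
    print(parallel_build(graph, fake_build, max_jobs=2))
```

With max_jobs=2 the example builds qt first, then kdelibs, then kdebase
and koffice side by side; needs-broken is reported as skipped because
its dependency fails. Whole ports are the unit of parallelism here, so
a single large port still occupies only one job slot.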

A combination of micro- and macro-parallelism seems attractive, since
there are situations where only one of them is supported. However, I
don't see how it could work properly (barring a naive approach where n
ports are built at once, each with -jn, so you end up running n^2
processes): it would require cooperation between make(1) or the add-on
tool on one side and the build tool used by the individual port on the
other, and the latter is more or less an unknown.
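
The process-count arithmetic behind the naive combination can be
modelled in a few lines. This Python sketch assumes the proposed
PARALLEL_BUILDING and PARALLEL_NUM knobs (represented here as plain
function arguments; the real logic would live in the ports Makefiles)
and a hypothetical jobs_per_port() helper that statically splits a
fixed budget of jobs between concurrently building ports. A static
split keeps the total near the number of cores instead of n^2, but it
is not the dynamic cooperation with each port's build tool described
above, so cores can still sit idle when one port's build is mostly
serial.

```python
from typing import List, Optional


def make_command(parallel_building: bool,
                 parallel_num: Optional[int]) -> List[str]:
    """Proposed micro-parallelism rule: run "gmake -jN" only if the port
    sets PARALLEL_BUILDING=yes and PARALLEL_NUM is defined (and above 1);
    otherwise fall back to the old behavior, plain "gmake"."""
    if parallel_building and parallel_num is not None and parallel_num > 1:
        return ["gmake", f"-j{parallel_num}"]
    return ["gmake"]


def jobs_per_port(cores: int, concurrent_ports: int) -> int:
    """Hypothetical helper: share `cores` job slots evenly between
    `concurrent_ports` simultaneous port builds, at least one each."""
    return max(1, cores // max(1, concurrent_ports))


if __name__ == "__main__":
    cores = 4
    # Naive combination: 4 ports at once, each run with -j4, i.e. up to
    # 4 * 4 = 16 compile jobs competing for 4 cores.
    print(make_command(True, cores))                     # ['gmake', '-j4']
    # Static split: 2 concurrent ports share the 4 cores, -j2 each.
    print(make_command(True, jobs_per_port(cores, 2)))   # ['gmake', '-j2']
    # Unmarked ports keep the old behavior whatever PARALLEL_NUM says.
    print(make_command(False, cores))                    # ['gmake']
```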

Phew. That turned into a long email. If you're still reading, thanks!

Cheers
Benjamin