Deterministic builds?

Kostik Belousov kostikbel at gmail.com
Mon Oct 11 09:03:20 UTC 2010


On Sun, Oct 10, 2010 at 10:51:20PM +0200, Erik Cederstrand wrote:
> Hi hackers
> 
> As a followup to the "Timestamps in static libraries" thread which resulted in a '-D' option to ar(1), I'd like to discuss if it is a worthy goal of the Project to create deterministic builds. By that I mean for two make build+install world+kernel+distribution runs, every contained file is bitwise identical between the two runs.
> 
> Deterministic builds would be useful for me, since I'm creating binary diffs against lots of FreeBSD builds, and smaller diffs are good. Also, I'd like to detect which files have changed between two commits. I imagine it would also be useful for things like IDS and freebsd-update.
> 
> Currently, this does not hold for static libraries (*.a), kernel modules (*.ko / *.ko.symbols) and the following:
> 
> bthidd
> cc1
> cc1obj
> cc1plus
> clang
> clang++
> ctfconvert
> freebsd.cf
> freebsd.submit.cf
> kernel
> kernel.symbols
> libcrypto.so.6
> libufs.so.5
> loader
> pxeboot
> sendmail.cf
> submit.cf
> tblgen
> zfsloader
> 
> Most of the libraries can be brought to be identical by using ar -D. Some record the absolute OBJDIR path to header files, though (libc.a for example).
> 
> I tried adding 'D' to ARFLAGS in share/mk/sys.mk, but that's only part of the solution. ARFLAGS are overridden in hundreds of places in the source code, and in some places ARFLAGS isn't even used (or AR for that matter). Is it worthwhile to go through the whole tree, fixing up these calls to ar? A lot of this is in contrib/ code.
> 
> Another option is to add a WITH_DETERMINISTIC_AR knob to the build to compile ar with D as default behaviour. This would make the above changes unnecessary, but is more intrusive.
> 
> A third option is that this is not a priority for the community, or directly unwanted, and that I just post-process my builds myself.
> 
> I don't know what causes the checksum difference in .ko files - there is no size difference, and no difference according to strings(1). A bsdiff on the two is typically around 160B.
> 
> .ko.symbols have some unique identifiers or addresses internally.
> 
> kernel, loader, zfsloader and pxeboot have a build date recorded, kernel also has absolute path to GENERIC. OK for the kernel, I think, although it would be easier for me if this was just stored in a separate file since binary diffs on large files are expensive.
> 
> clang, clang++ and tblgen store some absolute paths to .cpp files in the src repo internally, plus unique identifiers.
> 
> freebsd.cf, freebsd.submit.cf, sendmail.cf and submit.cf record the absolute OBJDIR path to sendmail.
> 
> What do you think?
My personal opinion is that the feature is nice to have. Unless the changes
needed to get this working are too large and, more importantly, unless the
maintenance cost of keeping it in good shape is too high, we would certainly
be better off with deterministic build results.

Also, deterministic builds require somebody to monitor the feature,
either manually or by setting up a bot that checks it automatically.
Otherwise, I suspect, it will degrade.
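
Such a bot could be little more than a script that compares two installed
trees file by file. Below is a minimal sketch in Python, assuming the two
builds were installed into separate directories (for example, two DESTDIR
trees). The function names are made up for illustration, and the choice of
SHA-256 is arbitrary; any strong checksum would do.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents matter here.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(root_a, root_b):
    """Return three sorted lists: (differing, only_in_a, only_in_b)."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    differing = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    return differing, only_in_a, only_in_b
```

A periodic job could build and install the same revision twice, call
compare_trees() on the two trees, and send mail whenever any of the three
lists is non-empty.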