Deterministic builds?

Erik Cederstrand erik at cederstrand.dk
Sun Oct 10 20:51:23 UTC 2010


Hi hackers

As a follow-up to the "Timestamps in static libraries" thread, which resulted in a '-D' option to ar(1), I'd like to discuss whether deterministic builds are a worthwhile goal for the Project. By that I mean that for two runs of make buildworld, buildkernel, installworld, installkernel and distribution, every resulting file is bitwise identical between the two runs.
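A minimal sketch of how that property could be checked, assuming each run was installed into its own DESTDIR tree: hash every regular file in both trees with SHA-256 and report the relative paths whose digests differ or that exist in only one tree. The function names are hypothetical, and symlinks and special files are skipped for simplicity.

```python
import hashlib
import os


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files like the kernel are not loaded at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare_trees(root_a, root_b):
    """Return (differing, only_in_a, only_in_b) as sorted lists of relative paths."""
    a, b = tree_digests(root_a), tree_digests(root_b)
    differing = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return differing, only_a, only_b
```

Calling compare_trees() on two DESTDIRs built from the same checkout should return three empty lists once builds are deterministic; on builds from two different commits, the first list is the set of files that actually changed.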

Deterministic builds would be useful to me, since I'm creating binary diffs between many FreeBSD builds, and smaller diffs are better. I'd also like to detect which files have changed between two commits. I imagine it would be useful for things like IDSs and freebsd-update as well.

Currently, this does not hold for static libraries (*.a), kernel modules (*.ko / *.ko.symbols) and the following:

bthidd
cc1
cc1obj
cc1plus
clang
clang++
ctfconvert
freebsd.cf
freebsd.submit.cf
kernel
kernel.symbols
libcrypto.so.6
libufs.so.5
loader
pxeboot
sendmail.cf
submit.cf
tblgen
zfsloader

Most of the static libraries can be made identical by using ar -D. Some, though, record the absolute OBJDIR path to header files (libc.a, for example).
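The reason ar -D matters is that every member of an ar archive has its own 60-byte header storing, as ASCII text, the member's name, modification time, owner uid and gid, octal mode and size; as I understand it, -D writes 0 in place of the real time, uid and gid. A minimal sketch (the function name is hypothetical) that lists those header fields, assuming the common "!<arch>" layout; it does not decode GNU or BSD long-name entries, so those names are shown raw.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed size of each member header


def ar_members(data):
    """Yield (name, mtime, uid, gid, mode, size) for each member header of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + HEADER_LEN <= len(data):
        hdr = data[offset:offset + HEADER_LEN]
        if hdr[58:60] != b"`\n":  # ar_fmag terminator
            raise ValueError("bad member header at offset %d" % offset)
        name = hdr[0:16].decode("ascii", errors="replace").rstrip()
        mtime = int(hdr[16:28].strip() or b"0")     # ar_date, decimal seconds
        uid = int(hdr[28:34].strip() or b"0")       # ar_uid, decimal
        gid = int(hdr[34:40].strip() or b"0")       # ar_gid, decimal
        mode = int(hdr[40:48].strip() or b"0", 8)   # ar_mode, octal
        size = int(hdr[48:58].strip())              # ar_size, decimal bytes
        yield name, mtime, uid, gid, mode, size
        # Member data is padded with one byte when its size is odd.
        offset += HEADER_LEN + size + (size % 2)
```

Dumping two builds of the same .a this way should show identical names and sizes but different mtime (and possibly uid/gid) values when -D was not used.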

I tried adding 'D' to ARFLAGS in share/mk/sys.mk, but that's only part of the solution. ARFLAGS is overridden in hundreds of places in the source tree, and some places don't use ARFLAGS at all (or even AR, for that matter). Is it worthwhile to go through the whole tree and fix up these calls to ar? A lot of this is in contrib/ code.

Another option is to add a WITH_DETERMINISTIC_AR knob to the build that compiles ar with D as the default behaviour. This would make the above changes unnecessary, but is more intrusive.

A third possibility is that this is not a priority for the community, or is outright unwanted, in which case I'll just post-process my builds myself.
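For the static libraries, such post-processing could look like this minimal sketch (function names are hypothetical): it rewrites each member header's date, uid and gid fields to 0 in place, which is roughly what ar -D is meant to produce. Field widths are unchanged, so member offsets and any archive symbol table stay valid. It does nothing about absolute OBJDIR paths embedded inside the object files themselves, as in libc.a.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed size of each member header


def _field(value, width):
    """Format an integer as a left-justified, space-padded ar header field."""
    return str(value).encode("ascii").ljust(width, b" ")


def normalize_ar(data):
    """Return a copy of an ar archive with every member's mtime, uid and gid set to 0."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    out = bytearray(data)
    offset = len(AR_MAGIC)
    while offset + HEADER_LEN <= len(out):
        if out[offset + 58:offset + 60] != b"`\n":  # ar_fmag terminator
            raise ValueError("bad member header at offset %d" % offset)
        out[offset + 16:offset + 28] = _field(0, 12)  # ar_date
        out[offset + 28:offset + 34] = _field(0, 6)   # ar_uid
        out[offset + 34:offset + 40] = _field(0, 6)   # ar_gid
        size = int(bytes(out[offset + 48:offset + 58]).strip())
        # Skip the member data, including the pad byte after odd-sized members.
        offset += HEADER_LEN + size + (size % 2)
    return bytes(out)


def normalize_ar_file(path):
    """Rewrite the archive at path in place with normalized member headers."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(normalize_ar(data))
```

Running normalize_ar_file() over every installed *.a after installworld would make archives that differ only in header metadata compare equal.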

I don't know what causes the checksum difference in .ko files: the sizes are the same, and strings(1) shows no difference. A bsdiff between the two is typically around 160 bytes.
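Since the .ko files are the same size, one way to narrow this down is to list the exact byte ranges that differ and then match those offsets against the file's section table (for instance the output of readelf -S). A minimal sketch; the function name is hypothetical.

```python
def differing_ranges(a, b):
    """Return half-open (start, end) byte ranges where two equal-length buffers differ."""
    if len(a) != len(b):
        raise ValueError("buffers differ in length")
    ranges = []
    start = None  # start offset of the run of differing bytes currently open, if any
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:  # a run that extends to the end of the file
        ranges.append((start, len(a)))
    return ranges
```

Reading both copies of a module with open(path, "rb").read() and printing each range in hex gives offsets that can be looked up in the section headers to see which section carries the difference.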

.ko.symbols files contain some unique identifiers or addresses internally.

kernel, loader, zfsloader and pxeboot record the build date, and kernel also records the absolute path to GENERIC. That's OK for the kernel, I think, although it would be easier for me if this information were stored in a separate file, since binary diffs on large files are expensive.

clang, clang++ and tblgen internally store some absolute paths to .cpp files in the src tree, plus unique identifiers.

freebsd.cf, freebsd.submit.cf, sendmail.cf and submit.cf record the absolute OBJDIR path to sendmail.
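To find every installed file that embeds such an absolute path, one could search the tree for the OBJDIR or src prefix as raw bytes. A minimal sketch; the function name is hypothetical, and the prefix to search for (for example /usr/obj, the usual default OBJDIR prefix) is an assumption that depends on how the build was configured.

```python
import os


def find_embedded_path(root, needle):
    """Yield (relative_path, offsets) for files under root whose bytes contain needle."""
    needle = needle.encode() if isinstance(needle, str) else needle
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                data = f.read()
            # Collect every occurrence, not just the first one.
            offsets = []
            pos = data.find(needle)
            while pos != -1:
                offsets.append(pos)
                pos = data.find(needle, pos + 1)
            if offsets:
                yield os.path.relpath(path, root), offsets
```

Unlike strings(1) piped through grep, this reports byte offsets, which makes it easier to tell whether a path sits in a data section, debug information or a config file.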

What do you think?


Thanks,
Erik

