SHA256 checksums

Vasil Dimov vd at datamax.bg
Thu Nov 10 22:29:04 PST 2005


On Thu, Nov 10, 2005 at 07:06:29PM +0100, Simon Barner wrote:
> 
> Maybe directly on ftp-master in order to save bandwidth. Would that be
> possible?

Yes, one could run
./portsaddsha256.sh /tmp/ports /ftp/pub/FreeBSD/ports/distfiles
for example.

> Three comments on the script:
> 
> - Will the backticks work on a full ports tree, or will it run into
>   memory problems? Perhaps xargs is a better alternative, but I am not a
>   shell guru...
No problems with the backticks: we have only 12707 distinfo files. I
used them instead of xargs, find's -exec, or something else because
the code is more readable that way.
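
For comparison, here is roughly what the pipe-based alternative would
look like. This is only a sketch: the tree is a made-up stand-in for
/usr/ports built in a temporary directory, and the port names are
invented. The loop reads one path per line, so the list is never
expanded into a single command line.

```shell
#!/bin/sh
# Sketch only: the tree below is a made-up stand-in for /usr/ports.
tree=$(mktemp -d)
mkdir -p "$tree/astro/foo" "$tree/www/bar"
: > "$tree/astro/foo/distinfo"
: > "$tree/www/bar/distinfo"
: > "$tree/www/bar/distinfo.i386"

# Same find as get_distinfos, but consumed line by line instead of
# being expanded into one long word list by the backticks.
find "$tree" -mindepth 3 -maxdepth 3 -name "distinfo*" |
while read -r distinfo ; do
	echo "would process: ${distinfo#"$tree"/}"
done > "$tree/seen.txt"

# Count the lines the loop produced (tr strips padding some wc's add).
count=$(wc -l < "$tree/seen.txt" | tr -d ' ')
echo "$count distinfo files found"
rm -rf "$tree"
```

Note that the loop body runs in a subshell in most shells, so
variables set inside it are not visible afterwards; that is one more
reason the backtick version reads more simply.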

> 
> - The SHA256 should be added if and only if the file's MD5 sum matches
>   the one recorded in the ports tree (otherwise, the copy there is
>   stale, and the port's name should be recorded for later manual
>   investigation).
Right, I missed the case where files do not match their MD5 sums;
those files do not deserve a SHA256 sum. Fixed in the script.
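
The idea of the check can be sketched in isolation like this. The
file names and contents are invented, and the md5_hex helper with its
md5sum fallback is my assumption for systems that lack FreeBSD's
md5(1); it is not part of the script.

```shell
#!/bin/sh
# Sketch only: file names and contents are invented for illustration.

# Print the bare MD5 hex digest of a file. FreeBSD has md5(1); the
# fallback to md5sum(1) is an assumption for other systems.
md5_hex()
{
	if command -v md5 >/dev/null 2>&1 ; then
		md5 -q "$1"
	else
		md5sum "$1" | cut -d ' ' -f 1
	fi
}

work=$(mktemp -d)
printf 'hello\n' > "$work/good.tar.gz"
printf 'tampered\n' > "$work/stale.tar.gz"

# Record a correct digest for good.tar.gz and a wrong one for
# stale.tar.gz, in the same line format a distinfo file uses.
good_digest=$(md5_hex "$work/good.tar.gz")
{
	echo "MD5 (good.tar.gz) = $good_digest"
	echo "MD5 (stale.tar.gz) = 00000000000000000000000000000000"
} > "$work/distinfo"

# An archive qualifies only if its freshly computed MD5 line is
# present in distinfo exactly (-x whole line, -F literal match).
verdicts=""
for archive in good.tar.gz stale.tar.gz ; do
	digest=$(md5_hex "$work/$archive")
	if grep -qxF "MD5 ($archive) = $digest" "$work/distinfo" ; then
		verdicts="$verdicts $archive:ok"
	else
		verdicts="$verdicts $archive:mismatch"
		echo "$archive: MD5 sum mismatch" >&2
	fi
done
echo "$verdicts"
rm -rf "$work"
```

Only the archive reported as ok would get a SHA256 line; the other
one is reported on stderr for manual investigation.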

> 
> - There are ports that set the MD5_FILE variable, e.g. astro/setatihome,
>   so we cannot assume it's always ${PORTSDIR}/category/port/distinfo, but
>   ${PORTSDIR}/category/port/distinfo* might be a good approximation.
>   After all, it does not harm if we miss some ports.
Yes, distinfo* is really a good approximation. Even more:
$ find /usr/ports -maxdepth 3 -mindepth 3 -name Makefile |xargs grep MD5_FILE= |grep -v /distinfo
outputs nothing, i.e. every MD5_FILE= line found contains /distinfo.

> -- 
> Best regards / Viele Grüße,                             barner at FreeBSD.org
>  Simon Barner                                                barner at gmx.de

Hmm, some optimizations could be done, but the script would get really
weird, and the slowest operations (sha256 and md5 on large archives)
cannot be optimized... If it is run on an SMP machine we could achieve
some parallelism by backgrounding the sha256 processes, but it will be
run just once, so let's keep it simple, stupid.
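
Just to illustrate what backgrounding would mean, here is a rough
sketch, not something the script does. The files are invented, and the
sha256_hex helper with its sha256sum fallback is my assumption for
systems without FreeBSD's sha256(1).

```shell
#!/bin/sh
# Sketch only: the files are invented, and the sha256sum fallback is
# an assumption for systems without FreeBSD's sha256(1).
sha256_hex()
{
	if command -v sha256 >/dev/null 2>&1 ; then
		sha256 -q "$1"
	else
		sha256sum "$1" | cut -d ' ' -f 1
	fi
}

work=$(mktemp -d)
for n in 1 2 3 ; do
	printf 'archive %s\n' "$n" > "$work/file$n"
done

# Start one background job per file; each writes to its own output
# file so that concurrent jobs cannot interleave their lines.
for n in 1 2 3 ; do
	sha256_hex "$work/file$n" > "$work/file$n.sum" &
done
wait	# block until every background job has finished

# Collect the results in a fixed order afterwards.
sums=$(cat "$work/file1.sum" "$work/file2.sum" "$work/file3.sum")
nsums=$(printf '%s\n' "$sums" | grep -c '^[0-9a-f]\{64\}$')
echo "$nsums digests computed"
rm -rf "$work"
```

The separate output files and the final collection step are exactly
the kind of extra bookkeeping that makes the script weird, which is
why it stays sequential.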

here is the next version of the script:

http://vdev.datamax.bg/tmp/portsaddsha256-2.sh

--- portsaddsha256-2.sh begins here ---
#!/bin/sh -e

portsdir=${1:-/usr/ports}
distfilesdir=${2:-$portsdir/distfiles}

get_distinfos()
{
	find $1 -maxdepth 3 -mindepth 3 -name "distinfo*"
}

extract_archives()
{
	distinfo=$1

	# get all archives that have MD5 sum
	# ones that do not have MD5 sum do not deserve SHA256 sum
	for archive in `cat $distinfo |sed -nE 's/^MD5 \(([^)]+)\) = [0-9a-f]{32}$/\1/p'` ; do
		# output only archives that do not already have SHA256 sum
		# (double quotes, so that the shell expands $archive; -F matches it literally)
		if ! grep -qF "SHA256 ($archive) = " $distinfo ; then
			# is the archive available?
			if [ -r $archive ] ; then
				# archives that do not match MD5 are loudly reported
				if grep -qF "`md5 $archive`" $distinfo ; then
					echo $archive
				else
					echo "${distinfo}:${archive} MD5 sum mismatch" >&2
				fi
			fi
		fi
	done
}

cd $distfilesdir

for distinfo in `get_distinfos $portsdir` ; do
	for archive in `extract_archives $distinfo` ; do
		sha256 $archive >> $distinfo
	done
done

# EOF
--- portsaddsha256-2.sh ends here ---

And the diff it created with a distfiles/ directory containing 657 files:

http://vdev.datamax.bg/tmp/ports_sha256-2.diff.gz
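
To see what the sed expression in extract_archives picks out of a
distinfo file, it can be tried on a few sample lines. The distinfo
content below is made up for illustration.

```shell
#!/bin/sh
# Sketch only: this distinfo content is made up for illustration.
sample='MD5 (foo-1.0.tar.gz) = 0123456789abcdef0123456789abcdef
SIZE (foo-1.0.tar.gz) = 12345
MD5 (sub/bar.tgz) = fedcba9876543210fedcba9876543210
MD5 (broken.tgz) = not-a-digest'

# Same expression as in extract_archives: print only the file name
# from lines that carry a well-formed 32-digit hexadecimal MD5 sum.
names=$(printf '%s\n' "$sample" |
	sed -nE 's/^MD5 \(([^)]+)\) = [0-9a-f]{32}$/\1/p')
echo "$names"
```

This prints foo-1.0.tar.gz and sub/bar.tgz, one per line; the SIZE
line and the line with a malformed digest are skipped.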

-- 
Vasil Dimov