Garrett Cooper youshi10 at u.washington.edu
Mon Jul 2 04:32:15 UTC 2007

Pardon me for being naive, but wouldn't it be wiser to aggregate the data 
in the +CONTENTS file into sections instead of storing it as line-by-line 
entries?

Example (net/samba_3.0.25a):

@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
[~100 lines of repetitive data...]
@comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc

    Could be aggregated into:

9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
@end MD5
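    A minimal Python sketch of that conversion at package-creation time, 
assuming each "@comment MD5:<hash>" line directly follows the file path it 
describes. The "@begin MD5" opener is hypothetical (the example only shows 
"@end MD5"), and "aggregate_md5" is an illustrative name, not part of the 
pkg tools.

```python
MD5_PREFIX = "@comment MD5:"


def aggregate_md5(lines):
    """Move per-file "@comment MD5:" lines into a single MD5 section.

    Assumption: each MD5 comment directly follows the file path it
    describes. "@begin MD5" is a hypothetical opener; "@end MD5" is
    the terminator from the proposal.
    """
    other = []      # all non-MD5 lines, in their original order
    section = []    # "<hash> <path>" entries
    last_path = None
    for line in lines:
        if line.startswith(MD5_PREFIX):
            if last_path is None:
                raise ValueError(f"MD5 comment without a file path: {line!r}")
            section.append(f"{line[len(MD5_PREFIX):]} {last_path}")
        else:
            other.append(line)
            # Lines starting with "@" are directives; anything else is a path.
            last_path = None if line.startswith("@") else line
    return other + ["@begin MD5"] + section + ["@end MD5"]


plist = [
    "@cwd /usr/local",
    "man/man1/log2pcap.1.gz",
    "@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb",
    "man/man1/nmblookup.1.gz",
    "@comment MD5:c58f068d603a12d4af867c15cf77e636",
]
print("\n".join(aggregate_md5(plist)))
```

    The file paths stay where they were; only the MD5 data moves, so 
anything that walks the path list is unaffected.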

    or something similar to XML.

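    This would reduce the file size from n bytes to n - (9 + 4 - 1) * 
i_entries + 8 bytes, where i_entries is the number of MD5 entries: each 
entry drops "@comment " (9 bytes) and "MD5:" (4 bytes) but needs a 1-byte 
separator, and the section adds the 8-byte "@end MD5" line. In larger 
package files this would substantially reduce the amount of parsing. Also, 
scripting languages such as Perl or Python, or a smart parser in C, could 
make short work of this data and extract just the MD5 entries for 
comparison.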

    Also, by doing a little extra work at package-creation time to group 
all the entries of each section together, I think the file size could be 
reduced considerably.

    I believe other fields similar to @comment MD5 could be condensed the 
same way, though perhaps with less benefit, except for the @unexec rmdir 
lines and the like.
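    The same idea applied to "@unexec rmdir" lines, as a Python sketch 
under a hypothetical syntax: the shared prefix is written once in an 
"@begin unexec rmdir" ... "@end unexec rmdir" block, and each line inside 
keeps only the rmdir argument. The block is placed where the first rmdir 
line appeared, so this assumes those lines are not order-sensitive relative 
to any directives they are moved past.

```python
RMDIR_PREFIX = "@unexec rmdir "
BEGIN, END = "@begin unexec rmdir", "@end unexec rmdir"  # hypothetical markers


def condense_rmdir(lines):
    """Replace every "@unexec rmdir <arg>" line with <arg> inside one block.

    The block is placed where the first rmdir line appeared.
    """
    out, args = [], []
    insert_at = None
    for line in lines:
        if line.startswith(RMDIR_PREFIX):
            if insert_at is None:
                insert_at = len(out)
            args.append(line[len(RMDIR_PREFIX):])
        else:
            out.append(line)
    if insert_at is None:
        return out
    return out[:insert_at] + [BEGIN, *args, END] + out[insert_at:]


def expand_rmdir(lines):
    """Turn the block back into ordinary "@unexec rmdir" lines."""
    out, inside = [], False
    for line in lines:
        if line == BEGIN:
            inside = True
        elif line == END:
            inside = False
        else:
            out.append(RMDIR_PREFIX + line if inside else line)
    return out
```

    expand_rmdir reproduces the original list exactly when the rmdir lines 
were contiguous in it; otherwise they come back grouped at the position of 
the first one.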

    Either that, or I think the data should be split into separate files 
(this increases the number of files, but reduces overall processing time, 
IMO).
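    For the separate-file alternative, a Python sketch that moves the MD5 
data out of +CONTENTS into a file of its own, so a checker never has to 
parse the full packing list. The "+MD5" file name and its "<hash> <path>" 
layout are hypothetical, borrowed from the aggregated example, and the 
sketch assumes each MD5 comment directly follows its file path. It rewrites 
+CONTENTS in place, so it should be tried on a copy.

```python
from pathlib import Path

MD5_PREFIX = "@comment MD5:"


def split_md5(pkg_dir):
    """Move MD5 comments out of +CONTENTS into a separate "+MD5" file.

    "+MD5" is a hypothetical name; each of its lines is "<hash> <path>".
    Returns the number of entries moved.
    """
    pkg = Path(pkg_dir)
    kept, sums = [], []
    last_path = None
    for line in (pkg / "+CONTENTS").read_text().splitlines():
        if line.startswith(MD5_PREFIX):
            if last_path is None:
                raise ValueError(f"MD5 comment without a file path: {line!r}")
            sums.append(f"{line[len(MD5_PREFIX):]} {last_path}\n")
        else:
            kept.append(line + "\n")
            # Lines starting with "@" are directives; anything else is a path.
            last_path = None if line.startswith("@") else line
    (pkg / "+CONTENTS").write_text("".join(kept))
    (pkg / "+MD5").write_text("".join(sums))
    return len(sums)
```

    The trade-off is the one noted above: one more file per installed 
package, in exchange for a verification pass that reads only the data it 
needs.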


More information about the freebsd-ports mailing list