+CONTENTS files

Mon Jul 2 07:55:21 UTC 2007

[LoN]Kamikaze wrote:
> Garrett Cooper wrote:
>   
>> Pardon me for being naive, but wouldn't it be wiser for all of the data
>> in the +CONTENTS file to be aggregated into sections instead of having
>> line by line info?
>>
>> Example (net/samba_3.0.25a):
>>
>> @comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
>> man/man1/log2pcap.1.gz
>> [~100 lines of repetitive data...]
>> @comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
>> man/man8/vfs_notify_fam.8.gz
>>
>>    Could be aggregated into:
>>
>> @MD5
>> 9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
>> c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
>> [etc..]
>> @end MD5
>>
>>    or something similar to XML.
>>
>>    This would reduce the filesize from n bytes to n - (9 + 4 -1) *
>> i_entries + 8. In larger package files this would reduce the amount of
>> data parsing by a long shot. Also, more powerful scripting languages
>> like Perl, Python, or smart parsers in C could make short work of this
>> data and just extract the MD5 elements for comparison.
>>
>>    Also, by doing a little extra work when creating packages by
>> organizing all the sections together, I think that the file size could
>> be reduced by a large degree.
>>
>>    Similar fields to @comment MD5 could be reduced I believe, but with
>> less benefit maybe, other than just the @unexec rmdir, etc lines.
>>
>>    Either that, or the data should be organized into separate files I
>> think (increases number of files, but reduces overall processing time IMO).
>>
>> Thanks,
>> -Garrett
>>     
>
>
> In some cases the order of data stored is important and thus it cannot be
> separated into sections. Also, this layout allows for very simple parsing with
> usual UNIX tools (sed, cut, awk, perl, simply everything). XML, by contrast, is
> rather complex and thus does not belong in base, in my opinion.
>   

    I didn't say XML exactly. I said XML-like, with implied begin and end
tags, but keeping the Makefile-like syntax of @MD5 ... @end MD5, or
something similar.
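
    To make the idea concrete, here is a minimal sketch in Python of how
a reader for such a section might look. The @MD5 ... @end MD5 markers
are only my proposal, not anything the pkg tools understand today, and
the function name parse_md5_section is made up for illustration.

```python
def parse_md5_section(lines):
    """Return {path: md5} for entries between '@MD5' and '@end MD5'.

    The section markers are the hypothetical ones proposed in this
    thread; lines outside the section are ignored.
    """
    checksums = {}
    in_section = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "@MD5":
            in_section = True
        elif line == "@end MD5":
            in_section = False
        elif in_section and line:
            # Each entry is "<md5> <path>"; split once so that paths
            # containing spaces stay intact.
            digest, path = line.split(" ", 1)
            checksums[path] = digest
    return checksums


# Illustrative input only; the two entries come from the example above.
sample = [
    "@cwd /usr/local",
    "@MD5",
    "9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz",
    "c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz",
    "@end MD5",
]

checksums = parse_md5_section(sample)
for path, digest in checksums.items():
    print(digest, path)
```

    A tool that only wants the checksums can skip everything outside
the section without looking at each line's prefix.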

    The only tool I can see benefiting from the current layout is cut; I
would think that sed, awk, and perl would work much better with a revised
format.

    My point is that the +CONTENTS file is bloated by a lot of useless
lines, and I would think it would help speed up package processing if it
were trimmed or reduced somehow.
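
    As a rough way to check the size claim on a real file, here is a
small Python sketch that rewrites the per-line form into the proposed
section and compares byte counts. It assumes each "@comment MD5:" line
describes the path on the line just before it, and it moves every
checksummed path into one section at the end, so it ignores the
ordering concern raised above (for example a @cwd change mid-file).
The names aggregate_md5 and size_in_bytes are made up for this sketch.

```python
MD5_PREFIX = "@comment MD5:"


def aggregate_md5(lines):
    """Move path/MD5 pairs into one hypothetical '@MD5 ... @end MD5' section.

    Assumption: a '@comment MD5:<hash>' line belongs to the path on the
    line immediately before it. All other lines are passed through.
    """
    body, pairs = [], []
    last_path = None
    for line in lines:
        if line.startswith(MD5_PREFIX) and last_path is not None:
            body.pop()  # the path moves into the section instead
            pairs.append((line[len(MD5_PREFIX):], last_path))
            last_path = None
        elif line and not line.startswith("@"):
            body.append(line)
            last_path = line
        else:
            body.append(line)
            last_path = None
    section = ["@MD5"] + [f"{d} {p}" for d, p in pairs] + ["@end MD5"]
    return body + section


def size_in_bytes(lines):
    """File size if each line is written with a trailing newline."""
    return sum(len(line.encode()) + 1 for line in lines)


# Illustrative input only, built from the entries quoted in this thread.
old = [
    "@cwd /usr/local",
    "man/man1/log2pcap.1.gz",
    "@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb",
    "man/man1/nmblookup.1.gz",
    "@comment MD5:c58f068d603a12d4af867c15cf77e636",
]
new = aggregate_md5(old)
saved = size_in_bytes(old) - size_in_bytes(new)
print("\n".join(new))
print("bytes saved:", saved)
```

    With only a couple of entries the section markers eat most of the
gain; the saving grows with the number of checksummed files.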

    Plus, expat is under the MIT license, which I believe is compatible
with the BSD license (or at least more compatible than the GPL variants).
The only difference that stands out in the MIT license, from what I can
tell, is that paragraph 3 of the BSD license isn't present.

-Garrett