+CONTENTS files

Mon Jul 2 09:59:29 UTC 2007

Quoting Garrett Cooper <youshi10 at u.washington.edu> (from Mon, 02 Jul  
2007 00:55:25 -0700):

> [LoN]Kamikaze wrote:
>> Garrett Cooper wrote:
>>
>>> Pardon me for being naive, but wouldn't it be wiser for all of the data
>>> in the +CONTENTS file to be aggregated into sections instead of having
>>> line by line info?
>>>
>>> Example (net/samba_3.0.25a):
>>>
>>> @comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
>>> man/man1/log2pcap.1.gz
>>> [~100 lines of repetitive data...]
>>> @comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
>>> man/man8/vfs_notify_fam.8.gz
>>>
>>>   Could be aggregated into:
>>>
>>> @MD5
>>> 9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
>>> c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
>>> [etc..]
>>> @end MD5
>>>
>>>   or something similar to XML.
>>>
>>>   This would reduce the filesize from n bytes to n - (9 + 4 -1) *
>>> i_entries + 8. In larger package files this would reduce the amount of
>>> data parsing by a long shot. Also, more powerful scripting languages
>>> like Perl, Python, or smart parsers in C could make short work of this
>>> data and just extract the MD5 elements for comparison.
>>>
>>>   Also, by doing a little extra work when creating packages by
>>> organizing all the sections together, I think that the file size could
>>> be reduced by a large degree.
>>>
>>>   Similar fields to @comment MD5 could be reduced I believe, but with
>>> less benefit maybe, other than just the @unexec rmdir, etc lines.
>>>
>>>   Either that, or the data should be organized into separate files I
>>> think (increases number of files, but reduces overall processing time IMO).

>> In some cases the order of the stored data is important, and thus it cannot
>> be separated into sections. Also, this layout allows for very simple parsing
>> with the usual UNIX tools (sed, cut, awk, perl, simply everything), unlike
>> XML, which is rather complex and thus does not belong in base, in my opinion.

We already have libbsdxml in base (an older version of the one in the ports).

>    I didn't say XML exactly. I said XML-like, with implied end and begin
> tags, but keeping with the Makefile-like syntax of @MD5 ... @end MD5,
> or something similar.

The problem is that a change would break existing installations, as
they cannot cope with such a new format. Feel free to propose
improvements, but you need to keep in mind that any supported
FreeBSD release has to be able to install packages with only the
package tools available in the base system.

>    My point is that the +CONTENTS file is bloated a lot by
> useless lines, and I would think it would help speed up package
> processing if it was clipped or reduced somehow.

You need to provide numbers. Without them this is pure speculation.

And you have to explain why the current parsing routines cannot be
sped up for the current format; maybe the implementation is just a
little bit outdated compared to today's parsing knowledge...
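
As a starting point for such numbers, here is a minimal Python sketch that
counts how many bytes the proposed @MD5 ... @end MD5 section would save for
a given +CONTENTS file. It is only an illustration: the function name
md5_savings and the exact section markers ("@MD5" and "@end MD5", each on
its own line) are assumptions taken from the proposal quoted above, not an
existing format or tool. It counts bytes directly instead of relying on the
estimate given in the proposal, and it says nothing about parsing time,
which would have to be measured separately.

```python
# Hypothetical helper: estimates the size change if every
# "@comment MD5:<digest>" line were folded into one "@MD5 ... @end MD5"
# section of "<digest> <path>" lines, as proposed in this thread.
PREFIX = "@comment MD5:"
HEADER = "@MD5\n"       # assumed opening marker of the proposed section
FOOTER = "@end MD5\n"   # assumed closing marker of the proposed section


def md5_savings(contents: str) -> dict:
    """Return entry count and byte sizes before/after aggregation."""
    entries = sum(1 for line in contents.splitlines()
                  if line.startswith(PREFIX))
    before = len(contents.encode("utf-8"))
    if entries == 0:
        saved = 0
    else:
        # Each entry drops the prefix; the newline between digest and
        # path becomes a space, so it does not change the byte count.
        saved = entries * len(PREFIX) - (len(HEADER) + len(FOOTER))
    return {
        "entries": entries,
        "bytes_before": before,
        "bytes_after": before - saved,
        "bytes_saved": saved,
    }


# The two entries shown in the samba example quoted above.
sample = (
    "@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb\n"
    "man/man1/log2pcap.1.gz\n"
    "@comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc\n"
    "man/man8/vfs_notify_fam.8.gz\n"
)

if __name__ == "__main__":
    print(md5_savings(sample))
```

Running this over the +CONTENTS files under /var/db/pkg and summing the
results would show whether the saving is significant relative to the total
size; whether that translates into faster package processing is a separate
measurement.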

Bye,
Alexander.

-- 
Life is a grand adventure -- or it is nothing.
		-- Helen Keller

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137