cvs commit: ports/security/vuxml vuln.xml

Fri Mar 12 07:35:49 PST 2004

Jacques A. Vidrine wrote:

> On Thu, Mar 11, 2004 at 08:51:16PM +0100, Oliver Eikemeier wrote:
> 
>>Since history is considered very valuable in the FreeBSD project, I guess
>>I would prefer that over a slight runtime optimization for certain tools.
>>
>>How much time does it take to produce a sorted file once and cache that?
> 
> I agree, I would not give a `slight runtime optimization' higher
> priority than history readability.  But using chronologically sorted
> input makes more than a `slight' difference in some cases.
> 
> Obviously any tool that will be called frequently (e.g. once for every
> port built) should do some caching of data.  Since the input is in
> chronological order, such tools need read only a minimal amount of the
> input in order to determine whether or not the cache needs updating.

Normally the file has a timestamp. If not, you could add a timestamp at
the beginning, or require tools that update the VuXML file to invalidate
the cache or produce a timestamp. That is even faster than reading part of
the file. Requiring that modified entries be moved to the top for tools to
work properly seems likely to cause more problems than it will ever solve.
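
A minimal sketch of the timestamp check, assuming the VuXML file and the
cache are both ordinary local files with trustworthy modification times.
The function name and the paths in the usage note are made up for
illustration:

```python
import os


def cache_is_stale(source_path, cache_path):
    """Return True if the cache is missing or older than the source file."""
    try:
        cache_mtime = os.path.getmtime(cache_path)
    except OSError:
        # No cache yet (or it is unreadable): rebuild it.
        return True
    # Rebuild only when the source was modified after the cache was written.
    return os.path.getmtime(source_path) > cache_mtime
```

A tool would call something like `cache_is_stale("vuln.xml", "vuln.cache")`
once per run and parse the XML only when it returns True.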

> One could simply use the file timestamp in a limited number of
> situations: (a) You actually have a file and the timestamp can be trusted
> to be accurate; and (b) You don't care if updating the cache requires
> starting over and reading the entire input.
> 
> Some real-world scenarios that I imagined where it matters:
> 
>  (1) Download VuXML periodically.  One must be careful to preserve
>      timestamps.  Hopefully an appropriate timestamp is available
>      via the download protocol.

You would compress the file anyway.

>  (2) Stream new updates.  A tool that maintains a cache may check
>      a network resource periodically for updates.  Using e.g. HTTP,
>      it need only download the first few `new' entries, rather than
>      downloading the entire file every time.

Sort it before distributing. Distribute diffs. Be creative.
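
For reference, the "read only the first few new entries" behaviour from
the quoted text could look roughly like the sketch below. It assumes a
simplified, un-namespaced layout (`<vuln vid="...">` elements containing
`<dates><entry>` and an optional `<modified>`), newest entries first, and a
cache that maps each vid to the last date seen for it. The function name
and the cache shape are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def read_new_entries(stream, cache):
    """Return (vid, date) pairs from the head of a newest-first stream.

    Parsing stops at the first entry whose date already matches the
    cache, so the rest of the input is never read.
    """
    new = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "vuln":
            continue
        vid = elem.get("vid")
        # Prefer the modification date, fall back to the entry date.
        date = elem.findtext("dates/modified") or elem.findtext("dates/entry")
        if cache.get(vid) == date:
            break  # everything below this point is already cached
        new.append((vid, date))
        elem.clear()  # free the memory held by the processed entry
    return new


SAMPLE = b"""<vuxml>
  <vuln vid="c"><dates><entry>2004-03-11</entry></dates></vuln>
  <vuln vid="a"><dates><entry>2004-02-01</entry>
    <modified>2004-03-10</modified></dates></vuln>
  <vuln vid="b"><dates><entry>2004-02-15</entry></dates></vuln>
</vuxml>"""

cache = {"a": "2004-02-01", "b": "2004-02-15"}
print(read_new_entries(io.BytesIO(SAMPLE), cache))
```

Note that this only works if every commit keeps the file sorted, which is
exactly the fragile part.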

> Considering that in a few years time, the VuXML file could be
> multi-megabyte, it seems like a good idea to avoid downloading the
> entire file if possible.  Of course, other tools can take care of this
> for you, e.g. CVSup or rsync.  However, there is something to be said
> for being able to publish a VuXML file via HTTP or other `dumb' protocol
> and still get such efficiencies, especially if there could be thousands
> of downloads per day.

Please, we live in the 21st century. You are not really trying to tell me
that a file has to be sorted by humans to be efficiently downloadable?

> But perhaps, after all, this part is over-engineered.  I don't like the
> difficulty in reading `diffs' that is a side-effect, either.  One could
> require that content changes and sorting always be done in separate
> commits, of course, but that could be an odious requirement.  Tools must
> implement more complex behavior to take advantage of the chronological
> sorting (but of course they can just `play dumb' too).

Cvsweb is your friend. It is *so* easy to grasp what I've done here:

  <http://cvsweb.freebsd.org/ports/security/vuxml/vuln.xml.diff?r1=1.39&r2=1.40&f=h>

It is *much* harder to do when you have to read the entry twice.

> So in the end, I guess I'm on the fence about it.  I'd like to keep the
> status quo (chronological sorting) for now--- I have a tool that uses
> it :-) ---, but I'd like to hear more convincing arguments either way.

Someone *will* do a commit that will break your tool, for sure. Should I
send you an XSLT script that sorts the file? When the database is
multi-megabyte you won't keep it in CVS anyway, but will use some XML
database instead.
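
To show how little is needed, here is a rough sketch of the sorting step
in Python rather than XSLT. It assumes a simplified, un-namespaced layout
(`<vuln>` elements directly under the root, each with `<dates><entry>` and
an optional `<modified>` in ISO `YYYY-MM-DD` form). The function names are
made up for illustration:

```python
import xml.etree.ElementTree as ET


def entry_date(vuln):
    """Return the most recent date recorded for an entry, or ''."""
    return vuln.findtext("dates/modified") or vuln.findtext("dates/entry") or ""


def sort_newest_first(xml_text):
    """Return the document with its <vuln> entries sorted newest first."""
    root = ET.fromstring(xml_text)
    # ISO dates compare correctly as plain strings.
    vulns = sorted(root.findall("vuln"), key=entry_date, reverse=True)
    for vuln in root.findall("vuln"):
        root.remove(vuln)
    root.extend(vulns)
    return ET.tostring(root, encoding="unicode")
```

Run at distribution time, this would let the file in CVS stay in whatever
order keeps the diffs readable.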

-Oliver