cvs commit: ports/security/vuxml vuln.xml

Fri Mar 12 09:01:24 PST 2004

On Fri, Mar 12, 2004 at 04:35:35PM +0100, Oliver Eikemeier wrote:
> Normally the file has a timestamp, if not you could add a timestamp at the
> beginning or require tools that update the VuXML file to invalidate the
> cache or produce a timestamp. That is even faster than reading part of the
> file. 

Yes, we went over that.  As I already described, a file timestamp allows
the application to either not read the file, or read the whole file.
Entry timestamps and chronological sorting allow the application to only
read a tiny portion of the file.
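
To make the partial-read point concrete, here is a minimal sketch in Python. It assumes a simplified document: newest entries first, no XML namespace, and a date string under `dates/entry` (or `dates/modified`) in each entry. The sample data and the name `new_entries` are hypothetical, not part of any existing tool.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample: newest entries first, no XML namespace.
SAMPLE = b"""<vuxml>
  <vuln vid="c"><dates><entry>2004-03-12</entry></dates></vuln>
  <vuln vid="b"><dates><entry>2004-03-05</entry></dates></vuln>
  <vuln vid="a"><dates><entry>2004-02-20</entry></dates></vuln>
</vuxml>"""

def new_entries(stream, last_seen):
    """Return the vids of entries dated after last_seen (YYYY-MM-DD).

    Because the input is assumed to be sorted newest first, parsing
    stops at the first entry that is not newer than last_seen.
    """
    fresh = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "vuln":
            continue
        # Prefer the modification date when one is present.
        stamp = elem.findtext("dates/modified") or elem.findtext("dates/entry")
        # ISO dates compare correctly as plain strings; a missing date
        # is treated conservatively as "stop here".
        if stamp is None or stamp <= last_seen:
            break
        fresh.append(elem.get("vid"))
    return fresh

print(new_entries(io.BytesIO(SAMPLE), "2004-03-01"))
```

The comparison uses only the date strings inside the document, so the clocks on the publishing and consuming systems never enter into it. Without the sort, the loop could not `break` and would have to visit every entry.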

We're not talking about file timestamps.  

Also, tools don't update VuXML, humans do.   Tools process VuXML.  Do
not confuse the document with the database.

(oops, *requiring* that the document be sorted starts to cross that
line, too :-)

> Requiring moving modified entries to the top for tools to work
> properly seems like causing more problems than it will ever solve.

You've identified precisely one problem: reading diffs.  This is an
annoying problem, granted.  You need not pretend that it is a larger
issue than that.

> >One could simply use the file timestamp in a limited number of
> >situations: (a) You actually have a file and the timestamp can be trusted
> >to be accurate; and (b) You don't care if updating the cache requires
> >starting over and reading the entire input.
> >
> >Some real-world scenarios that I imagined where it matters:
> >
> > (1) Download VuXML periodically.  One must be careful to preserve
> >     timestamps.  Hopefully an appropriate timestamp is available
> >     via the download protocol.
> 
> You would compress the file anyway.

Compression has nothing to do with preserving the timestamp.  Maybe you
are thinking of archive formats.  You *still* must be careful to
preserve the timestamp.  Hopefully both the source system and the
downloading system have the same date.  An advantage of using date
strings in the entries to determine what needs to be processed is that
*actual* dates set on systems are irrelevant.

> > (2) Stream new updates.  A tool that maintains a cache may check
> >     a network resource periodically for updates.  Using e.g. HTTP,
> >     it need only download the first few `new' entries, rather than
> >     downloading the entire file every time.
> 
> Sort it before distributing. Distribute diffs. Be creative.

This `creativity' creates duplication.  I much prefer that there is as
little in between the original VuXML file and the processing application
as possible.  I don't want to constrain distribution.  I'd prefer that
tools be able to process the file whether they fetch the file from CVS,
CVSup, cvsweb, an HTTP server, an FTP server, or whatever.

Actually, ``sort it before distributing'' is exactly the method that is
established: additions/modifications to ports/security/vuxml/vuln.xml
are to be chronologically sorted.

> Please, we live in the 21st century. You are not really trying to tell me
> that a file has to be sorted by humans to be efficiently downloadable?

I am saying that the file published at ports/security/vuxml/vuln.xml
needs to be sorted to be used in these scenarios.  Whether or not it is
sorted by humans is not relevant (similar to, say, ports category
Makefiles).

> >But perhaps, after all, this part is over-engineered.  I don't like the
> >difficulty in reading `diffs' that is a side-effect, either.  One could
> >require that content changes and sorting always be done in separate
> >commits, of course, but that could be an odious requirement.  Tools must
> >implement more complex behavior to take advantage of the chronological
> >sorting (but of course they can just `play dumb' too).
> 
> Cvsweb is your friend. It is *so* easy to grasp what I've done here:
> 
>  <http://cvsweb.freebsd.org/ports/security/vuxml/vuln.xml.diff?r1=1.39&r2=1.40&f=h>
> 
> It is *much* harder to do when you have to read the entry twice.

I cannot for the life of me connect what you just wrote with my
paragraph above.  Did I not just write about the negative side-effects?

> >So in the end, I guess I'm on the fence about it.  I'd like to keep the
> >status quo (chronological sorting) for now--- I have a tool that uses
> >it :-) ---, but I'd like to hear more convincing arguments either way.
> 
> Someone *will* do a commit that will break your tool, for sure. 

Of course.  Such happens.  The base system and ports break, also.  What
exactly is your point?

> Should I send you an XSLT script that sorts the file? 

Sure!  You might post it for general use.  Even better, if you could
post (or send privately) a modification to ports/security/vuxml for
review that adds a `make sort' or similar target.  I suggest that the
sort be stable, but that isn't strictly necessary.
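
For illustration only, a stable sort of this kind might look like the following Python sketch (not XSLT, and not an existing `make sort` target). It assumes the same simplified, namespace-free layout with a date under `dates/entry` or `dates/modified`; `sort_key` and `sorted_vids` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "c" and "b" share a date and appear in that order.
SAMPLE = """<vuxml>
  <vuln vid="a"><dates><entry>2004-02-20</entry></dates></vuln>
  <vuln vid="c"><dates><entry>2004-03-12</entry></dates></vuln>
  <vuln vid="b"><dates><entry>2004-03-12</entry></dates></vuln>
</vuxml>"""

def sort_key(vuln):
    """Date string used for ordering: modification date if present, else entry date."""
    return vuln.findtext("dates/modified") or vuln.findtext("dates/entry") or ""

def sorted_vids(xml_text):
    """Reorder entries newest first and return their vids in the new order."""
    root = ET.fromstring(xml_text)
    # sorted() is stable, including with reverse=True, so entries with
    # equal dates keep their original relative order.
    root[:] = sorted(root.findall("vuln"), key=sort_key, reverse=True)
    return [v.get("vid") for v in root]

print(sorted_vids(SAMPLE))
```

Stability matters mostly for diffs: entries that share a date do not get shuffled each time the file is re-sorted.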

> When the database is
> multi-megabyte you won't have it in CVS anyway, but use some XML database
> instead.

Huh?

Anyway, I understand your objection.  I have explained why it is like it
is (now twice) and admitted that it may be overkill.  If you have
something still more to add, great.  Otherwise, I'm disinclined to
abandon the sorting just yet.  I still believe it is useful, but maybe
I'm the only one who thinks so :-)

Let me spell it out again so that there are no more wasted postings
repeating what has already been said:

  With the chronological sorting, tools might be allowed to process only
  a portion of the VuXML file.

  Without chronological sorting, tools must process all or none of a
  VuXML file.

  Chronological sorting makes diffs between versions harder to read.

This is great, this is exactly the kind of discussion I want to see
before making VuXML ``really really official'' and drafting something for
the security web page / Porter's Handbook / and whatever.

So far you've criticized practically every aspect of VuXML.  Thanks :-)

Cheers,
-- 
Jacques Vidrine / nectar at celabo.org / jvidrine at verio.net / nectar at freebsd.org