Removal of old/outdated files from www.FreeBSD.org site

Sat Jul 28 21:14:15 UTC 2012

On 28 Jul 2012, at 18:08, Glen Barber wrote:

> "Simon L. B. Nielsen" <simon at FreeBSD.org> wrote:
> 
>> On 28 Jul 2012, at 05:17, Glen Barber wrote:
>> 
>> [Stale files]
>> 
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5798.html
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5802.html
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5834.html
>>> 
>>> If someone from clusteradm@ (or someone else with access to the machine
>>> on which the documentation build output exists) can remove these old
>>> files so they are not archived by search engines (since in some cases,
>>> old file names can indicate very old files, and worse, very old
>>> information that could be potentially dangerous to a user looking for
>>> specific information), I would greatly appreciate it.
>> 
>> The problem is that it's not a simple thing to do. Our build installs
>> with the option to not install if files are identical, so timestamps
>> can't be used alone.
> 
> Ah, I did forget about that.
> 
>> E.g.
>> http://www.freebsd.org/doc/en/books/porters-handbook/TRADEMARKS.html :
>> 
>> -r--r--r--  1 www  wwwadm  4494 Jan  9  2010 TRADEMARKS.html
>> 
>> I have removed the mentioned files but I don't have time to do a full
>> sweep as I might end up deleting too much.
> 
> Ok, thank you.  My big concern is if someone "accidentally" finds an old document and does something potentially dangerous to their system.

I agree they should be removed.

As a reference, the build script is at: http://svnweb.freebsd.org/doc/head/share/tools/webupdate

Anyone who wants to try to fix this can start by reading that script. The simple brute-force solution would be, e.g., a weekly install to a separate dir, followed by a check for files that exist in the dir we serve www.freebsd.org from but not in the fresh install.
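
A minimal sketch of that brute-force check, in Python. The function names and the two directory arguments are hypothetical, and nothing here is taken from the webupdate script; it only reports candidates and deletes nothing.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def stale_files(served_dir, fresh_dir):
    """List files present in served_dir but absent from a clean install.

    served_dir: the tree the web server serves (assumed path).
    fresh_dir:  a separate dir populated by a full, clean install (assumed path).
    """
    return sorted(list_files(served_dir) - list_files(fresh_dir))
```

The comparison is by path only, so it does not depend on timestamps. The resulting list would still need a manual review before anything is removed.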

Another solution might be to make the weekly full build install to a different dir and switch the clean and the old dir... but I slightly worry that any error in the script will result in no content on www.
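
One way to hedge against that worry is to refuse the switch when the new tree looks suspiciously small. A rough Python sketch, where the function names, the three directory arguments, and the 0.9 threshold are all assumptions made for illustration:

```python
import os


def count_files(root):
    """Count regular directory entries reported as files under root."""
    return sum(len(filenames) for _d, _subdirs, filenames in os.walk(root))


def swap_if_sane(live_dir, new_dir, old_dir, min_ratio=0.9):
    """Switch new_dir into place of live_dir, keeping the old tree in old_dir.

    Refuses (returns False) if new_dir is empty or has fewer than
    min_ratio times the number of files in live_dir. old_dir must not
    already exist. The threshold is an arbitrary illustrative value.
    """
    live_count = count_files(live_dir)
    new_count = count_files(new_dir)
    if new_count == 0 or new_count < live_count * min_ratio:
        return False
    os.rename(live_dir, old_dir)  # keep the previous tree for rollback
    os.rename(new_dir, live_dir)  # the fresh build becomes the served tree
    return True
```

There is a short window between the two renames in which live_dir does not exist, so this is not an atomic switch; a symlink flip would be one alternative.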

>>> Furthermore, if someone with the appropriate access can provide a list
>>> of similarly-named files (which also likely can be fixed with adding a
>>> section id to the source), I will personally fix the section id so these
>>> files do not occur again.  (It would be even more helpful if the files
>>> could be provided as an attachment so I can view the source to track
>>> down from where they are being generated.)
>> 
>> There is no need for special access to do that. Just build all the docs
>> in html-split and find xNNNN.html files. It's a regular thing which has
>> to be done as people forget when adding new content. I also remember
>> hunting down those files when I was more active in doc.
> 
> I will look into this for a permanent solution then.  It is difficult to spot unless local changes are made though.  But, 'make clean' followed by 'svn stat' will reveal these edge cases.

Hmm, how is it difficult to spot? A build of a document should never ever produce an xNNNN.html file. If it does, a sect1 is missing an id.

Or am I missing something here?
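
For illustration, spotting such files in an html-split build output could look roughly like the Python sketch below. The function name and the build_dir argument are hypothetical, and the pattern assumes the auto-generated names are exactly "x" followed by digits and ".html", as in the URLs quoted above.

```python
import os
import re

# Matches auto-generated chunk names such as x5798.html (assumed pattern).
AUTO_NAME = re.compile(r"x\d+\.html")


def find_auto_named(build_dir):
    """Return sorted paths of xNNNN.html files found under build_dir."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(build_dir):
        for name in filenames:
            if AUTO_NAME.fullmatch(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Each hit would point at a section in that document's source that is missing an id.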

If you don't want to build everything, you could also just 

-- 
Simon L. B. Nielsen