Spellchecking the DocBook FreeBSD Documentation

Murray Stokely murray at FreeBSD.org
Thu Jul 15 03:04:35 UTC 2004


This is a note to explain how I use the tools in CVS to spellcheck our
documentation.  This is just one of many possible approaches.

$ cd /usr/doc/en_US.ISO8859-1/books/handbook
$ make clean; make SPELLCHECK=1 spellcheck | sort | uniq | less

The SPELLCHECK=1 variable tells the makefile to use the special
spellcheck.dsl stylesheet to omit the contents of certain tags (such
as <filename>) from the HTML output.  Ideally this variable would not
need to be set by hand; the spellcheck target should set it
automatically.

The spellcheck target then runs ispell over the generated HTML files,
using the FreeBSD technical lexicon dictionary in
/usr/share/dict/freebsd.

All misspelled words are printed to standard output, so we should run
this through sort and then uniq to remove duplicates.  Once you find a
word that is not a false positive, a quick grep through the source
will tell you which file has the offending misspelling.
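
The grep step can be wrapped up roughly as follows.  This is only a
sketch: the function name find_misspelling is made up for
illustration, and it assumes the DocBook sources end in .sgml, which
may not match your tree.

```shell
#!/bin/sh
# find_misspelling WORD [DIR]
# Hypothetical helper: print file:line:text for every whole-word match
# of WORD in the *.sgml files under DIR (default: current directory).
# The .sgml suffix is an assumption about how the sources are named.
find_misspelling() {
    word=$1
    dir=${2:-.}
    # -w matches whole words only; -n prints line numbers.
    # /dev/null forces grep to print the file name even when find
    # hands it a single file.
    find "$dir" -type f -name '*.sgml' \
        -exec grep -nw -- "$word" /dev/null {} +
}
```

For example, "find_misspelling recieve ." run from the handbook
directory would list each file and line that contains that word.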

This approach converts the DocBook into HTML before spellchecking.
Another approach would be to spellcheck the DocBook source directly,
examining the tags and deciding what to ignore.  This could be done
with an SGML-aware spellchecker (aspell doesn't seem nearly powerful
enough to me), or with a scripting language and SGML parsing
libraries.  I think that would be more work than using DSSSL or XSL
for the parsing though.
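
To give a feel for the scripting route, here is a deliberately naive
sketch that filters the source with sed instead of a real SGML parser.
The function name strip_docbook is hypothetical, treating <command>
the same way as <filename> is my own assumption, and the filter only
handles elements that open and close on the same line.  It ignores
entities, nesting, and marked sections, which is part of why I expect
this route to be more work.

```shell
#!/bin/sh
# strip_docbook: read DocBook source on stdin, write prose on stdout.
# Hypothetical filter, not a real SGML parser.
strip_docbook() {
    # 1. Drop <filename> and <command> elements together with their
    #    contents (single-line, non-nested occurrences only).
    # 2. Drop every remaining tag but keep the text between tags.
    sed -e 's|<filename>[^<]*</filename>||g' \
        -e 's|<command>[^<]*</command>||g' \
        -e 's|<[^>]*>||g'
}
```

The output could then be fed to ispell in list mode, for instance
"strip_docbook < chapter.sgml | ispell -l | sort | uniq", where
chapter.sgml stands in for whichever source file is being checked.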

Chern Lee wrote and posted a Tcl script here several years ago, and
his script may give better output than the 'make spellcheck' stuff
I've implemented above.  If it does, we should add it to CVS.

     - Murray


