Validating docbook articles...

Chuck Swiger cswiger at mac.com
Mon Feb 23 19:26:32 UTC 2004


Dag-Erling Smørgrav wrote:
> Alex Dupre <ale at FreeBSD.org> writes:
>> [ ...talking about -preserve in tidy... ]
> This reminds me of the many good reasons to convert the doc tree to
> XML.  One of these is that xmllint can both validate input files and
> clean up output files, and it does a far better job of it than tidy.

An interesting idea.  I took a quick look at converting an existing SGML
document into XML to get some idea of the work involved.

Given an SGML prologue of:

<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
%man;
<!ENTITY % freebsd PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
%freebsd;
<!ENTITY % trademarks PUBLIC "-//FreeBSD//ENTITIES DocBook Trademark Entities//EN">
%trademarks;
]>

...from doc/en_US.ISO8859-1/articles/filtering-bridges (written by ale@, of
course :-), it's easy to add an XML declaration at the top-- this could be
done automatically-- and "make lint" still works just fine with the
declaration in place.  So far, so good.

But how does one generate the SystemLiterals that XML requires?  Per the XML
specification:

|4.2.2 External Entities
|
|[Definition: If the entity is not internal, it is an external entity,
|declared as follows:]
|
|External Entity Declaration
|
|[75]   ExternalID   ::=   'SYSTEM' S SystemLiteral
|                        | 'PUBLIC' S PubidLiteral S SystemLiteral

69-sec% xmllint article.sgml
article.sgml:3: parser error : SystemLiteral " or ' expected
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
                                                                             ^
article.sgml:3: parser error : SYSTEM or PUBLIC, the URI is missing
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
                                                                             ^
article.sgml:4: parser error : Space required after the Public Identifier
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
                                                                              ^
article.sgml:4: parser error : SystemLiteral " or ' expected
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
                                                                              ^
article.sgml:4: parser error : SYSTEM or PUBLIC, the URI is missing
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
                                                                              ^
article.sgml:5: parser warning : PEReference: %man; not found
%man;
      ^
[ ... ]
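
All of the errors above come down to production [75]: in XML, a PUBLIC
identifier must be followed by a system literal, whereas SGML lets you leave
it out.  As a rough sketch of that rule (the regular expression is a
simplification that does not enforce the PubidLiteral character set, and
"man-refs.ent" is only a placeholder system identifier, not a confirmed
location), one could test a declaration like this:

```python
import re

# A quoted literal, using either double or single quotes.
LITERAL = r'(?:"[^"]*"|\'[^\']*\')'

# Simplified form of production [75] from the XML specification:
#   'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
EXTERNAL_ID = re.compile(
    r'(?:SYSTEM\s+' + LITERAL
    + r'|PUBLIC\s+' + LITERAL + r'\s+' + LITERAL + r')'
)


def has_valid_external_id(declaration):
    """Return True if the declaration contains an XML-style ExternalID."""
    return EXTERNAL_ID.search(declaration) is not None


# SGML style: public identifier only, which xmllint rejects.
sgml_decl = ('<!ENTITY % man PUBLIC '
             '"-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">')

# XML style: the same declaration plus a (placeholder) system literal.
xml_decl = ('<!ENTITY % man PUBLIC '
            '"-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN" '
            '"man-refs.ent">')

print(has_valid_external_id(sgml_decl))
print(has_valid_external_id(xml_decl))
```

The first declaration has the shape currently in the doc tree; the second has
the shape an XML parser expects.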

Are these entities published via a URI, or does one need to refer to a local
path?  Is there a tool to update (normalize?) these ENTITY declarations
automatically?  Using "xmllint --catalogs --loaddtd" didn't seem to help.

Maybe this seems trivial, but there are several hundred SGML source files 
which would all need to be updated this way...
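
If no existing tool turns up, a small script could do the rewrite.  The
following is only a sketch under assumptions: the SYSTEM_IDS table is
hypothetical (I don't know what the right system identifiers are, which is
part of the question above), and the regular expression only handles
double-quoted public identifiers that are followed directly by ">" or "[".

```python
import re

# HYPOTHETICAL mapping from public identifier to system identifier.
# The real values would have to come from the doc tree's catalog files.
SYSTEM_IDS = {
    "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN": "man-refs.ent",
}

# PUBLIC, a double-quoted public identifier (possibly wrapped across lines),
# then optional whitespace and '>' or '[' with no system literal in between.
PUBLIC_ONLY = re.compile(r'PUBLIC\s+"([^"]*)"(\s*)(?=[>\[])')


def add_system_literals(text, system_ids):
    """Append a system literal to each PUBLIC identifier that lacks one."""
    def fix(match):
        # Collapse whitespace so identifiers wrapped over two lines match.
        pubid = " ".join(match.group(1).split())
        system_id = system_ids.get(pubid)
        if system_id is None:
            return match.group(0)  # unknown identifier: leave untouched
        return 'PUBLIC "%s" "%s"%s' % (pubid, system_id, match.group(2))
    return PUBLIC_ONLY.sub(fix, text)


sample = ('<!ENTITY % man PUBLIC '
          '"-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">\n'
          '%man;\n')
print(add_system_literals(sample, SYSTEM_IDS))
```

Declarations that already carry a system literal are not matched, so running
it twice over the same file should be harmless, and identifiers missing from
the table are left alone rather than guessed at.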

-- 
-Chuck


