single source publishing?

doc at darklogik.org
Mon Jan 16 14:33:47 UTC 2006


Hello.

I realise that this is slightly off topic, but I reasoned that
this would be the most appropriate place to write to as there's
likely to be a large number of documentation writers here and I
am working almost exclusively on FreeBSD.

Is it really possible to do high quality single source publishing
on FreeBSD with only one degree of separation? By one degree I
mean that the different file formats are generated in one step
from the source file - not, for example, converting to HTML with
one filter and then to PDF from that HTML source.
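
To make "degrees of separation" concrete, here is a minimal Python
sketch that counts conversion steps from the source to each output.
The function name and the edge lists are hypothetical, chosen only to
mirror the HTML-then-PDF example above; this is an illustration, not
part of any real toolchain.

```python
def degrees_of_separation(edges, source):
    """Return {format: number of conversion steps from source}."""
    # Build a parent -> children map from (parent, child) pairs.
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Breadth-first walk, recording the depth of each format.
    depth = {source: 0}
    queue = [source]
    while queue:
        node = queue.pop(0)
        for child in children.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    del depth[source]  # the source itself is not an output
    return depth


# Hypothetical pipelines: PDF made from the HTML, versus PDF made
# directly from the source.
chained = [("source", "HTML"), ("HTML", "PDF")]
direct = [("source", "HTML"), ("source", "PDF")]

print(degrees_of_separation(chained, "source"))  # {'HTML': 1, 'PDF': 2}
print(degrees_of_separation(direct, "source"))   # {'HTML': 1, 'PDF': 1}
```

Only the second layout keeps every output at one degree.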

Ideally I would like to write LaTeX or XML and produce:

- XHTML
- UTF-8 text
- PostScript
- PDF
- DVI

Of all the options that I have seen, DocBook or LaTeX-docbook
seem to make the most sense. The problems I am facing are:

1) The XML toolchain seems to be somewhat lacking on FreeBSD. I
do not want to use a Java-based or commercial tool. Having looked
around at the available options and tried a few test documents,
it appears that I can generate XHTML from DocBook XML and little
else. I can't even generate UTF-8 text, as 'links' (the browser)
doesn't seem to support UTF-8 and this appears to be the only way
to convert HTML to text whilst retaining page layout. The only
existing C-based tool that produces PostScript from XSL-FO
(xmlroff) is extremely unstable and crash-prone (I've not managed
to get a single piece of output from it). I've heard that
passivetex can do XML->PS/PDF, but apparently it's ancient and
unmaintained these days.

2) I can generate PDF, PostScript and DVI from LaTeX but only
plain (non-XML) HTML. I would then have to run the HTML through a
tool such as 'tidy' and even then the output would have to be
hand-edited in order to allow custom styling via CSS.
Of course, there's still the problem with UTF-8 text and links,
unless TeX can do nice typesetting in plain text too?

I have tried to put together a few shell scripts in order to try
out some long conversions, e.g. XML->XHTML->PS->PDF, but the
output is, frankly, ugly and uncontrollable due to so many degrees
of separation.

My current options appear to be:

  DocBook XML -> XHTML -> PostScript (html2ps) -> PDF (ugly)

The above creates a sorry excuse for a PDF copy (the page breaks
are wrong because pages are split by length of output rather than
at explicit breaks).

Alternatively:

  LaTeX -> PostScript
        -> DVI
        -> PDF
        -> HTML -> XHTML (via tidy)

The XHTML output in this case is quite poor and not really
properly styleable[1] with CSS.
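
As a rough sketch of the print branches of that tree, the Python
below only builds the command lines (a dry run; nothing is executed).
It assumes the stock 'latex', 'pdflatex' and 'dvips' programs are
installed, and "manual" is a hypothetical file stem. Note that the
PostScript file is made by dvips from the DVI, so strictly it sits
two steps from the source.

```python
def latex_commands(stem):
    """Return, per output format, the commands to run in order."""
    tex = stem + ".tex"
    return {
        # latex writes <stem>.dvi next to the source file
        "dvi": [["latex", tex]],
        # dvips converts the DVI to PostScript; -o names the output
        "ps": [["latex", tex],
               ["dvips", "-o", stem + ".ps", stem + ".dvi"]],
        # pdflatex writes <stem>.pdf directly (assumed to be available)
        "pdf": [["pdflatex", tex]],
    }


# Print each pipeline as a shell-style line for inspection.
for fmt, cmds in latex_commands("manual").items():
    print(fmt + ": " + " && ".join(" ".join(c) for c in cmds))
```

The HTML branch is left out because it is the part that needs tidy
and hand editing.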

As far as I can see, there's no way to produce good quality,
predictable output in the current formats. I'm more than a little
surprised by this as XML in particular is always touted as being
a wonderful structural format that permits easy conversion to any
format.

To be honest, I could probably tolerate something like this:

  DocBook XML -> XHTML
              -> LaTeX -> PS
                       -> PDF
                       -> DVI

Does anybody have a solution, perhaps that they use regularly?
Or, do I have to get out my toolbox and spend the next ten years
in total isolation?

(please CC as I'm not subscribed)

cheers,
M

--

[1] pretend this is a real word please...

-- 
pgp: http://www.darklogik.org/pub/pgp/pgp.txt
0160 A46A 9A48 D3B0 C92F B690 17FB 4B72 0207 ED43