text formatting tools.

Murray Stokely murray at stokely.org
Sat Jan 24 13:53:40 PST 2009


On Sat, Jan 24, 2009 at 12:19 PM, Chuck Robey <chuckr at telenix.org> wrote:
> I didn't want to get all that loud about xml/xsl, because I felt that given time,
> hopefully, better tools would appear.  While the ability to spend money on that
> has hugely expanded, and the number of incompatible macro sets have also hugely
> appeared, the minimum size of any free software toolsets remains gigantic.  If
> I'm wrong here, PLEASE, tell me, I would be only too happy to be proved wrong,

I think your criticism of the distribution size of the tools is
accurate but you are focusing on a dimension that the rest of the
world has chosen to ignore in the era of 1TB disk drives.  You are
correct that any XML/XSLT based solution uses far more disk space than
any groff based solution.  I do not think many people care about this.
Separation of content and presentation is worth far more to me than a
few bits on a disk, to say nothing of the greater portability and
programmability of XML.

> Documentation?  Well, I could point to the book named "Unix Text Processing", by
>  Dougherty/O'Reilly.  It's out of print, which is actually really pretty nice,

That one would be in the minority of O'Reilly books if it was typeset
with groff rather than DocBook XML.

> OK, I've described 2 of my reasons for liking it, that it's relatively tiny, and
> that it's far more flexible in allowing an author to take their own approach.

First reason is granted, but I think the second reason depends on a
very particular definition of "flexible" and that many reasonable
people would disagree with this and argue XML is the more flexible
solution.

> The fact that xml forces one to regard a document more like it is a database is
> probably a good thing for things like Web pages which are actually electronic
> salespeople, but it's a LOUSY method to force upon authors.  Most books just
> aren't approached with preplanning and hierarchical control which is an endemic
> requirement for a sales database tool.  So if you're not writing something like

Technical manuals are generally highly hierarchical, as are most
books, actually, with parts, chapters, sections, and paragraphs.  Even
those items need not be imposed on anyone with an XML/XSL tool.

> "newegg.com", well, maybe you do like it, but I never, ever, heard of anyone
> using any approach like this in any major piece of fiction, at least before some
> businesses (in another case of follow the leaderism) required it.  Just like
> many commercial companies require you use MicroSoft Word, nothing but marketing
> propaganda.  Heard of this before?

I think the MS connection is a pretty big leap, as is reducing XML's
benefits to newegg.com without argument.

> I know we use this tool in our very good tool, the handbook.  So, what we've
> done is deny to a large number of folks the ability to format the handbooks
> unless they're willing to install a set of enormous tools.  Used to be the
> Handbook formatted directly out of the OS with no added tools needed.  Think
> that's difficult for a non-fiction tool?  Ask Richard Stevens ghost, because his
> books could have been formatted using only the base FreeBSD OS also.

Sure, your problem could be solved by importing more XML tools into
the base system, but I think that is the opposite direction we are
going in.  A number of base system tools are in FreeBSD because they
were historically part of BSD but would today be kept as
ports/packages if they weren't already there.

LiveCD distributions such as PC-BSD could have a much larger base
system pre-installed if this is something you care a lot about in an
operating system distribution.

> OK, I don't know of any negative to using groff, except that you don't get to
> point at your toolset and claim it's the latest.  All that internationalization,

I can think of dozens of reasons why we're not using groff for the
Handbook.  Off the top of my head I'll list a few:

1. How would you identify the first occurrence of each technical
acronym in the Handbook so that it could be rendered with a mouseover
definition or link to the glossary in hypertext versions of the
Handbook? (only the first occurrence, because these presentation
details would be distracting and make it difficult to read if applied
to every occurrence).

2. How do you programmatically extract all of the Armenian FTP sites
mentioned in a groff version of the Handbook? (so they can be listed
on the web site separately).

3. How do you pull in content from other sites on the net and
dynamically include parts of it each time you rebuild the document in
a structured way? (e.g. the way we pull in external RSS feeds on the
website, the way we pull in the results of the latest kernel stress
tests to add to the release TODO page, etc.)

4. How do you render the same content in multiple presentation styles
in the same output format?  E.g. maybe one web based version with one
color scheme for the website, and another web based version with a
different layout and color scheme?  Or one with per-chapter table of
contents and one with only a per-book table of contents for a printed
format?  All configurable with make flags to the build script and with
the key separation of content and presentation since different people
with very different skill sets will be responsible for those two tasks
in general.

5. How do you generate texts for electronic book readers, OpenOffice,
or other modern formats?
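
As a rough illustration of point 2, here is a minimal sketch in Python
of pulling one country's FTP sites out of structured markup.  The XML
fragment, the host names, and the function name ftp_sites_for are all
made up for the example; the real Handbook sources are laid out
differently, so treat this as a sketch under those assumptions only.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like fragment; hosts and layout are invented
# for illustration and do not come from the real Handbook sources.
SAMPLE = """
<section>
  <title>FTP Sites</title>
  <variablelist>
    <varlistentry>
      <term>Armenia</term>
      <listitem><para>
        <ulink url="ftp://ftp1.am.example.org/pub/FreeBSD/"/>
        <ulink url="http://www.am.example.org/"/>
      </para></listitem>
    </varlistentry>
    <varlistentry>
      <term>Austria</term>
      <listitem><para>
        <ulink url="ftp://ftp.at.example.org/pub/FreeBSD/"/>
      </para></listitem>
    </varlistentry>
  </variablelist>
</section>
"""


def ftp_sites_for(xml_text, country):
    """Return the ftp:// URLs listed under the entry for `country`."""
    root = ET.fromstring(xml_text)
    sites = []
    for entry in root.iter("varlistentry"):
        # The <term> child names the country for this entry.
        if entry.findtext("term") == country:
            for link in entry.iter("ulink"):
                url = link.get("url", "")
                if url.startswith("ftp://"):
                    sites.append(url)
    return sites


print(ftp_sites_for(SAMPLE, "Armenia"))
```

The query depends only on the structure of the markup, not on how the
list is eventually rendered.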

I use groff occasionally, but am a novice, so I am sure there are
solutions to some of these problems, but the ones I know of are
clearly sub-optimal.
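
For point 1 in the list above, a comparable sketch with XML input
might look like the following.  It tags only the first occurrence of
each acronym so that a stylesheet could render it differently.  The
sample paragraph, the role="first" marker, and the function name
mark_first_acronyms are assumptions made for this example, not what
the Handbook build actually does.

```python
import xml.etree.ElementTree as ET

# Hypothetical input paragraph, invented for illustration.
SAMPLE = (
    "<para>Use <acronym>NFS</acronym> or <acronym>FTP</acronym>; "
    "<acronym>NFS</acronym> is usually more convenient.</para>"
)


def mark_first_acronyms(xml_text):
    """Set role="first" on the first occurrence of each acronym."""
    root = ET.fromstring(xml_text)
    seen = set()
    # iter() walks the tree in document order, so the first element
    # seen for a given acronym is its first occurrence in the text.
    for element in root.iter("acronym"):
        name = (element.text or "").strip()
        if name and name not in seen:
            seen.add(name)
            element.set("role", "first")
    return ET.tostring(root, encoding="unicode")


print(mark_first_acronyms(SAMPLE))
```

A stylesheet could then attach the mouseover or glossary link only to
elements carrying that marker, leaving later occurrences plain.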

>  Groff even produces html, and it does a really bang-up job of formatting ASCII
> text pages, something which xml tools have never been able to do.  I just don't

Sure, but those are basic output formats we've supported for a decade
with XML based tools.  What about Amazon Kindle ebooks?  Mobi ebooks?
OpenOffice documents?  We distribute more than just those three very
basic output formats.

> get the reason to go with xml, except a bad case of follow the leader.  What's
> the benefit that the users, or even the authors, accrue?  And don't fail to

Why don't you ask the publisher of the book you just cited?  Or better
yet, the author of groff, James Clark, who moved on to write most of
the open source SGML/XML tools we use in building the Handbook.  I
must admit to not following him closely and only reading his blog
rarely -- did he work for Microsoft or something?  Still trying to
find where that connection comes in.

> realize that our groff comes with a set of ancillary tools like "pic", to do a
> very good job of doing technical drawings.  That's what Richard Stevens did, so
> don't argue that it's either impossible, or even difficult to do well.  If you
> argue this, please drop all the marketing propaganda, drop all references to

Richard Stevens is a highly technical network engineer.  He created
great figures, as people often do with pic.  Whether you are using
groff or LaTeX or XML tools, however, you can hardly argue that manual
editing of a programming language is a better way to generate diagrams
than a graphical tool for most needs.  Sure, I get great figures with
pic or pstricks, but some of my best figures are drawn with OmniGraffle
in a fraction of the time.

> God, the amount of marketing crap that has gone out to push dynamic features
> (which web pages really do need) upon paper authors is impressive, but I never
> saw any use of this in any piece of fiction, or even in any technical
> dissertation, anything not destined for presentation via paper.  Many companies
> depend on this for their future, so I'm skeptical.

This seems to change the scope of your argument significantly.  If you
are now conceding the general usefulness of XML for things like the
Handbook and only saying it is overkill for a paper-only document then
I'd tend to agree.  I'd go straight to LaTeX, but many would go
straight to groff.  To each his own.  Kind of makes me wonder why all
the ranting about newegg.com and Microsoft and evil XML vendors.

> Show me a book which needs these features, a book that would be better written

Any book published by O'Reilly -- because they need to publish not
just PDFs but hyperlinkable electronic book versions in addition to
dead tree versions.

> to a complicated monstrosity (which our handbook did) was a mistake in strategy,
> trying to go the popular way.  Making it so the number of folks who can format
> the sources is limited to folks which have the resources.

You've done a very poor job in this rant of pointing out any
advantages to using groff for the Handbook over DocBook XML, and I
think you know this, which is why you sent it to an unrelated list.  I
will grant you that you listed one solid advantage in this mail: the
tools use less disk space.

I assure you that your points will be discussed and listened to if you
try again without all the ranting and weird logical fallacies and if
you post it to the appropriate place, doc at FreeBSD.org.

Thanks!

                    - Murray


More information about the freebsd-chat mailing list