Tidy and HTML tab spacing

Hiroki Sato hrs at FreeBSD.org
Wed Jan 18 23:45:55 UTC 2012


Warren Block <wblock at wonkity.com> wrote
  in <alpine.BSF.2.00.1201181520140.40712 at wonkity.com>:

wb> HTML versions of FreeBSD documents are fed through tidy (www/tidy or
wb> www/tidy-devel) for cleanup.  There's a bug in tidy[1] that can cause
wb> tab stops to be wrong:
wb> http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623
wb>
wb> Note how DISTNAME and EXTRACT_SUFX do not line up.  They are correct
wb> in the source book.sgml.
wb>
wb> So what to do?

 I lean toward fixing Tidy if possible.  The reason we use Tidy is to
 fix the markup in output rendered by various tools such as Jade, not
 (only) to improve human readability.  Tidy's output is still not
 perfect from the viewpoint of standards conformance, but it is better
 than nothing, even though most modern web browsers can handle the
 rendered HTML directly.

 Tidy is known to have some problems with entity dereferencing and
 white-space handling, as you also pointed out.
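The misalignment Warren describes is consistent with one plausible mechanism, which is an assumption here and is not confirmed in this thread: if tabs inside a listing are expanded to 8-column stops counted from the start of the physical source line, then a first listing line that shares its source line with the lone ">" closing the <PRE ...> tag starts at column 1 instead of 0, and its tab lands one column short. The minimal Python sketch below only shows that arithmetic; `expand_tabs` and the sample listing lines are hypothetical, not taken from Tidy or from the handbook source.

```python
def expand_tabs(line: str, start_col: int = 0, tab_size: int = 8) -> str:
    """Expand tabs in `line` to spaces.

    ASSUMPTION (hypothetical model of the bug): `start_col` is the column
    at which `line` begins in the physical source line, e.g. 1 when the
    listing text directly follows a lone ">" closing the <PRE ...> tag.
    """
    out = []
    col = start_col
    for ch in line:
        if ch == "\t":
            # Pad up to the next tab stop (a multiple of tab_size).
            pad = tab_size - (col % tab_size)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


# Hypothetical listing lines, not copied from the handbook.
good = expand_tabs("DISTNAME=\tfoo")              # value starts at index 16
bad = expand_tabs("DISTNAME=\tfoo", start_col=1)  # value starts at index 15
ref = expand_tabs("EXTRACT_SUFX=\t.tar.bz2")      # value starts at index 16
print(good.index("foo"), bad.index("foo"), ref.index(".tar.bz2"))  # prints: 16 15 16
```

Under this model the two values line up when both lines start at column 0, but the first line falls one column short once the ">" is no longer rendered, which is the kind of off-by-one misalignment described above.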

wb> 3. Tidy could be replaced with some other tool.  However, the others

 I tried xmlindent, xmlformat, and xmllint as replacements in the
 past, but they were intended for well-formed XML documents and were
 not sufficient for fixing malformed (sometimes broken) markup.

wb> 4. Add newlines to the HTML in the build process before it gets to
wb>    tidy:
wb>      s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/

 I think this will break the output, because a newline just after ">"
 is recognized as CDATA.
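For reference, option 4 as a preprocessing step might look like the minimal Python sketch below. It assumes the Jade output places `CLASS="PROGRAMLISTING"` and the closing ">" on separate lines, as the proposed pattern implies; `move_newline_after_tag` and the sample input are hypothetical illustrations, not part of the existing build.

```python
import re

# Matches the start-tag tail as it is assumed to appear in the Jade output:
#   <PRE
#   CLASS="PROGRAMLISTING"
#   >first line of the listing
PROGRAMLISTING_TAIL = re.compile(r'CLASS="PROGRAMLISTING"\n>')


def move_newline_after_tag(html: str) -> str:
    """Apply s/CLASS="PROGRAMLISTING"\\n>/CLASS="PROGRAMLISTING">\\n/
    to the whole document at once, so the pattern can span lines."""
    return PROGRAMLISTING_TAIL.sub('CLASS="PROGRAMLISTING">\n', html)


# Hypothetical input, not copied from a real build.
sample = '<PRE\nCLASS="PROGRAMLISTING"\n>DISTNAME=\tfoo\n</PRE>'
print(move_newline_after_tag(sample))
```

A line-oriented `sed` script reads one line at a time, so the `\n` in the proposed pattern only matches after lines are joined (for example with `N`); operating on the whole string avoids that. As noted in the reply above, the newline this places directly after ">" may itself be treated as listing content, so the output would need checking before relying on this.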

wb> 5. Don't tidy HTML files at all (suggested as an option by Benedict
wb>    Reuschling).  The unprocessed HTML is ugly, but few people are going
wb>    to look at it directly.  Files that haven't been through tidy are a
wb>    little larger, about 4% in the case of the Porter's Handbook.

 To eliminate Tidy, we would have to improve the standards conformance
 of the rendered output.  I do not know the current situation
 precisely, because I investigated it seven years ago, but I think
 there are still some glitches.

-- Hiroki


More information about the freebsd-doc mailing list