tidy flag

Hiroki Sato hrs at FreeBSD.org
Thu Feb 5 16:50:15 UTC 2004


Alexey Zelkin <phantom at FreeBSD.org.ua> wrote
  in <20040205063847.GA13136 at phantom.cris.net>:

phantom> On Wed, Feb 04, 2004 at 11:27:03PM +0100, Alex Dupre wrote:
phantom> > Ok, the question then becomes: is it possible to replace the -preserve 
phantom> > tidy-stable flag with the -numeric tidy-devel flag? Otherwise can you 
phantom> > send me a practical example where -preserve is needed? We (Thierry Thomas 
phantom> > and me) will try it ourselves.
phantom> 
phantom> Well.  Try the html code below with -preserve and without.  You'll see a
phantom> difference.  Actually the most annoying thing was 'entity expansion', but
phantom> there were also some problems with non-ASCII symbol processing under
phantom> some conditions (but unfortunately I don't remember the details).
phantom> 
phantom> <html>
phantom>   <body>
phantom>     NBSP - &nbsp;
phantom>     COPY - &copy;
phantom>   </body>
phantom> </html>

 The problem is that the result of the expansion should depend
 on the html doc's charset/encoding.  For example, in euc-jp, &copy;
 should be {0x8f, 0xa2, 0xed}, but tidy always treats it as 0xa9.
 Many browsers also interpret &#169; as a raw character in the html
 doc's charset (euc-jp, in this case).  &#160;, &#169;, &#183;, and
 other characters above 159 in euc-jp differ from those in iso-8859-*.
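
 The byte-level difference can be checked with a short Python sketch.
 It is only an illustration and does not call tidy. It assumes that
 Python's built-in "euc_jp" and "iso-8859-1" codecs stand in for the
 two document charsets, and the helper name encoded_bytes is made up
 for this example.

```python
# Illustration only: compare how the copyright sign is stored in two charsets.
# Assumption: Python's built-in codecs stand in for the document encodings.

COPY = "\u00a9"  # the character that &copy; / &#169; denote in Unicode


def encoded_bytes(char: str, charset: str) -> bytes:
    """Return the bytes that represent `char` in the given charset."""
    return char.encode(charset)


latin1 = encoded_bytes(COPY, "iso-8859-1")  # single byte
eucjp = encoded_bytes(COPY, "euc_jp")       # three-byte JIS X 0212 sequence

print("iso-8859-1:", latin1.hex(" "))
print("euc-jp:    ", eucjp.hex(" "))

# If the ISO-8859-1 byte is written into a euc-jp document, a euc-jp
# reader no longer sees the copyright sign.
misread = latin1.decode("euc_jp", errors="replace")
print("raw 0xa9 read as euc-jp is still the copyright sign:", misread == COPY)
```

 The point of the comparison is that a tool which always emits 0xa9
 produces a correct iso-8859-1 document but a damaged euc-jp one.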

 According to the XML specification this is unambiguous (&#xxx;
 is always interpreted as a Unicode character), but I think it is
 better for now that entities are preserved as they are.  Tidy does
 not know the mapping between euc-jp and Unicode, so a lot of
 Japanese docs will be broken without -preserve.
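
 Why preserving is the safer choice can be sketched in Python as
 well. This is a rough model of the two behaviours, not tidy's actual
 implementation: the functions preserve_entities and expand_entities
 are hypothetical names, and html.unescape from the standard library
 is used here as a stand-in for entity expansion.

```python
import html

SOURCE = "COPY - &copy;"  # entity left unexpanded, pure ASCII


def preserve_entities(text: str, charset: str) -> bytes:
    """Hypothetical model of -preserve: write the entity text unchanged."""
    return text.encode(charset)


def expand_entities(text: str, charset: str) -> bytes:
    """Hypothetical model of expansion: resolve references to Unicode
    characters first, then encode them in the document charset."""
    return html.unescape(text).encode(charset)


# Named and numeric references resolve to the same Unicode character.
same_char = html.unescape("&copy;") == html.unescape("&#169;") == "\u00a9"

# Preserved output is identical in both charsets because it stays ASCII.
kept_eucjp = preserve_entities(SOURCE, "euc_jp")
kept_latin1 = preserve_entities(SOURCE, "iso-8859-1")

# Expanded output depends on the charset, so the expander must know it.
expanded_eucjp = expand_entities(SOURCE, "euc_jp")
expanded_latin1 = expand_entities(SOURCE, "iso-8859-1")

print(same_char)
print(kept_eucjp == kept_latin1)
print(expanded_eucjp == expanded_latin1)
```

 In this model the preserved form is charset-independent, while the
 expanded form is only correct if the expander knows the mapping from
 Unicode to the document charset.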

-- 
| Hiroki SATO