Please review, small SGML entity cleanup

Gabor Kovesdan gabor at FreeBSD.org
Sat Dec 29 13:00:59 UTC 2012


On 2012.12.29. 13:48, Ulrich Spörlein wrote:
> On Sat, 2012-12-29 at 12:08:46 +0100, Gabor Kovesdan wrote:
>> On 2012.12.28. 18:14, Ulrich Spörlein wrote:
>>> The DE and FR articles are a hodgepodge of SGML entities and direct,
>>> 8bit chars, with the former being the majority. This patch cleans this
>>> up a little, although we should eventually switch this all to UTF-8,
>>> obviously.
>> Don't they work with direct chars? Once we made a step to that direction
> They probably will, and I have no clue why we used entities for German
> and French, but the usual encodings for Russian and Japanese, etc.
>
>> so this one would be one step back. If possible, it would be better to
>> convert the entities to direct chars instead of the opposite.
> In the end, sure. But that's a larger project of moving from
> de_DE.ISO8859-1 -> de_DE (with an implied UTF-8 encoding, as is required
> by XML anyway, the implied part, not the exact encoding).
It isn't required by XML. If you omit the encoding part of the XML 
declaration, the content is treated as UTF-8 but it is not a requirement 
at all.
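
A minimal sketch of the point above, assuming Python's standard
xml.etree.ElementTree parser (expat-based); the <para> sample content is
made up for illustration. It shows an ISO-8859-1 document parsing fine
when the declaration says so, and the same bytes being rejected when the
declaration is omitted and the parser falls back to UTF-8:

```python
import xml.etree.ElementTree as ET

# Declared ISO-8859-1: the parser decodes the 0xFC byte as "ü".
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><para>für</para>'
).encode("iso-8859-1")
latin1_text = ET.fromstring(latin1_doc).text

# No declaration, UTF-8 bytes: the default assumption holds.
utf8_doc = "<para>für</para>".encode("utf-8")
utf8_text = ET.fromstring(utf8_doc).text

# No declaration, ISO-8859-1 bytes: a lone 0xFC is not valid UTF-8,
# so the parser refuses the document.
try:
    ET.fromstring("<para>für</para>".encode("iso-8859-1"))
    undeclared_latin1_ok = True
except ET.ParseError:
    undeclared_latin1_ok = False
```

So the encoding is a property of the declaration, not something XML
fixes to UTF-8.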
>
> I don't think this commit is a step back, because the documents need to
> be converted using a long series of s/&uuml;/ü/g, anyway. And the
> current mish-mash is just weird.
I don't see any reason why we cannot do this conversion right now in
ISO-8859-1. (Actually, I did, but people kept introducing new redundant
entities.) ISO-8859-1 isn't any harder to type than UTF-8. If you want
consistency (which IMHO isn't that important at this point, since there
are lots of upcoming changes), then why not move in the right direction?
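
For illustration only, here is a rough sketch of such an
entity-to-direct-char pass. It is not the script anyone actually ran on
the doc tree. The function name entities_to_chars and the KEEP set are
hypothetical; the entity table comes from Python's html.entities module,
on the assumption that the character entities used in the articles carry
the usual HTML 4 / ISO names:

```python
import re
from html.entities import name2codepoint

# Entities left alone: markup-significant ones, plus nbsp because an
# invisible direct character is easy to lose while editing (assumption).
KEEP = {"amp", "lt", "gt", "quot", "apos", "nbsp"}

# Only plain alphanumeric names; things like &man.ls.1; never match.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_chars(text, encoding="iso-8859-1"):
    """Replace character entities with direct chars where the target
    encoding can represent them; leave everything else untouched."""

    def replace(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)  # unknown or protected entity
        char = chr(name2codepoint[name])
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return match.group(0)  # e.g. &euro; does not fit ISO-8859-1
        return char

    return ENTITY_RE.sub(replace, text)


sample = "f&uuml;r &os; &amp; &euro;"
converted = entities_to_chars(sample)
```

Document-specific entities such as &os; are not in the table and pass
through unchanged, so the same pass could be rerun later with
encoding="utf-8" once the directories move.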

Gabor


More information about the freebsd-doc mailing list