Please review, small SGML entity cleanup

Sat Dec 29 13:00:59 UTC 2012

On 2012.12.29. 13:48, Ulrich Spörlein wrote:
> On Sat, 2012-12-29 at 12:08:46 +0100, Gabor Kovesdan wrote:
>> On 2012.12.28. 18:14, Ulrich Spörlein wrote:
>>> The DE and FR articles are a hodgepodge of SGML entities and direct,
>>> 8bit chars, with the former being the majority. This patch cleans this
>>> up a little, although we should eventually switch this all to UTF-8,
>>> obviously.
>> Don't they work with direct chars? We already took a step in that direction,
> They probably will, and I have no clue why we used entities for German
> and French, but the usual encodings for Russian and Japanese, etc.
>
>> so this one would be one step back. If possible, it would be better to
>> convert the entities to direct chars instead of the opposite.
> In the end, sure. But that's a larger project of moving from
> de_DE.ISO8859-1 -> de_DE (with an implied UTF-8 encoding, as is required
> by XML anyway, the implied part, not the exact encoding).
It isn't required by XML. If you omit the encoding part of the XML
declaration, the content is treated as UTF-8, but that is only the
default, not a requirement.
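
A minimal sketch of that point, using Python's standard xml.etree.ElementTree parser as a stand-in (it is not part of the doc toolchain, and the <para> sample is made up for illustration): bytes with no declaration are read as UTF-8, while an explicit declaration makes ISO-8859-1 equally acceptable.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content, not taken from the actual articles.
BODY = "<para>für</para>"


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the given bytes as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# No XML declaration: the parser falls back to UTF-8.
utf8_no_decl = BODY.encode("utf-8")
# Explicit declaration: ISO-8859-1 bytes are accepted as well.
latin1_decl = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY
).encode("iso-8859-1")
# ISO-8859-1 bytes without a declaration are not valid UTF-8.
latin1_no_decl = BODY.encode("iso-8859-1")

print(ET.fromstring(utf8_no_decl).text)
print(ET.fromstring(latin1_decl).text)
print(parses(latin1_no_decl))
```

So the default only matters when the declaration is left out; a file that declares its encoding can stay in ISO-8859-1.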
>
> I don't think this commit is a step back, because the documents need to
> be converted using a long series of s/&uuml;/ü/g, anyway. And the
> current mish-mash is just weird.
I don't see any reason why we cannot do this conversion right now in
ISO-8859-1. (Actually, I did, but people kept introducing new redundant
entities.) ISO-8859-1 isn't any harder to type than UTF-8. If you want
consistency (which imho isn't that important at this point, since there
are lots of upcoming changes), then why not make things consistent in
the right direction?
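
A rough sketch of what such a conversion could look like, in Python rather than a series of sed expressions. The function name and the sample string are hypothetical, and it assumes the entity names in the documents match the ones in Python's html.entities table (true for common ones like &uuml; and &eacute;, not checked against the full ISO 8879 sets). Anything outside the ISO-8859-1 range, markup entities like &amp;, and project-defined entities are left untouched.

```python
import re
from html.entities import name2codepoint

# Matches a named entity reference such as &uuml; or &amp;
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Assumption: these stay as entities because the literal chars are invisible.
_SKIP = {"nbsp", "shy"}


def entities_to_latin1(text: str) -> str:
    """Replace named entities that have a direct ISO-8859-1 character."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        codepoint = name2codepoint.get(name)
        # Leave unknown names, skipped names, and non-Latin-1 chars alone.
        if name in _SKIP or codepoint is None or not 0xA0 <= codepoint <= 0xFF:
            return match.group(0)
        return chr(codepoint)

    return _ENTITY.sub(replace, text)


sample = "f&uuml;r &amp; caf&eacute; &os; a&nbsp;b"
converted = entities_to_latin1(sample)
print(converted)
# The result must still be encodable in the file's declared encoding.
converted.encode("iso-8859-1")
```

The files would have to be read and written as ISO-8859-1 around this, and the diff reviewed by hand, since the table is only an approximation of the entity sets the documents really use.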

Gabor