How to handle localized characters and special symbols?

Kövesdán Gábor gabor.kovesdan at t-hosting.hu
Mon Feb 6 10:49:00 UTC 2006


Simon L. Nielsen wrote:

>On 2006.02.04 20:33:54 +0100, Kövesdán Gábor wrote:
>
>  
>
>>I'm translating the FreeBSD web pages into Hungarian. I haven't done
>>much so far because I don't have much spare time, but I'll finish
>>this translation. Today I made a test build. You can see it here:
>>http://tux.t-hosting.hu/data
>>Most of it is still in English, but some pages are translated. The
>>build went quite well: I found my mistakes easily and managed to build
>>the site, but I have trouble with one of the localized characters, the
>>letter o with a double acute accent (ő). I write it with its standard
>>HTML character reference, but the SGML parser replaces it with a Q.
>>I don't see why this happens and don't know how to fix it. Two more
>>characters, ® and ™, are also problematic; they are substituted
>>incorrectly as well. See:
>>http://tux.t-hosting.hu/data/about.html
>>You can see a Z with a caron (Ž) after the word Pentium and a "
>>after Athlon.
>>How can I display these characters correctly? Please tell me what to
>>do so that we have a nice Hungarian webpage. :)
>>
>>(I use Firefox and it selects the ISO-8859-2 Central European encoding 
>>automatically.)
>>    
>>
>
>I think the problem is that your web server forces a character set
>which prevents the character set in the HTML from taking effect:
>
>[simon@zaphod:~] fetch -o /dev/null -vv http://tux.t-hosting.hu/data/about.html |& grep Content-Type:
><<< Content-Type: text/html; charset=ISO-8859-2
>
>I'm not exactly sure how some of the other translations handle
>character sets other than ISO-8859-1, but since e.g. the ja and ru
>translations use scripts that definitely aren't Latin, I'm sure it can
>be done. See how those translations change the character set as needed.
>
>  
>
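Simon's diagnosis rests on a precedence rule: under HTML 4.01, a charset parameter in the HTTP Content-Type header overrides a <meta http-equiv="Content-Type"> declaration inside the page. A minimal Python sketch of that rule, using the standard email.message module only to parse the header; effective_charset is a hypothetical helper, and it ignores manual overrides a user can make in the browser:

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type_header: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """Return the charset a browser would use for an HTML page.

    A charset parameter in the HTTP Content-Type header takes precedence
    over the one declared in the page's <meta> tag (HTML 4.01 rule).
    """
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        header_charset = msg.get_content_charset()  # lower-cased, or None
        if header_charset:
            return header_charset
    # No charset from the server: fall back to the page's own declaration.
    return meta_charset.lower() if meta_charset else None


# The header reported by fetch wins over what the page declares.
print(effective_charset("text/html; charset=ISO-8859-2", "iso-8859-1"))
# prints: iso-8859-2
```

With the header shown above, the server's ISO-8859-2 is used no matter which charset the generated HTML declares.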
I've found out that it's not just about the charset used by the browser.
The SGML parser replaces the reference for ő with a Q. If the reference
remained in the HTML files, the browser would display it correctly. I
tried putting this in my Makefile to override the default in
web.bsd.mk, hoping that the SGML parser would stop making this unwanted
substitution:
SGMLNORMOPTS= -d ${SGMLNORMFLAGS} -c ${CATALOG} -D ${.CURDIR} -biso-8859-2
It didn't help.
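The exact substitutions reported here fit one hypothesis: the parser writes out only the low byte of each character's Unicode code point, and the browser then reads those bytes as ISO-8859-2. This is an assumption about the tool chain, not something the thread confirms; the sketch below (low_byte and as_seen_by_browser are hypothetical names) just checks the arithmetic: U+0151 gives 0x51, U+2122 gives 0x22, and byte 0xAE is Ž in ISO-8859-2.

```python
def low_byte(ch: str) -> bytes:
    """Hypothetical model: keep only the low 8 bits of the code point."""
    return bytes([ord(ch) & 0xFF])


def as_seen_by_browser(ch: str, charset: str = "iso-8859-2") -> str:
    """Decode that single byte the way a browser set to `charset` would."""
    return low_byte(ch).decode(charset)


# o with double acute, registered sign, trade mark sign
for ch in ("\u0151", "\u00ae", "\u2122"):
    print(f"U+{ord(ch):04X} -> {as_seen_by_browser(ch)!r}")
# U+0151 -> 'Q'
# U+00AE -> 'Ž'
# U+2122 -> '"'
```

These are the Q, the Ž after "Pentium" and the " after "Athlon" described in the first message.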

I've also run into a new problem recently. According to
http://www.w3.org/2003/entities/iso8879doc/isolat1.html, the entities
&aacute;, &eacute;, etc. are standard entity names for use in XML, but
if I put these entities into an .xsl file, e.g. index.xsl, the web build
fails.
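A likely cause: XML itself predefines only five entities (&amp;, &lt;, &gt;, &apos;, &quot;). Other names such as &aacute; must be declared, for example in a DTD that includes the isolat1 set, or a conforming parser rejects the file. A minimal check with Python's ElementTree; this is a different XML parser from the one the web build uses, so treat it only as an illustration of the XML rule, and parses is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """True if the text is a well-formed XML document for ElementTree."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses("<p>&aacute;</p>"))  # False: undefined entity
print(parses("<p>&#225;</p>"))    # True: numeric character reference
print(parses("<p>&amp;</p>"))     # True: one of the five predefined entities

# Declaring the entity locally in the internal DTD subset also works.
doc = '<!DOCTYPE p [<!ENTITY aacute "&#225;">]><p>&aacute;</p>'
print(ET.fromstring(doc).text)    # á
```

Numeric character references, or a local <!ENTITY> declaration as in the last case, avoid depending on an external entity set.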

Anyway, I've realized that if I simply type the literal character ő
into the SGML sources, it comes out fine, but I don't know how standard
and portable this solution is. I would like to make my work as standard
and portable as possible.
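Typing the character literally stays portable only as long as every tool that reads the file assumes the encoding it was saved in. A small Python illustration (stored_bytes and read_back are hypothetical helper names): the same byte is ő in ISO-8859-2 but õ in ISO-8859-1, and ő has no ISO-8859-1 byte at all.

```python
def stored_bytes(text: str, file_encoding: str) -> bytes:
    """Bytes written to disk when `text` is saved in `file_encoding`."""
    return text.encode(file_encoding)


def read_back(data: bytes, assumed_encoding: str) -> str:
    """What a tool sees if it assumes `assumed_encoding` for those bytes."""
    return data.decode(assumed_encoding)


data = stored_bytes("\u0151", "iso-8859-2")  # ő saved as ISO-8859-2
print(data)                           # b'\xf5'
print(read_back(data, "iso-8859-2"))  # ő  (encodings agree)
print(read_back(data, "iso-8859-1"))  # õ  (same byte read as Latin-1)

try:
    stored_bytes("\u0151", "iso-8859-1")
except UnicodeEncodeError:
    print("o with double acute has no ISO-8859-1 byte")
```

This is why translations that use literal characters also declare their charset consistently.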
As for the Russian website, they just type the characters in their own
charset, so I see strange characters when I look at the sources. It
definitely works, but isn't there a more elegant solution, like
&aacute; instead of á, &eacute; instead of é, etc.?
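A minimal sketch of the conversion asked about here, assuming Python's html.entities table (HTML 4 entity names) as the source of names; to_named_entities is a hypothetical helper, not part of the FreeBSD doc tools. It also shows a limitation: HTML 4 names á and é but not ő (ISO 8879's Latin-2 set calls it &odblac;), so whether a name is usable depends on which entity sets the DTD in use includes.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace each non-ASCII character that has an HTML 4 name with &name;.

    Characters without such a name (for example U+0151) are left as is.
    """
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Skip ASCII (&, <, >, ") so existing markup is not escaped.
        if name and ord(ch) > 127:
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)


print(to_named_entities("\u00e1 \u00e9 \u0151"))  # &aacute; &eacute; ő
```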

Thanks,

Gabor


