ICANN votes to expand domain name character set

Doug Barton dougb at FreeBSD.org
Fri Jun 27 05:41:34 UTC 2008

Tim Clewlow wrote:
> Hi there,
> In case you haven't heard yet, ICANN have unanimously voted their 
> approval to expand the domain name character set to include Asian, 
> Middle Eastern, Eastern European and Russian character sets in domain
>  names.

That's already possible at the second level and above through IDN. Check
out ftp://ftp.rfc-editor.org/in-notes/rfc3491.txt (Nameprep) and
ftp://ftp.rfc-editor.org/in-notes/rfc3492.txt (Punycode). In short, the
client software that deals with IDNs is required to translate
"international" characters into Punycode strings before sending the DNS
request, so in an ideal world nothing below the client layer will have
to change. So far the world has been more or less ideal, depending on
where you sit. :)
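
As a rough illustration of that client-side translation (a sketch only,
not code from any FreeBSD utility): Python's built-in "idna" codec
implements the 2003-era IDNA rules, which apply the Nameprep and
Punycode steps label by label. The helper name to_ascii_hostname is made
up for this example.

```python
def to_ascii_hostname(name: str) -> str:
    """Convert a possibly-international hostname to its ASCII (xn--) form.

    Hypothetical helper for illustration. It relies on Python's built-in
    "idna" codec, which runs Nameprep and Punycode on each label.
    """
    return name.encode("idna").decode("ascii")


if __name__ == "__main__":
    # The client does this before it ever builds a DNS query, so the
    # resolver and the servers only see plain ASCII labels.
    print(to_ascii_hostname("bücher.example"))   # xn--bcher-kva.example
    print(to_ascii_hostname("www.example.com"))  # already ASCII, unchanged
```

Nothing in the conversion cares which label it is looking at, which is
why the same step would apply to an internationalized TLD.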

The actual change that's being announced now is the approval of IDN
strings at the top level. Conceptually this is the same mechanism. But
the "layer 9" stuff makes this really interesting/complicated/annoying,
once again depending on where you sit.

I was involved in a lot of IDN stuff when I was at ICANN running the
IANA, so if anyone wants more details let me know; I can go on for hours.

> In addition, top level domains will have their restrictions removed, 
> ie any non-offensive top level domains will now be allowed.

That's not _quite_ true. The restriction of two-letter domains for
country codes will still be in place, and there is some protection for
trademark holders, etc.

> I'm guessing the inclusion of the new character sets will mean a fair
>  amount of alteration to code that processes domain names.

Client code, yes. In a lot of ways FreeBSD is behind the curve on this,
since we really should have been building (more) Punycode translation
capability into our client software already. The good news is that at
the software level, if you can do that for the second level and above,
it's pretty easy to do it for TLDs. The more interesting problem there
is a lot of ancient software, web scripts, etc. with hard-coded rules
about how TLDs have at most 3 characters ....
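
To make that last problem concrete, here is a hedged sketch of the kind
of hard-coded rule that breaks, next to a syntax check that makes no
assumption about TLD length. Both NAIVE_TLD_RULE and looks_like_hostname
are hypothetical names invented for this example, and the check covers
syntax only; it does not ask whether the TLD actually exists.

```python
import re

# The sort of rule found in old scripts: the TLD must be 2 or 3 letters.
NAIVE_TLD_RULE = re.compile(r"^[a-z0-9.-]+\.[a-z]{2,3}$")

# One DNS label in letter-digit-hyphen form: 1 to 63 characters,
# with no hyphen at either end.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_hostname(name: str) -> bool:
    """Syntax-only check that treats the TLD like any other label."""
    try:
        # Translate international labels to their ASCII (xn--) form first.
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # empty label, over-long label, or disallowed character
    if ascii_name.endswith("."):
        ascii_name = ascii_name[:-1]  # tolerate a fully-qualified trailing dot
    if not ascii_name or len(ascii_name) > 253:
        return False
    return all(_LABEL.match(label) for label in ascii_name.split("."))


if __name__ == "__main__":
    for host in ("example.com", "example.museum", "example.испытание"):
        ascii_host = host.encode("idna").decode("ascii")
        print(host, bool(NAIVE_TLD_RULE.match(ascii_host)),
              looks_like_hostname(host))
```

The naive pattern already rejects existing long TLDs such as .museum,
and it would reject any TLD in xn-- form for the same reason.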




