Saturday, 11 June 2011

Twitter i18n

Twitter has a number of internationalization (i18n) issues. I use Twitter directly from a browser, so if you use a Twitter client your experience could differ from mine. A client could, for instance, reconstruct Unicode domain names from their Punycode form. I deliberately use Twitter from a browser, without any Twitter-related browser extensions, so that I can see what is happening at the base level. These are the i18n issues I have encountered so far.


An ASCII hashtag works as expected. A hashtag made of non-ASCII Unicode characters, though, is not recognized by Twitter as a hashtag. For example, #loughborough works but #ラフバラ does not.

Sina's microblog 新浪微博, unlike Twitter, does support Unicode hashtags. Twitter uses a single # character as a prefix to the text, whereas 新浪微博 uses a pair of # characters to bracket it. By way of example, #ラフバラ# is a valid Unicode hashtag which I have used on my 新浪微博.

ラフバラ is Loughborough written in Japanese Katakana.
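The difference between the two behaviours can be sketched with regular expressions. This is only an illustration, not Twitter's or 新浪微博's actual matching code: the pattern names and the find_hashtags helper are made up for this example, and it assumes a hashtag is simply a run of word characters.

```python
import re
import unicodedata

# Hypothetical patterns for illustration only.
# ASCII-only: \w is limited to [A-Za-z0-9_] by the re.ASCII flag.
ASCII_HASHTAG = re.compile(r"#(\w+)", re.ASCII)
# Unicode-aware: in Python 3, \w on str matches letters in any script.
UNICODE_HASHTAG = re.compile(r"#(\w+)")
# Bracketed style: text enclosed by a pair of # characters.
BRACKETED_HASHTAG = re.compile(r"#([^#\s]+)#")


def find_hashtags(text, pattern=UNICODE_HASHTAG):
    """Return the hashtag texts found in `text` using `pattern`."""
    # Normalize to NFC so that decomposed characters (a base letter
    # followed by a combining mark) are composed before matching.
    return pattern.findall(unicodedata.normalize("NFC", text))


tweet = "visiting #loughborough and #ラフバラ today"
print(find_hashtags(tweet, ASCII_HASHTAG))    # ASCII-only matcher
print(find_hashtags(tweet, UNICODE_HASHTAG))  # Unicode-aware matcher
print(find_hashtags("#ラフバラ# today", BRACKETED_HASHTAG))
```

The ASCII-only pattern skips the Katakana hashtag entirely, which mirrors the behaviour described above.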


Several countries now have functioning IDN ccTLDs. One that recently went live is Korea's .한국 (dot 한국). Let's take the new IDN for the Songpa District Office in Seoul, Korea. It should show on Twitter as a live clickable link, i.e. 송파구청.한국. Twitter, though, does not treat it as a valid web address, so it is shown as plain text, i.e. http://송파구청.한국/
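Every label of an IDN, including the TLD, has an ASCII-compatible (Punycode) form, so a link detector could convert the host first and validate the result afterwards. The sketch below uses Python's built-in idna codec (which implements the older IDNA 2003 rules); the function names and the label pattern are assumptions made for this example and do not describe how Twitter validates links.

```python
import re

# Hypothetical pattern: a DNS label of 1 to 63 letters, digits or
# hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def to_ascii_host(host):
    """Convert a Unicode host name to its ASCII (Punycode) form."""
    return host.encode("idna").decode("ascii")


def looks_like_host(host):
    """Rough check that `host` is a dotted name with valid labels."""
    try:
        ascii_host = to_ascii_host(host)
    except UnicodeError:
        return False
    labels = ascii_host.split(".")
    return len(labels) >= 2 and all(LABEL.fullmatch(l) for l in labels)


print(to_ascii_host("송파구청.한국"))    # both labels become xn--...
print(looks_like_host("송파구청.한국"))  # Unicode TLD
print(looks_like_host("송파구청.kr"))    # ASCII TLD
```

With the conversion done up front, a Unicode TLD and an ASCII TLD go through exactly the same check.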

If, though, I use an IDN with an ASCII TLD then Twitter recognizes it as a valid web address and displays it as a live clickable link. See

Auto Shortening

Twitter is rolling out an automatic link shortening service. Link shortening (currently) kicks in at 11 characters, and the count includes the http:// prefix. is exactly 11 characters and one cannot have a shorter link. Therefore all links in tweets will be shortened. Some time since yesterday my account was added to the rollout, so my links are now auto-shortened by Twitter. That is not a problem for ASCII links, but it creates a new problem for IDN links: the Punycode form is displayed in the tweet instead of the Unicode form. See

Instead of displaying twitter should display the Unicode form 송파구청.kr. The underlying href link, as created by Twitter, is
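Recovering the Unicode display form from a Punycode link is mechanical. A minimal sketch, assuming the link text is derived from the href and using Python's built-in idna codec; display_form is a hypothetical helper name and this is not Twitter's code:

```python
from urllib.parse import urlsplit


def display_form(url):
    """Return a human-readable form of `url` with a Unicode host."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Encoding then decoding with the idna codec accepts either a
    # Punycode host or a host that is already Unicode.
    unicode_host = host.encode("idna").decode("idna")
    # Drop a bare trailing slash so the text shows only the domain.
    path = parts.path if parts.path != "/" else ""
    return unicode_host + path


# Build the Punycode href from the Unicode name, then display it again.
href = "http://" + "송파구청.kr".encode("idna").decode("ascii") + "/"
print(href)
print(display_form(href))
```

The href can stay in Punycode for the browser while the visible text uses the Unicode form.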


Let's now add a path to a domain name. With an ASCII path all is fine. Take an ASCII domain name and add a non-ASCII Unicode path, though, and Twitter fails to recognize the full URL.

In a tweet I want 러프버러 to appear as 러프버러. Instead it appears as ta.gd러프버러. The ASCII domain name is processed as a link, but the Unicode path is not treated as part of the URL and is displayed as plain text. One ends up with an incomplete URL.
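A Unicode path can be carried in a URL by percent-encoding its UTF-8 bytes, so a link generator could keep the whole URL intact and still display the readable form. The sketch below is a rough illustration only: to_href is a hypothetical helper, the example URL http://ta.gd/러프버러 is assumed from the text above, and hosts with a port or user info are not handled.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def to_href(url):
    """Return an all-ASCII form of `url` suitable for an href."""
    parts = urlsplit(url)
    # Host: Unicode labels become Punycode; ASCII hosts are unchanged.
    host = parts.netloc.encode("idna").decode("ascii")
    # Path: percent-encode UTF-8 bytes; keep "/" and existing escapes.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


href = to_href("http://ta.gd/러프버러")
print(href)                          # ASCII-only href
print(unquote(urlsplit(href).path))  # readable path for display
```

The encoded href and the readable text then refer to the same complete URL, rather than the link stopping at the domain name.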