Sunday, 7 January 2018

Computer Science Internationalization - Presentation of Links

All of us encounter links in documents, webpages and email. They are readily identified as they are usually coloured blue if not visited and purple if visited. In the case of IDNs (Internationalised Domain Names) there are several ways of presenting links to users and I have encountered all of these variants. Letʼs look at Korea University's IDN 고려대학교.한국. This can be presented as www.고려대학교.한국, http://고려대학교.한국, http://www.고려대학교.한국, www.xn--299a9hr4mn4fgs6b.xn--3e0b707e, http://xn--299a9hr4mn4fgs6b.xn--3e0b707e, http://www.xn--299a9hr4mn4fgs6b.xn--3e0b707e. If you click on these links you will see that they all work. I do not like any of these ways of presenting links. xn--299a9hr4mn4fgs6b.xn--3e0b707e is the punycode form of the domain name and should never be presented to users. It is used for behind-the-scenes communication between internet devices. (I still prefer the name punnycode 😁 )
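
For readers who want to see the two forms side by side, here is a minimal sketch that converts between the Unicode and punycode presentations of a domain name. It uses Pythonʼs built-in idna codec, which follows the older IDNA 2003 rules, so it may not match every browser exactly. The function names to_punycode and to_unicode are my own and only for illustration.

```python
def to_punycode(domain: str) -> str:
    """Return the ASCII (xn--) form of a Unicode domain name."""
    # The built-in "idna" codec encodes each dot-separated label separately.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII (xn--) domain name."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "고려대학교.한국"
    ascii_form = to_punycode(name)
    print(ascii_form)              # the behind-the-scenes form
    print(to_unicode(ascii_form))  # back to the form a user should see
```

Plain ASCII names such as example.com pass through both functions unchanged.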

I have used different forms over the years, and I do think there is a best way of presenting links, which I have used on many occasions. But I have mostly done so by way of experimentation and not consistently. As of yesterday, I have decided to adopt a consistent working practice for presenting both IDN and ASCII links. My personal rules for presenting links are:

  1. I will not use the www or http(s) prefix and will most definitely not use the punycode form. The link for Korea University now becomes ➜ 고려대학교.한국. We now have a simple and elegant presentation of the link. Note also that it is in a single human language script, which in this case is Korean Hangeul. Therefore, when typing this link, there is no need to switch between English and Korean on oneʼs device. This, to me, is the most important part of presenting links: keeping them in a single human language script.
  2. Presentation of email addresses is well established, and presented email address links are not prefixed with the "mailto:" scheme name. We can do the same thing with internationalised email addresses. The link for my Chinese email address is 小山@电邮.在线?Subject=你好小山😜. We can easily distinguish between website links and email links because an email link has the @ symbol. If your email client works correctly, then when you click my Chinese email link the To: field should be filled in with 小山@电邮.在线 and the Subject: field with 你好小山😜.
  3. I will always show the real address in the link, eg 고려대학교. I will never use anything of the form " … please click here for further information. ". I consider this to be extremely bad security. I abandoned this practice many years ago. How many of you hover over a link to determine the real address before you click the link? I do sometimes, but mostly I do not. With my links there are no surprises: what you see is what you will get. The one thing I have no control over is redirection. A website can, at any time, redirect to a different web address. Such redirection does sometimes happen, though not very often, and usually it is for legitimate reasons such as redirection to a new version of a website.
  4. When the URL is extremely long and I cannot reasonably fit it into, say, a presentation slide, I will use an ellipsis to indicate that this is not the complete address, eg once.upon/a/time/there/was/a/beautiful/…
  5. So far we have only considered the two most common schemes, http(s) and mailto. There are many other schemes, such as smb, sftp and imap, and there is a published list of the registered schemes. In cases like this I will include the scheme prefix in my link so that it can be easily distinguished from the aforementioned types of link, eg sftp://some.fileserver.somewhere/freestuff/user-manual.txt
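
The rules above could be sketched in code roughly as follows. This is only an illustration of my working practice, not a complete URL parser: the function names display_link and _decode_host and the max_len parameter are hypothetical, ports and user names in the address are not handled, and the truncation simply cuts at a character count rather than at a tidy path boundary.

```python
def _decode_host(host: str) -> str:
    """Turn xn-- labels back into Unicode; leave anything else untouched."""
    try:
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Host is already Unicode (or not decodable), so show it as given.
        return host


def display_link(url: str, max_len: int = 0) -> str:
    """Return the text to show for a link, following the five rules above."""
    lowered = url.lower()
    if lowered.startswith(("http://", "https://")):
        # Rule 1: no http(s), no www, no punycode.
        rest = url.split("://", 1)[1]
        host, _, path = rest.partition("/")
        if host.lower().startswith("www."):
            host = host[4:]
        host = _decode_host(host)
        text = host + "/" + path if path else host
    elif lowered.startswith("mailto:"):
        # Rule 2: email links are shown without the scheme name.
        text = url[len("mailto:"):]
    else:
        # Rule 5: any other scheme keeps its prefix.
        text = url
    # Rule 4: mark an over-long address as incomplete with an ellipsis.
    if max_len and len(text) > max_len:
        text = text[: max_len - 1] + "…"
    return text


if __name__ == "__main__":
    print(display_link("https://www.example.com/"))
    print(display_link("mailto:someone@example.com"))
    print(display_link("sftp://some.fileserver.somewhere/freestuff/user-manual.txt"))
```

Rule 3 needs no code: the text returned here is always derived from the real address, so what you see is what you get.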

Some systems and apps will break my working practice as they will not allow me to present my links as I consider they should be presented. I will endeavour to find workarounds for such systems.

Techie Tip: When I was setting up my Chinese email address in my email signature, my email client insisted on converting the domain name 电邮.在线 to the punycode form xn--wny099c.xn--3ds443g. When something like this happens I add an extra level of encoding so that the system decodes to the level I want, not the level the system wants. This sometimes works and sometimes not. In this case it worked. I added percent encoding. I percent-encoded the address to %E5%B0%8F%E5%B1%B1@%E7%94%B5%E9%82%AE.%E5%9C%A8%E7%BA%BF and then my email client decoded it to 小山@电邮.在线, which is precisely what I wanted. A really useful web app for doing such conversions is Richard Ishidaʼs Unicode code converter.
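
If you would rather do the percent encoding in code, a minimal sketch using Pythonʼs standard urllib.parse module looks like this. Keeping the @ symbol unencoded (safe="@") is my choice here so that the result matches the form I used above; whether your own email client accepts the encoded form is something you would have to try.

```python
from urllib.parse import quote, unquote

address = "小山@电邮.在线"

# Percent-encode the UTF-8 bytes of every non-ASCII character, leaving "@" as is.
# The full stop is never encoded because it is an unreserved character.
encoded = quote(address, safe="@")
print(encoded)  # %E5%B0%8F%E5%B1%B1@%E7%94%B5%E9%82%AE.%E5%9C%A8%E7%BA%BF

# Decoding takes us back to the readable form.
print(unquote(encoded))  # 小山@电邮.在线
```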