Tuesday, 31 January 2017

Computer Science Internationalization - Unicode Terminal Session

Below is an OSX bash shell command line terminal session. It is a real, working terminal session using basic unix commands. It does, though, look significantly different from a standard terminal session. If you know basic unix commands such as ls and cd, you should/may be able to work out what is happening.

苹果电脑 ~: 妈 我的目录
苹果电脑 ~: 茶 我的目录
苹果电脑 我的目录: 丽
苹果电脑 我的目录: 头 文档一 文档二 文档三
苹果电脑 我的目录: 丽
文档一  文档三  文档二
苹果电脑 我的目录: 词 > 文档四
苹果电脑 我的目录: 词 文档四
苹果电脑 我的目录: 丽
文档一  文档三  文档二  文档四
苹果电脑 我的目录: ⇉ 文档四 文档五
苹果电脑 我的目录: 丽
文档一  文档三  文档二  文档五  文档四
苹果电脑 我的目录: → 文档一 文档六
苹果电脑 我的目录: 丽
文档三  文档二  文档五  文档六  文档四
苹果电脑 我的目录: 

So, what is happening!?

Firstly I am using Unicode characters. If you search the internet you will find many examples of terminal sessions but they will invariably be using ASCII characters only. In my above terminal session I am using Unicode characters, mostly Chinese/Japanese and two arrow symbol characters.

Where are the commands such as ls and cd? I have mapped a set of commands to Unicode characters using the alias command eg alias 丽='ls'

I have changed the command line prompt.

If you understand basic bash commands, I believe I have now given you sufficient information in order for you to work out what is happening in the terminal session. Knowing Chinese or Japanese gives a slight advantage but it is not essential to understanding this terminal session. The Chinese/Japanese characters I chose for the command mappings are somewhat random so it will not help you to google translate them.

I actually devised these command mappings and the terminal session several years ago. Today, I decided it was time to put it onto my blog. My main purpose was and still is, to encourage students to think beyond ASCII. I believe it has impact because it is so unexpected when one first sees this terminal session.

There can be many different permutations on the session using different human language scripts and unicode symbols. It makes for an interesting and unusual exercise for students studying unix. Absolutely no reason why one should not, for example, use emoji for the command mappings.

Monday, 9 January 2017

Chinese Email Address

The latest and hottest news is that I now have a Chinese email address➜ 小山@电邮.在线 😄

  1. 小山 is my adopted Chinese name
  2. 电邮 means email
  3. 在线 means online

I acquired my free Chinese email address from DataMail which supports email addresses in twelve languages: العَرَبِيَّة‎‎ Arabic, বাংলা Bengali, 中文 Chinese, English, ગુજરાતી Gujarati, हिन्दी Hindi, मराठी Marathi, ਪੰਜਾਬੀ Punjabi, ру́сский Russian, தமிழ் Tamil, తెలుగు Telugu, اُردُو‎ Urdu.

Additionally, DataMail has an impressive family of IDNs (Internationalized Domain Names) with each language having itʼs own IDN.
  1. Arabic داده.امارات
  2. Bengali ডাটামেল্.ভারত
  3. Chinese 电邮.在线
  4. English datamail.in
  5. Gujarati ડાટામેલ.ભારત
  6. Hindi डाटामेल.भारत
  7. Marathi डेटामेल.भारत
  8. Punjabi ਡਾਟਾਮੇਲ.ਭਾਰਤ
  9. Russian почта.рус
  10. Tamil இந.இந்தியா
  11. Telugu డేటామెయిల్.భారత్
  12. Urdu ڈاٹامیل.بھارت

If you would like your own DataMail email address in one of the above languages then just click one of the above links. The website directs you to download an Android or iOS App. One uses the App to actually register a DataMail email address.

The main points in the registration process using the DataMail App are:

  1. The crucial part of this process is that firstly you need to select the language for the email address you are about to register. Subsequent instructions will be in the language you have selected. So, I chose Chinese in order to register 小山@电邮.在线.
  2. Validation of your phone number - the DataMail App will, with your approval, send an SMS text to DataMail in India to confirm your phone number. If the validation process fails, it could be that your phone contract does not cover the sending of international SMS text.
  3. Choosing the local-part which in my case is 小山. The Domain Name part is fixed and is provided by DataMail. There is a Domain Name per language, as above.

I have successfully exchanged emails between Gmail ASCII emails addresses and my DataMail Chinese email address. Gmail supports Internationalized Email Addresses (IEAs) but one cannot create IEAs in Gmail. DataMail, to my knowledge, is currently the only production email system that both supports and allows creation of IEAs. I used Gmail with a browser when testing exchange of IEAs. If you are accessing your Gmail using IMAP or POP then IEAs may or may not work. It all depends on whether or not your client software supports IEAs.

I have sent email from DataMail using my Chinese email address 小山@电邮.在线 to several Gmail users. My current experience is that for some of the Gmail users, my email goes to their spam folder instead of their primary inbox. If this is happening to you or your recipients, please mark the Gmail email as 'not spam' to help prevent reoccurrences of this problem.

In addition to the App, DataMail can be used with a web bowser ➜ 邮.电邮.在线

Currently, the few systems supporting internationalized email addresses are DataMail, Gmail and Outlook 2016. So, what to do when exchanging email with a system that only supports ASCII email addresses? DataMail have thought about this issue and offer email aliasing. One can create ASCII email aliases and use them to exchange email with systems that do not yet support international email addresses. My DataMail mailbox has the Chinese email address 小山@电邮.在线 and ASCII @datamail.in addresses thus allowing me to communicate with any email system.

DataMail is a good example of an AI (Adaptive Internationalized) website. It adapts to the language of the web address used for access. The most obvious adaptation is the text content is in the language of the web address. Secondly, the appropriate language button is highlighted. Finally, and perhaps less obviously, in the top right corner there is a DataMail support email address which is in the current web address language. In the case of 电邮.在线 the DataMail support email address is 支持@电邮.在线

Letʼs examine some of the technicalities of EAI (Email Address Internationalization). The structure of an email address is local-part@Domain Name where the Domain Name identifies a mail server and local-part identifies a mailbox on said mail server. The email addresses you will be most familiar with are ASCII local-part@ASCII Domain Name. IEAs, on the other hand, are of the form Unicode local-part@Unicode Domain Name. In order to make this form work we need to encode both parts with one encoding for the Unicode local-part and a different encoding for the Unicode Domain Name. The encoded email address is UTF-8@punycode. Users see the Unicode email address and Computers work with the encoded address.

For further technical reading, these are the primary EAI RFCs:

  1. tools.ietf.org/html/rfc6531
  2. tools.ietf.org/html/rfc6532
  3. tools.ietf.org/html/rfc6533
  4. tools.ietf.org/html/rfc6534