Monday, 9 January 2017

Chinese Email Address

The latest and hottest news is that I now have a Chinese email address➜ 小山@电邮.在线 😄

  1. 小山 is my adopted Chinese name
  2. 电邮 means email
  3. 在线 means online

I acquired my free Chinese email address from DataMail which supports email addresses in twelve languages: العَرَبِيَّة‎‎ Arabic, বাংলা Bengali, 中文 Chinese, English, ગુજરાતી Gujarati, हिन्दी Hindi, मराठी Marathi, ਪੰਜਾਬੀ Punjabi, ру́сский Russian, தமிழ் Tamil, తెలుగు Telugu, اُردُو‎ Urdu.

Additionally, DataMail has an impressive family of IDNs (Internationalized Domain Names), with each language having its own IDN:
  1. Arabic داده.امارات
  2. Bengali ডাটামেল্.ভারত
  3. Chinese 电邮.在线
  4. English datamail.in
  5. Gujarati ડાટામેલ.ભારત
  6. Hindi डाटामेल.भारत
  7. Marathi डेटामेल.भारत
  8. Punjabi ਡਾਟਾਮੇਲ.ਭਾਰਤ
  9. Russian почта.рус
  10. Tamil இந.இந்தியா
  11. Telugu డేటామెయిల్.భారత్
  12. Urdu ڈاٹامیل.بھارت

If you would like your own DataMail email address in one of the above languages, just visit one of the above domains. The website directs you to download an Android or iOS App, and you use the App to actually register a DataMail email address.

The main points in the registration process using the DataMail App are:

  1. Selecting the language: the crucial first step is to select the language of the email address you are about to register. Subsequent instructions will be in the language you have selected. So, I chose Chinese in order to register 小山@电邮.在线.
  2. Validation of your phone number: with your approval, the DataMail App will send an SMS text to DataMail in India to confirm your phone number. If validation fails, it could be that your phone contract does not cover sending international SMS texts.
  3. Choosing the local-part, which in my case is 小山. The Domain Name part is fixed and is provided by DataMail; there is one Domain Name per language, as listed above.

I have successfully exchanged emails between Gmail ASCII email addresses and my DataMail Chinese email address. Gmail supports Internationalized Email Addresses (IEAs), but one cannot create IEAs in Gmail. DataMail, to my knowledge, is currently the only production email system that both supports and allows creation of IEAs.

In addition to the App, DataMail can be used with a web browser ➜ 邮.电邮.在线

Currently, the few systems supporting internationalized email addresses are DataMail, Gmail and Outlook 2016. So, what to do when exchanging email with a system that only supports ASCII email addresses? DataMail have thought about this issue and offer email aliasing: one can create ASCII email aliases and use them to exchange email with systems that do not support internationalized email addresses. My DataMail mailbox has the Chinese email address 小山@电邮.在线 as well as ASCII @datamail.in addresses, thus allowing me to communicate with any email system.

DataMail is a good example of an AI (Adaptive Internationalized) website: it adapts to the language of the web address used to access it. The most obvious adaptation is that the text content is in the language of the web address. Secondly, the appropriate language button is highlighted. Finally, and perhaps less obviously, the top right corner shows a DataMail support email address in the current web address language. In the case of 电邮.在线, the DataMail support email address is 支持@电邮.在线 (支持 means support).

Letʼs examine some of the technicalities of IEAs. The structure of an email address is local-part@Domain Name, where the Domain Name identifies a mail server and the local-part identifies a mailbox on that mail server. The email addresses you will be most familiar with are of the form ASCII local-part@ASCII Domain Name. IEAs, on the other hand, are of the form Unicode local-part@Unicode Domain Name. To make this form work, both parts have to be encoded, using one encoding for the Unicode local-part and a different encoding for the Unicode Domain Name. The encoded email address is UTF-8@punycode. Users see the Unicode email address and computers work with the encoded address.
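The UTF-8@punycode split can be illustrated with a minimal Python sketch. It is not DataMail's code or any mail server's code: `encode_iea` and `decode_domain` are hypothetical helpers, the address is assumed to contain a single unquoted @, and Python's built-in `idna` codec implements the older IDNA 2003 rules (newer software may follow IDNA 2008 instead).

```python
def encode_iea(address: str) -> tuple[bytes, str]:
    """Hypothetical helper: encode an internationalized email address
    as UTF-8@punycode. Assumes one unquoted '@' before the Domain Name."""
    local_part, domain = address.rsplit("@", 1)
    # Local-part: carried as raw UTF-8 bytes by mail servers that support IEAs
    encoded_local = local_part.encode("utf-8")
    # Domain Name: each label becomes an ASCII "xn--" punycode label
    # (Python's built-in IDNA 2003 codec)
    encoded_domain = domain.encode("idna").decode("ascii")
    return encoded_local, encoded_domain


def decode_domain(ascii_domain: str) -> str:
    """Turn a punycode Domain Name back into the Unicode form users see."""
    return ascii_domain.encode("ascii").decode("idna")


local, domain = encode_iea("小山@电邮.在线")
print(local)                  # b'\xe5\xb0\x8f\xe5\xb1\xb1' (UTF-8 bytes of 小山)
print(domain)                 # two ASCII labels, each starting with "xn--"
print(decode_domain(domain))  # 电邮.在线
```

The punycode Domain Name is plain ASCII, so the existing DNS can resolve it; the UTF-8 local-part is only meaningful to the mail servers at either end.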

Thursday, 15 December 2016

grep highlighting

I frequently use grep to demonstrate and explain regular expressions (regex). I use it in interactive mode with the input coming from the keyboard and the output going to the screen. So, I type some string and if grep finds a match this input string is echoed to the screen. If no match is found then this input string is not echoed to the screen. I have used this teaching method for many years.

Recently, whilst using CentOS, I discovered that grep can highlight matched strings. The CentOS machine I used was set up with grep highlighting, which is how I discovered it. I was impressed, as it makes it clear exactly which text is matched.

My Mac OSX does not have grep highlighting with the default settings. As highlighting is so useful, I decided to configure my OSX system to highlight grep matches. Rather than repeatedly typing the relevant grep options on the command line, I put them into my .bash_profile, as follows.

export GREP_OPTIONS='--color=auto'
export GREP_COLOR='1;34' # 1=bold; 34=blue

Here is an extract from a grep terminal session which illustrates a non-matching line and a matching line.

苹果电脑 ~: egrep '노팅엄'
안산 안양 부산 구미 제주 포항 양산
안산 안양 부산 노팅엄 구미 제주 포항 양산
안산 안양 부산 노팅엄 구미 제주 포항 양산

The text used in this terminal session is Korean Hangeul. Each word is a Korean city, apart from 노팅엄 which is Nottingham, a city in England. The Korean cities are: 안산 Ansan, 안양 Anyang, 부산 Busan, 구미 Gumi, 제주 Jeju, 포항 Pohang and 양산 Yangsan.

Note: I use egrep because it is shorthand for grep -E, which enables extended regular expressions.
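How the colour reaches the screen can be sketched in Python: grep uses the GREP_COLOR value as the parameters of an ANSI SGR escape sequence (ESC[1;34m for bold blue) placed before each match, with a reset sequence after it. The sketch below imitates the session above; it is not grep's own implementation, `highlight_matches` is a hypothetical helper, and Python's `re` syntax is similar to, but not identical to, POSIX extended regular expressions.

```python
import re
from typing import Optional

# SGR parameters, same value as GREP_COLOR='1;34' (1 = bold, 34 = blue)
HIGHLIGHT = "1;34"
RESET = "\x1b[0m"


def highlight_matches(pattern: str, line: str, sgr: str = HIGHLIGHT) -> Optional[str]:
    """Mimic coloured egrep output: return the line with every match wrapped
    in ANSI colour codes, or None when nothing matches (grep prints nothing)."""
    regex = re.compile(pattern)
    if regex.search(line) is None:
        return None
    return regex.sub(lambda m: f"\x1b[{sgr}m{m.group(0)}{RESET}", line)


# The two input lines typed in the session above
lines = [
    "안산 안양 부산 구미 제주 포항 양산",
    "안산 안양 부산 노팅엄 구미 제주 포항 양산",
]
for line in lines:
    result = highlight_matches("노팅엄", line)
    if result is not None:
        print(result)  # only the second line is echoed, 노팅엄 in bold blue
```

For truly interactive use, as in the terminal session, loop over `sys.stdin` instead of the fixed list.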

Environment: OSX Sierra v10.12.1

Saturday, 26 November 2016

Domain Name Registrations

To keep up to date with Domain Name registrations I highly recommend gd-domains. It gives listings of newly registered Domain Names on a daily basis, and listings for individual TLDs (Top Level Domains) are available. It is thanks to this site that I discovered the impressive and sizeable family of Korean Domain Names below. They were all registered on 22nd November 2016, 2016년 11월 22일 화요일. The TLD used is 닷컴, which is Verisign's Korean equivalent of com.

I think embedding telephone numbers into these IDNs (Internationalized Domain Names) is clever marketing ☎️

  1. 남양주용달이사-010-3126-0853.닷컴
  2. 원룸반포장이사-010-3126-0853.닷컴
  3. 마포포장이사-010-3126-0853.닷컴
  4. 강동구이사-010-3126-0853.닷컴
  5. 강서구포장이사-010-3126-0853.닷컴
  6. 광진구원룸이사-010-3126-0853.닷컴
  7. 광진구이사짐센터-010-3126-0853.닷컴
  8. 송파구포장이사-010-3126-0853.닷컴
  9. 중랑구원룸이사-010-3126-0853.닷컴
  10. 서초구원룸이사-010-3126-0853.닷컴
  11. 송파구용달센터-010-3126-0853.닷컴
  12. 학생이사-010-3126-0853.닷컴
  13. 사당동원룸이사-010-3126-0853.닷컴
  14. 지방용달가격-010-3126-0853.닷컴
  15. 싼곳용달이사-010-3126-0853.닷컴
  16. 마포용달이사-010-3126-0853.닷컴
  17. 반포장이사-010-3126-0853.닷컴
  18. 서울일반이사-010-3126-0853.닷컴
  19. 1톤트럭이사-010-3126-0853.닷컴
  20. 1톤소형이사-010-3126-0853.닷컴

  1. 마포원룸이사-010-4675-2414.닷컴
  2. 강동구용달이사-010-4675-2414.닷컴
  3. 강서구용달이사-010-4675-2414.닷컴
  4. 강동구지역이사-010-4675-2414.닷컴
  5. 서울개인용달이사-010-4675-2414.닷컴
  6. 광진구용달이사-010-4675-2414.닷컴
  7. 송파구원룸이사-010-4675-2414.닷컴
  8. 동작구용달이사-010-4675-2414.닷컴
  9. 중랑구용달이사-010-4675-2414.닷컴
  10. 송파구용달이사-010-4675-2414.닷컴
  11. 서초구용달이사-010-4675-2414.닷컴
  12. 서울소형이사-010-4675-2414.닷컴
  13. 오피스텔이사-010-4675-2414.닷컴
  14. 지방용달이사-010-4675-2414.닷컴
  15. 용산용달이사-010-4675-2414.닷컴
  16. 용달차이사-010-4675-2414.닷컴
  17. 합정동용달이사-010-4675-2414.닷컴
  18. 서울반포장이사-010-4675-2414.닷컴
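The embedded telephone numbers are easy to pick out programmatically. The following minimal Python sketch groups domains by their telephone number and shows the ASCII (punycode) form that the DNS actually uses. The regex assumes the description-telephone number.닷컴 pattern seen in the lists above, and `group_by_phone` and `to_ascii` are hypothetical helpers. Because punycode copies ASCII characters through unchanged, the telephone number stays readable even in the ASCII form.

```python
import re
from collections import defaultdict

# Hypothetical pattern inferred from the lists above:
# <Korean description>-<mobile number>.닷컴
PHONE_IDN = re.compile(r"^(?P<label>.+)-(?P<phone>010-\d{4}-\d{4})\.닷컴$")


def group_by_phone(domains):
    """Map each embedded telephone number to the descriptions that use it."""
    groups = defaultdict(list)
    for domain in domains:
        m = PHONE_IDN.match(domain)
        if m:
            groups[m.group("phone")].append(m.group("label"))
    return dict(groups)


def to_ascii(domain: str) -> str:
    """ASCII (punycode) form of an IDN, via Python's built-in IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


sample = [
    "남양주용달이사-010-3126-0853.닷컴",
    "학생이사-010-3126-0853.닷컴",
    "마포원룸이사-010-4675-2414.닷컴",
]
print(group_by_phone(sample))
for d in sample:
    print(d, "->", to_ascii(d))
```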

www.gd-domains.com/20161122-229 is the direct link for all 닷컴 registrations on 22nd November 2016, 2016년 11월 22일 화요일.

Wednesday, 26 October 2016

Family of Korean IDNs

The following is a list of functioning Korean IDNs (Internationalized Domain Names). They all belong to the same Computer Repair Company. The TLD (Top Level Domain) used is 닷컴, which is Verisign's new Korean-language equivalent of their com TLD. Each IDN contains 컴퓨터수리, which means Computer Repair. The only difference between these IDNs is the first two characters, which are the name of a Korean city. I think this is a clever and creative use of IDNs!

The last two IDNs below are structured differently. The first two characters are, I think, a neighbourhood and the first two characters after the hyphen are the city.
  1. 시흥컴퓨터수리.닷컴
  2. 부천컴퓨터수리.닷컴
  3. 창원컴퓨터수리.닷컴
  4. 마산컴퓨터수리.닷컴
  5. 평택컴퓨터수리.닷컴
  6. 오산컴퓨터수리.닷컴
  7. 진해컴퓨터수리.닷컴
  8. 김해컴퓨터수리.닷컴
  9. 북동컴퓨터수리-창원컴퓨터수리.닷컴
  10. 우동컴퓨터수리-부산컴퓨터수리.닷컴

Friday, 7 October 2016

Computer Science Internationalization — Bidi

Scripts such as Latin are written from Left to Right (L➡︎R). Scripts such as Arabic and Hebrew are written Right to Left (L⬅︎R). What happens when we mix L➡︎R and L⬅︎R scripts within a document? Here is an exercise in mixing scripts.

Take a mixed bidi (bidirectional) string consisting of Latin and Hebrew characters in a L➡︎R paragraph.

abcאבגdef

...and here is the same string in a L⬅︎R paragraph.

abcאבגdef

Now to the actual exercise. Copy the above strings to your text editor or word processor. You will need to set up the 2nd occurrence of the string as a L⬅︎R paragraph; I am assuming that your default directionality is L➡︎R. Each string has two boundaries where the text changes direction. At each boundary you are going to insert a character, either a L➡︎R character, such as x, or a L⬅︎R character, such as ד. For each insertion, start again from the initial mixed bidi string. There are two mixed strings, each with two boundaries and two characters to insert, so there are a total of 8 insertion operations. The challenge is to predict where in the string the inserted character will appear before you actually insert it. Give it a go! Good luck 😀

Had I done this exercise before I ever studied bidi, I would probably have scored 4/8. Now that I understand how the computer processes bidi text, I usually score full marks for such exercises. It is not, though, an intuitive process for me, as I have spent most of my life reading and writing L➡︎R scripts only. I have to think very carefully about how the computer does it in order to determine the correct answers.

The main purpose of this exercise is to think about the ordering of the characters in the strings. There are two orderings to consider: memory order and display order. Memory order is the logical order in which the characters are stored in memory, which in this case is the order in which I typed them. The memory order of the string I have used above is "abcגבאdef". Display order is how the string is presented to the viewer. You have already seen, above, the two possible display orders for the single string "abcגבאdef".
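The relationship between memory order and display order can be sketched in Python for the simple case in this exercise, where every character is either a strong Left-to-Right (Latin) or a strong Right-to-Left (Hebrew) letter. The sketch applies only two parts of the Unicode Bidirectional Algorithm (UAX #9): resolving levels for strong characters (rules I1 and I2) and reversing runs for display (rule L2). It ignores spaces, digits, punctuation and explicit embeddings, so it is a teaching aid rather than a full bidi implementation, and `embedding_levels` and `display_order` are hypothetical names. Strings are written with escapes so that no editor reorders them.

```python
import unicodedata
from typing import List


def embedding_levels(text: str, rtl_paragraph: bool) -> List[int]:
    """Resolve levels for strong letters only (UAX #9 rules I1 and I2)."""
    base = 1 if rtl_paragraph else 0
    levels = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            # L stays at an even level and rises by one from an odd level
            levels.append(base if base % 2 == 0 else base + 1)
        elif bidi_class in ("R", "AL"):
            # R/AL rises by one from an even level and stays at an odd level
            levels.append(base + 1 if base % 2 == 0 else base)
        else:
            raise ValueError(f"this sketch handles strong letters only: {ch!r}")
    return levels


def display_order(memory: str, rtl_paragraph: bool = False) -> str:
    """Return the characters in visual order, left to right (UAX #9 rule L2)."""
    chars = list(memory)
    levels = embedding_levels(memory, rtl_paragraph)
    highest = max(levels, default=0)
    lowest_odd = min((lv for lv in levels if lv % 2 == 1), default=highest + 1)
    # From the highest level down to the lowest odd level, reverse every
    # contiguous run of characters at that level or higher.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


# Memory order given above: a b c, then U+05D2 U+05D1 U+05D0 (ג ב א), then d e f
memory = "abc\u05d2\u05d1\u05d0def"
print(ascii(display_order(memory)))                      # 'abc\u05d0\u05d1\u05d2def'
print(ascii(display_order(memory, rtl_paragraph=True)))  # 'def\u05d0\u05d1\u05d2abc'
```

To check a prediction in the exercise, insert x or ד (U+05D3) at a boundary in the memory-order string and pass the result to `display_order`.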

I have used TextEdit for this exercise. In order to set paragraph text direction in TextEdit follow the path: "TextEdit➜ Format➜ Text➜ Writing Direction". Now set paragraph text direction to Right to Left. TextEdit correctly handles bidi text but that is not the case for all word processors or text editors.

There are several permutations of this exercise, including:

  1. What happens at the boundaries with forward delete and back delete?
  2. What happens if the initial memory order character(s) are L⬅︎R instead of L➡︎R?
  3. Use Arabic instead of Hebrew as this introduces the additional challenge of letters changing shape according to preceding and following characters.

This article is aimed at L➡︎R reading/writing people. If you are a L⬅︎R person then you will need to invert some of my instructions. Actually, if you are a L⬅︎R person you will be totally familiar with mixing bidi text and so will fully understand this exercise.

Environment: OSX v10.12 (Sierra), TextEdit v1.12

Wednesday, 21 September 2016

Computer Science Curriculum Internationalization

I have been a long-time practitioner and advocate of internationalising Computer Science teaching. My fundamental aim is to give students global computing skills. One such global skill, for example, is processing Unicode text rather than the very restricted ASCII text. Once one encompasses Unicode, one is encompassing most of the languages and scripts of the world.

Over the years I have tried to find other like-minded Computer Science educators, but without success. I had more or less concluded that I am a solitary voice when it comes to Computer Science internationalisation. There does, though, appear to be some light, as I recently discovered two organisations that promote internationalisation of teaching curricula.

🌍 The Centre for Curriculum Internationalisation (CCI), which is based at Oxford Brookes University, UK. brookes.ac.uk/services/cci/index.html In addition to their website, they have a Google discussion group. I posted some information on my Computer Science Internationalisation initiatives and practices to this Google forum. Please see groups.google.com/forum/#!topic/cicin/6XJCrqcdLD4

🌏 Internationalisation of the Curriculum (IoC) in action which is based in Australia. ioc.global/index.html

Thursday, 25 August 2016

Internationalizing Regular Expressions

The purpose of this post is to encourage all of you who are teaching Regular Expressions (RegExp) or are learning RegExp to think international. Think beyond ASCII. Thinking international means thinking Unicode instead of ASCII. Once one thinks Unicode then one is encompassing the world.

My RegExp teaching slides use ASCII only as a starting point. They then progress to Unicode. I give one of my slides as an example.


There is a lot of information packed into this one slide which needs some explanation. My example slide is using Unicode Chinese characters and Unicode Emoji characters.
  • 人 is a Unicode Chinese character meaning person
  • 鸭 is a Unicode Chinese character meaning duck
  • 鸡 is a Unicode Chinese character meaning chicken
This slide also contains a cultural reference. Some time ago I came across a Weibo 微博 post about the visit of the big floating yellow duck to Hong Kong (see http://edition.cnn.com/2013/05/02/travel/hong-kong-giant-duck/). The Weibo post had a photo of many people looking at the duck. The text of the Weibo post was:

人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人鸭人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人人

When I saw this I thought it was very funny and very clever. It just would not work in English, but it works perfectly in Chinese. When writing my RegExp slides I remembered this Weibo post and thought it would make an excellent cultural connection. Thus my slide is internationalized by using Unicode and by incorporating a cultural reference. The use of Unicode is essential for internationalisation. Incorporating a cultural reference is optional, but it does add an extra dimension that may well make RegExp slides more interesting and encourage readers to explore the boundless potential of internationalized Regular Expressions.

Note: I have tried to find the Weibo post but have been unsuccessful, so unfortunately I cannot provide a reference.
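The duck-in-the-crowd idea translates directly into a regular expression exercise. In this minimal Python sketch the crowd string is a shortened, made-up stand-in for the Weibo text, not a copy of it. Python's `re` module works on Unicode code points, so 人, 鸭 and 鸡 can appear in patterns just like ASCII letters, and match positions count characters rather than bytes.

```python
import re

# Hypothetical, shortened stand-in for the Weibo post:
# a crowd of 人 (person) with one 鸭 (duck) in it
crowd = "人" * 12 + "鸭" + "人" * 15

duck = re.search("鸭", crowd)           # find the duck
people = re.findall("人+", crowd)       # the runs of people either side of it
whole = re.fullmatch("人+鸭人+", crowd)  # people, one duck, people

print(duck.start())                   # 12: an index in characters, not bytes
print([len(run) for run in people])   # [12, 15]
print(whole is not None)              # True

# A chicken (鸡) in the crowd does not fit the duck pattern
print(re.fullmatch("人+鸭人+", "人人鸡人人") is None)  # True
```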