Tuesday 20 February 2018

Computer Science Internationalization - Mandarin Chinese Tones

Standard Mandarin Chinese uses four tone marks for pronunciation: ¯ ´ ˇ `. Normally, one only encounters these tone marks with Chinese written in pinyin, but there is no reason why they cannot be used with Chinese characters (汉字).

Letʼs use the sentence: Nottingham is the home of Robin Hood. In pinyin this would be written: nuò dīng hàn shì luó bīn hàn de gù xiāng. In Mandarin Chinese this would normally be written: 诺丁汉是罗宾汉的故乡.

Here are some more simple Chinese sentences with tone marks. I have made the text a little larger so you can see the tone marks more clearly.

For the four tone marks I am using Unicode Combining Diacritical Marks, specifically: U+0304 ¯ COMBINING MACRON, U+0301 ´ COMBINING ACUTE ACCENT, U+030C ˇ COMBINING CARON, U+0300 ` COMBINING GRAVE ACCENT. These diacritics combine with the immediately preceding character.
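As a rough sketch of the same idea in code, the JavaScript below appends one of the four combining marks after each character of the Nottingham sentence. The names `TONE_MARKS` and `addTones` are made up for this illustration, and the tone numbers are simply read off the pinyin given earlier (0 meaning no tone mark, as for 的).

```javascript
// Tone number -> Unicode combining diacritical mark.
const TONE_MARKS = {
  1: '\u0304', // COMBINING MACRON
  2: '\u0301', // COMBINING ACUTE ACCENT
  3: '\u030C', // COMBINING CARON
  4: '\u0300', // COMBINING GRAVE ACCENT
};

// Hypothetical helper: tones[i] is the tone of the i-th character.
// A tone of 0 (or anything not in TONE_MARKS) leaves the character bare.
function addTones(text, tones) {
  return Array.from(text)
    .map((ch, i) => {
      const mark = TONE_MARKS[tones[i]];
      return mark ? ch + mark : ch;
    })
    .join('');
}

// nuò dīng hàn shì luó bīn hàn de gù xiāng
const marked = addTones('诺丁汉是罗宾汉的故乡', [4, 1, 4, 4, 2, 1, 4, 0, 4, 1]);
console.log(marked);
```

Each mark is stored as its own code point directly after the character it decorates, so the marked string has nine more code points than the plain one. Whether the marks are drawn neatly above the characters depends entirely on the font and the program displaying them.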

Some Chinese characters have different pronunciations (tones) with different meanings. 与 and 为 are two such characters. Now, suppose I am not sure which is the correct tone, which is highly likely as my knowledge of Chinese is only basic. A single base character can have more than one Unicode combining diacritical mark. So, when I am uncertain I can combine all the relevant diacritics and let people knowledgeable in Chinese decide which is the correct tone from the context. Letʼs take sentence 二 and apply multiple tone marks to 与 and 为.
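Stacking several marks on one character is just a matter of appending more than one combining code point. Here is a small JavaScript sketch, assuming the candidate tones are 2 and 4 for 为 (wéi, wèi) and 2, 3 and 4 for 与 (yú, yǔ, yù). The helper name `stackTones` is hypothetical.

```javascript
// Tone number -> Unicode combining diacritical mark.
const MARKS = {
  1: '\u0304', // COMBINING MACRON
  2: '\u0301', // COMBINING ACUTE ACCENT
  3: '\u030C', // COMBINING CARON
  4: '\u0300', // COMBINING GRAVE ACCENT
};

// Hypothetical helper: append every candidate tone mark to one character.
function stackTones(ch, tones) {
  return ch + tones.map((t) => MARKS[t]).join('');
}

const wei = stackTones('为', [2, 4]);
const yu = stackTones('与', [2, 3, 4]);

// Stripping all combining marks (Unicode category M) recovers the bare character.
const bare = wei.replace(/\p{M}/gu, '');

console.log(wei, yu, bare);
```

Once someone has decided which tone is correct, the unwanted marks can be deleted one at a time, since each is a separate code point following the base character.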

Actually, I can imagine those knowledgeable in Chinese, using my multiple diacritics methodology illustrated in sentence 四, being able to write a sentence having multiple sensible meanings.

So, how to type the tone marks? Here is one method using OSX and the ABC - Extended keyboard. First, type your Chinese text, or copy and paste some Chinese text. Now switch to the ABC - Extended keyboard. Place your cursor immediately after a Chinese character and then use one of the following key combinations.

  • first tone ¯, use the key combination: alt ⇧ A
  • second tone ´, use the key combination: alt ⇧ E
  • third tone ˇ, use the key combination: alt ⇧ V
  • fourth tone `, use the key combination: alt ⇧ grave

Repeat for each Chinese character in your text, except those that do not have a tone mark. For example, 的 when used as a possessive particle does not have a tone mark. If your OSX system is not set up for the ABC - Extended keyboard, go to System Preferences ➜ Keyboard ➜ Input Sources, and click + to add the ABC - Extended keyboard.

Here is a classic tongue twister: māma qímǎ, mǎ màn, māma mà mǎ.

Browsers, word processors and text editors vary a great deal in how well they display Chinese characters with combining diacritical marks. Over the years, I have often found that TextEdit succeeds where other word processors and text editors fail. I used TextEdit to produce the above five Chinese sentences and included them as images.

With HTML documents we can use ruby annotation and CSS to combine Chinese characters and tone marks. Here is sentence 三 rewritten using ruby annotation and CSS.

  1. ˍˎˏ ̬ˍˏˎˍˎˏˍˎ
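For readers who want to try ruby annotation themselves, here is a minimal JavaScript sketch that builds the HTML for a run of characters. It is only an illustration, not the markup used on this page: the helper name `toRuby` is made up, the tone marks are the ordinary spacing characters ¯ ´ ˇ ` rather than combining ones, and the input is assumed to be plain Chinese characters that need no HTML escaping.

```javascript
// Tone number -> spacing (non-combining) tone mark for the <rt> element.
const RUBY_MARKS = {
  1: '\u00AF', // MACRON
  2: '\u00B4', // ACUTE ACCENT
  3: '\u02C7', // CARON
  4: '\u0060', // GRAVE ACCENT
};

// Hypothetical helper: pairs is a list of [character, tone number].
// Characters with tone 0 are emitted without any annotation.
function toRuby(pairs) {
  return pairs
    .map(([ch, tone]) => {
      const mark = RUBY_MARKS[tone];
      return mark ? `<ruby>${ch}<rt>${mark}</rt></ruby>` : ch;
    })
    .join('');
}

// 的故乡: de (no tone mark), gù, xiāng
const html = toRuby([['的', 0], ['故', 4], ['乡', 1]]);
console.log(html);
```

The size and position of the marks can then be adjusted with CSS rules on the rt element, for example its font-size.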

英国,诺丁汉市,秋月茶 ➜ augustmoontea.com

Environment: OSX High Sierra version 10.13.2

Friday 16 February 2018

Computer Science Internationalization - Time Zones

Yesterday, I wrote some code especially for the Chinese New Year 狗年. Look at the date in the last text box at jsfiddle.net/coas/zvubxato. I wanted to test that my code worked correctly for browsers in different time zones. I am in the UK so my time zone is currently GMT+0000. It is actually really easy to change the time zone in OSX.

Go to System Preferences ➜ Date & Time ➜ Time Zone. A change of time zone is live and immediate. There is no need to close the time zone window or restart your Mac. In the screenshot below I have selected Australian Eastern Daylight Time. Using JavaScript Date() I get the output Fri Feb 16 2018 22:05:34 GMT+1100 (AEDT). This easy changing of time zones made testing my code very easy, and I did test my code with many different time zones.

Why did I test with so many time zones? Well, because it was fun exploring time zones round the world 😀 Did you know, for instance, that North Korea 조선민주주의인민공화국 and South Korea 대한민국 have different time zones? North Korea is currently GMT+0830 and South Korea is currently GMT+0900.
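A different way to peek at other time zones, without touching System Preferences, is to ask JavaScript to format one fixed instant for a named zone. This is only a sketch and not the code in my jsfiddle: `wallClock` is a made-up helper, it relies on the `timeZone` option of `Intl.DateTimeFormat`, and it assumes the JavaScript engine ships with time zone data for the IANA zone names used.

```javascript
// Hypothetical helper: the wall-clock date and time of `instant` in `timeZone`.
function wallClock(instant, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23', // 24-hour clock, hours 00 to 23
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    second: '2-digit',
  }).formatToParts(instant);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')} ` +
    `${get('hour')}:${get('minute')}:${get('second')}`;
}

// The instant from the Date() output above: 22:05:34 at GMT+1100 is 11:05:34 UTC.
const instant = new Date(Date.UTC(2018, 1, 16, 11, 5, 34));

console.log(wallClock(instant, 'Australia/Sydney')); // 2018-02-16 22:05:34
console.log(wallClock(instant, 'Asia/Seoul'));       // 2018-02-16 20:05:34
console.log(wallClock(instant, 'Asia/Pyongyang'));   // depends on the engine's time zone data
```

This does not replace testing with the real system time zone, because plain Date() output always follows the system setting, but it is a quick way to compare zones side by side.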

There are differences between browsers in how well they pick up the OSX time zone. Safari and Google Chrome behave best: whenever I change the time zone and run my JavaScript Date() code, the date is displayed in the new time zone. With Firefox, Opera and Yandex one has to either restart the browser or open a new browser window in order to get the date in the new time zone.
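To check what a particular browser window currently believes, something like the following can be pasted into its console. It is a sketch with made-up helper names (`formatOffset`, `reportLocalZone`); it uses `getTimezoneOffset()`, which returns minutes west of UTC, and `Intl.DateTimeFormat().resolvedOptions().timeZone` for the zone name.

```javascript
// Turn a getTimezoneOffset() value (minutes WEST of UTC) into GMT+hhmm form.
function formatOffset(minutesWestOfUtc) {
  const sign = minutesWestOfUtc <= 0 ? '+' : '-';
  const abs = Math.abs(minutesWestOfUtc);
  const hh = String(Math.floor(abs / 60)).padStart(2, '0');
  const mm = String(abs % 60).padStart(2, '0');
  return `GMT${sign}${hh}${mm}`;
}

// Hypothetical helper: what zone and offset does this JavaScript context see?
function reportLocalZone(now = new Date()) {
  return {
    zone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    offset: formatOffset(now.getTimezoneOffset()),
  };
}

console.log(reportLocalZone());
```

Running it before and after changing the system time zone shows whether that browser has picked up the change without a restart.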

Environment: OSX High Sierra version 10.13.2