Tuesday, 20 February 2018

Computer Science Internationalization - Mandarin Chinese Tones

Standard Mandarin Chinese uses four tone marks for pronunciation: ¯ ´ ˇ `. Normally, one encounters these tone marks only in Chinese written in pinyin, but there is no reason why they cannot be used with Chinese characters (汉字).

Letʼs use the sentence: Nottingham is the home of Robin Hood. In pinyin this would be written: nuò dīng hàn shì luó bīn hàn de gù xiāng. In Mandarin Chinese this would normally be written: 诺丁汉是罗宾汉的故乡.

Here are some more simple Chinese sentences with tone marks. I have made the text a little larger so you can see the tone marks more clearly.

For the four tone marks I am using Unicode Combining Diacritical Marks, specifically: U+0304 ¯ COMBINING MACRON, U+0301 ´ COMBINING ACUTE ACCENT, U+030C ˇ COMBINING CARON, U+0300 ` COMBINING GRAVE ACCENT. These diacritics combine with the immediately preceding character.
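
To make this concrete, here is a minimal Python sketch that appends the four combining marks to Chinese characters. It assumes you already have the pinyin for the sentence, one syllable per character, and reads each tone off the pinyin by decomposing it with NFD normalization. The names TONE_MARKS, tone_mark_of and annotate are my own inventions for illustration, not part of any library.

```python
import unicodedata

# The four combining tone marks listed above, keyed by tone number.
TONE_MARKS = {
    1: "\u0304",  # COMBINING MACRON
    2: "\u0301",  # COMBINING ACUTE ACCENT
    3: "\u030C",  # COMBINING CARON
    4: "\u0300",  # COMBINING GRAVE ACCENT
}


def tone_mark_of(syllable):
    """Return the combining tone mark carried by a pinyin syllable, or ''."""
    # NFD splits a letter such as 'ò' into 'o' followed by U+0300.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS.values():
            return ch
    return ""  # neutral tone, e.g. 'de'


def annotate(hanzi, pinyin):
    """Append to each Chinese character the tone mark of its pinyin syllable."""
    syllables = pinyin.split()
    if len(syllables) != len(hanzi):
        raise ValueError("need exactly one pinyin syllable per character")
    return "".join(ch + tone_mark_of(s) for ch, s in zip(hanzi, syllables))


marked = annotate(
    "诺丁汉是罗宾汉的故乡",
    "nuò dīng hàn shì luó bīn hàn de gù xiāng",
)
print(marked)
```

Because each mark is simply placed after its base character in the string, the result is ordinary Unicode text. How well it displays still depends on the font and the application.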

Some Chinese characters have different pronunciations (tones) with different meanings. 与 and 为 are two such characters. Now, suppose I am not sure which is the correct tone, which is highly likely as my knowledge of Chinese is only basic. A single base character can have more than one Unicode combining diacritical mark. So, when I am uncertain I can combine all the relevant diacritics and let people knowledgeable in Chinese decide which is the correct tone from the context. Letʼs take sentence 二 and apply multiple tone marks to 与 and 为.
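
Stacking several marks on one base character is nothing more than placing the marks one after another. The following Python sketch shows the idea for 与 (yǔ or yù) and 为 (wéi or wèi). The helper name with_marks is hypothetical, and the check with unicodedata.combining is just a guard I have added so that only genuine combining characters are accepted.

```python
import unicodedata

MACRON, ACUTE, CARON, GRAVE = "\u0304", "\u0301", "\u030C", "\u0300"


def with_marks(base, *marks):
    """Return base followed by every given combining mark, in order."""
    for mark in marks:
        # combining() returns 0 for characters that are not combining marks.
        if unicodedata.combining(mark) == 0:
            raise ValueError(f"{mark!r} is not a combining mark")
    return base + "".join(marks)


yu = with_marks("与", CARON, GRAVE)   # third or fourth tone
wei = with_marks("为", ACUTE, GRAVE)  # second or fourth tone
print(yu, wei)

# The four marks share one combining class, so normalization keeps their order.
print(unicodedata.normalize("NFC", yu) == yu)
```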

In fact, I can imagine people knowledgeable in Chinese using the multiple diacritics method illustrated in sentence 四 to write a sentence that has several sensible meanings.

So, how do you type the tone marks? Here is one method using OSX and the ABC - Extended keyboard. Firstly, type your Chinese text or copy and paste some Chinese text. Now switch to the ABC - Extended keyboard. Place your cursor immediately after a Chinese character and then use one of the following key combinations.

  • first tone ¯, use the key combination: alt ⇧ A
  • second tone ´, use the key combination: alt ⇧ E
  • third tone ˇ, use the key combination: alt ⇧ V
  • fourth tone `, use the key combination: alt ⇧ grave

Repeat for each Chinese character in your text, except those that do not have a tone mark. 的 when used as a possessive particle does not have a tone mark. If your OSX system is not set up for the ABC - Extended keyboard, go to System Preferences ➜ Keyboard ➜ Input Sources, and click + to add the ABC - Extended keyboard.

Here is a classic tongue twister: māma qímǎ, mǎ màn, māma mà mǎ.

Browsers, word processors and text editors vary a great deal in how well they display Chinese characters with combining diacritical marks. Many times over the years I have found that TextEdit succeeds where other word processors and text editors fail. I have used TextEdit to produce the above five Chinese sentences and included them as images.

With HTML documents we can use ruby annotation and CSS to combine Chinese characters and tone marks. Here is sentence 三 rewritten using ruby annotation and CSS.

  1. [Sentence 三 with ruby annotation; only the tone marks survive: ˍˎˏ ̬ˍˏˎˍˎˏˍˎ]
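
For anyone who wants to try the ruby approach, here is a rough Python sketch that generates the markup. It uses the Robin Hood sentence from earlier, with the spacing forms of the tone marks as the ruby text. The function name ruby_markup and the CSS rule are assumptions of mine for illustration; they are not necessarily the markup and CSS used on this page.

```python
from html import escape


def ruby_markup(chars, marks):
    """Wrap each character in <ruby>, with its tone mark as the <rt> text."""
    if len(chars) != len(marks):
        raise ValueError("need exactly one mark (or '') per character")
    return "".join(
        f"<ruby>{escape(c)}<rt>{escape(m)}</rt></ruby>"
        for c, m in zip(chars, marks)
    )


# Illustrative styling only: make the annotation as large as the base text.
CSS = "rt { font-size: 100%; }"

# One mark per character; '' for 的, which has no tone mark.
marks = ["`", "¯", "`", "`", "´", "¯", "`", "", "`", "¯"]
html_line = ruby_markup("诺丁汉是罗宾汉的故乡", marks)
print(f"<style>{CSS}</style>")
print(html_line)
```

With ruby annotation the browser positions the marks, so no combining characters are needed and the Chinese text itself stays unmodified.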

英国,诺丁汉市,秋月茶 ➜ augustmoontea.com

Environment: OSX High Sierra version 10.13.2