Thursday, 29 May 2014

New IDN gTLDs

The New gTLDs initiative has been under way for some time now, so I thought it time I listed some Website Domain Names. I will only be listing IDNs (Internationalized Domain Names) which, for this list, are those written in non-Latin Scripts. My intention is to have at least one list entry for each New IDN gTLD as it goes live, so this post will be updated over several months.

I will only be listing those IDNs I consider to be reasonably well integrated into the website. My criteria for inclusion exclude Frame redirects and IDNs that redirect to an ASCII Domain Name.
  1. العاب-فلاش.شبكة
  2. 洋服お直し.みんな 海外送金.みんな
  3. 房地产.在线
  4. тобольск.онлайн
  5. цемент.орг праздник.орг
  6. продвинем.сайт
  7. мы.москва

Monday, 14 April 2014

Australian Universities on Weibo

There are quite a number of Australian Universities on Sina Weibo 新浪微博. Below I list those I have found. I only include those Australian Universities that have verified (the big blue V after the username) Weibo accounts. The text in square brackets is the username on Weibo.
  1. Australian Catholic University [@ACUInternational] weibo.com/acuinternational
  2. Charles Darwin University [@查尔斯达尔文大学] weibo.com/charlesdarwinuni
  3. Curtin University [@科廷大学CurtinUniversity] weibo.com/CurtinWestAustralia
  4. Deakin University [@澳大利亚迪肯大学] weibo.com/deakinuniversity
  5. Federation University [@澳大利亚联邦大学FedUni] weibo.com/FedUniAustralia
  6. Flinders University [@FlindersUni弗林德斯大学] weibo.com/flinders2011
  7. La Trobe University [@澳大利亚拉筹伯大学] weibo.com/latrobeuniaus
  8. Macquarie University [@澳大利亚麦考瑞大学] weibo.com/mquni
  9. Monash University [@MonashUni澳大利亚蒙纳士大学] weibo.com/monashuniversityaust
  10. Queensland University of Technology [@QUT昆士兰科技大学] weibo.com/qutbrisbane
  11. Southern Cross University [@澳大利亚南十字星大学] weibo.com/scuchina
  12. Swinburne University of Technology [@澳洲斯威本科技大学] weibo.com/swinburneuniversity
  13. University of Adelaide [@澳大利亚阿德莱德大学] weibo.com/uniadelaide
  14. University of Canberra [@堪培拉大学] weibo.com/unicanberra
  15. University of Melbourne [@墨尔本大学官微] weibo.com/melbourneuni
  16. University of New South Wales [@澳洲新南威尔士大学] weibo.com/ozunsw
  17. University of Queensland [@昆士兰大学] weibo.com/myuq
  18. University of South Australia [@南澳大学官方微博] weibo.com/studyatunisa
  19. University of Southern Queensland [@澳大利亚南昆士兰大学] weibo.com/usqchina
  20. University of Western Sydney [@西悉尼大学UWS] weibo.com/uwsinternational
  21. University of Wollongong [@澳大利亚卧龙岗大学UOW] weibo.com/uowaustralia

Friday, 11 April 2014

Regular Expressions

Regular Expressions are not just about ASCII. They are (or should be) about Unicode, with ASCII being a very small subset of Unicode. The vast majority of the Regular Expressions documentation and tutorials I have seen deal only with ASCII. The consequence is that many, perhaps most, of the people who learn from them will never consider non-ASCII text strings.

If one considers Unicode text strings then one can process text strings consisting of non-Latin Scripts and Symbols. Scripts such as: Cyrillic, Devanagari, Tamil, Georgian, Cherokee, Chinese and Sinhala. Symbols such as: Currency, Arrows, Mathematical Operators, Mahjong Tiles and Playing Cards. Unicode has a repertoire of over 100,000 characters, all of which can be processed with Regular Expressions.

Mostly, Regular Expressions are no different when using Unicode as compared to using the very limited ASCII. I will give some simple examples using Hangul, which is the Script used for writing Korean. The Hangul characters I will be using in the examples below are in Unicode block Hangul Syllables U+AC00-D7AF. I will intersperse other Unicode characters in my examples below. I present the examples in the form of a terminal session transcript.
苹果电脑 ~: egrep '바나나'
abcdef
abc바나나def
abc바나나def

苹果电脑 ~: egrep '바.나.나'
바诺丁汉나拉夫堡나
바拉나夫나堡
바拉나夫나堡

苹果电脑 ~: egrep '[바나다]'
abcdef
보노도고로
ДЖԶख나ખ༁
ДЖԶख나ખ༁

苹果电脑 ~: egrep '[가-힣]'
abcdef
abc현def
abc현def

苹果电脑 ~: egrep '^[ 가-힣]+$'
abcdef
abc서울def
서울은 아름답다
서울은 아름답다
Where a line appears twice, it matched the Regular Expression: the first copy is the line I typed and the second is egrep echoing it back. I used egrep on OSX.
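
The same five patterns can be tried outside the terminal. Here is a minimal sketch in Python, assuming Python 3 and its standard re module, which treats str patterns as Unicode by default. The helper name matching_lines and the sessions list are my own for illustration; they are not part of egrep or of the original session.

```python
import re

# Each entry is (pattern, candidate lines), mirroring the egrep sessions above.
sessions = [
    ("바나나", ["abcdef", "abc바나나def"]),
    ("바.나.나", ["바诺丁汉나拉夫堡나", "바拉나夫나堡"]),
    ("[바나다]", ["abcdef", "보노도고로", "ДЖԶख나ખ༁"]),
    ("[가-힣]", ["abcdef", "abc현def"]),
    ("^[ 가-힣]+$", ["abcdef", "abc서울def", "서울은 아름답다"]),
]


def matching_lines(pattern, lines):
    """Return the lines containing a match, i.e. the ones egrep would echo."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]


for pattern, lines in sessions:
    print(pattern, "->", matching_lines(pattern, lines))
```

In each session only the last candidate line should be reported, which agrees with the duplicated lines in the transcript.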

The transcript may look a bit odd because of the variety and unfamiliarity of the Unicode characters I have used. If, though, you carefully examine the above Regular Expressions you will see they have standard syntax and are actually elementary constructs. So, if you teach Regular Expressions, why not give your students an insight into processing Unicode strings and not just ASCII strings? Or, to put it another way, why not give your students an insight into processing multi-language strings and not just English strings? Or, to put it yet another way, why not code for the whole world and not just the English-speaking world?

BTW, 苹果电脑 ~: is the prompt I set up for my iMac, and the first four characters are Chinese for Apple Computer.

In the examples above, I have deliberately used one of the standard and common Regular Expression engines, which I accessed via egrep. This is the type of engine you will most likely encounter. Much less common are the Regular Expression engines that have been extended with features specifically for Unicode. Such extensions, for instance, facilitate matching Unicode characters that have some specified property, e.g. \p{Hangul} will match any character belonging to the Hangul Script. More information on such engines is available at regular-expressions.info/unicode.html and unicode.org/reports/tr18/
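
Python's standard re module is one of the engines that does not accept \p{Hangul}. The sketch below only approximates that property by checking whether a character's Unicode name contains the word HANGUL. This name test is my own stand-in, not the real Script property defined by Unicode, and the function names looks_hangul and hangul_chars are hypothetical. The third-party regex package on PyPI does accept \p{Hangul} if you want the real thing.

```python
import unicodedata


def looks_hangul(ch):
    """Rough stand-in for \\p{Hangul}: true if the character's name mentions HANGUL.

    This is an approximation based on character names, not the Script property.
    """
    return "HANGUL" in unicodedata.name(ch, "")


def hangul_chars(text):
    """Return the characters of text that pass the name-based Hangul test."""
    return [ch for ch in text if looks_hangul(ch)]


print(hangul_chars("ДЖԶख나ખ༁"))
print(hangul_chars("abc서울def"))
```

Unlike the range [가-힣], the name test also accepts the individual Hangul letters (jamo) such as ㄲ, which sit outside the Hangul Syllables block.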

Thursday, 16 January 2014

Japanese Domain Name

I believe はじめよう.みんな to be the world's first live fully Japanese Domain Name! It is written with the Japanese Hiragana script. みんな is one of Google's new gTLDs icannwiki.com/index.php/.みんな.

Google translates はじめよう to "Let's start with" and みんな to "Everyone" translate.google.co.uk/#ja/en/はじめよう%0Aみんな

One can use the Ideographic Full Stop rather than the ASCII Full Stop as the separator in Internationalized Domain Names ie はじめよう。みんな. This then gives us the rather cool translation to English "Let's start with. Everyone" translate.google.co.uk/#ja/en/はじめよう。みんな
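
The separator behaviour can be checked in code. Here is a minimal sketch, assuming Python 3 and its built-in idna codec, which implements the older IDNA 2003 rules and treats U+3002 IDEOGRAPHIC FULL STOP as a label separator. The helper name to_ascii is my own. Browsers and newer IDNA libraries may behave differently, so treat this as an illustration rather than a statement of how every resolver works.

```python
ascii_dot = "はじめよう.みんな"
ideographic_dot = "はじめよう。みんな"  # separator is U+3002 IDEOGRAPHIC FULL STOP


def to_ascii(domain):
    """Convert a domain name to its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


print(to_ascii(ascii_dot))
print(to_ascii(ideographic_dot))
# Both spellings should give the same ASCII form.
print(to_ascii(ascii_dot) == to_ascii(ideographic_dot))
```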

Monday, 13 January 2014

Apple Color Emoji

In a previous post schappo.blogspot.co.uk/2014/01/localized-font-names.html I examined how well Browsers deal with language localized font names. My exploration, this time, concerns the number of localized names a font has. I decided upon a font that stands out from the crowd. A font that many will be aware of, namely, Apple Color Emoji. OSX Mavericks 10.9.1 has 33 system language localizations and I found that the Apple Color Emoji font has 16 unique language localizations, as in the table below.

Language               Localized Name
English                Apple Color Emoji
Arabic                 لون
Chinese (Simplified)   Apple 彩色表情符号
Chinese (Traditional)  Apple 彩色表情符號
Danish                 Apple farve-emoji
Dutch                  Apple Kleur-Emoji
Finnish                Applen väri-emoji
French                 Apple Emoji couleur
German                 Apple Farben-Emoji
Italian                Colore Emoji Apple
Japanese               Apple カラー絵文字
Korean                 Apple 컬러 이모티콘
Norwegian              Apple farge-emoji
Portuguese             Apple Emoji em Cores
Russian                Цветные эмодзи Apple
Swedish                Apple färg-emoji

Sunday, 12 January 2014

Localized Font Names

Fonts can have more than one name ie names localized to more than one language. In this post I am going to examine the use of fonts that have two language localizations for their names, English and Chinese. The specific fonts I am using are some Chinese fonts newly included in OSX Mavericks:

English Name Chinese Name
HanziPen SC 翩翩体-简
Wawati SC 娃娃体-简
Xingkai SC 行楷-简
Yuppy SC 雅痞-简

My aim is to determine whether or not Browsers can select and use fonts by their, in this case, Chinese names. Here is the relevant html code:

<p style="font-family:'HanziPen SC'">
1a 拉夫堡,莱斯特,伦敦。</p>
<p style="font-family:'翩翩体-简'">
1b 拉夫堡,莱斯特,伦敦。</p><hr />
<p style="font-family:'Wawati SC'">
2a 拉夫堡,莱斯特,伦敦。</p>
<p style="font-family:'娃娃体-简'">
2b 拉夫堡,莱斯特,伦敦。</p><hr />
<p style="font-family:'Xingkai SC'">
3a 拉夫堡,莱斯特,伦敦。</p>
<p style="font-family:'行楷-简'">
3b 拉夫堡,莱斯特,伦敦。</p><hr />
<p style="font-family:'Yuppy SC'">
4a 拉夫堡,莱斯特,伦敦。</p>
<p style="font-family:'雅痞-简'">
4b 拉夫堡,莱斯特,伦敦。</p>

The text is in pairs labelled a and b. A Browser that recognises both the English and Chinese names for a font will render a and b text identically as it will be using the same font. A Browser that does not recognise the Chinese name will use a substitute font and hence a and b text will appear differently. It is expected that a Browser will always recognise an English font name but, as will be demonstrated, Chinese font names are often not recognised. Figure 1 shows correct Browser behaviour and Figure 2 shows incorrect Browser behaviour.

Figure 1: Correct Browser Behaviour, strings a and b rendered with same font


Figure 2: Incorrect Browser Behaviour, strings a and b rendered with different fonts

I tested with Chrome (v31.0.1650.63), Firefox (v26.0) and Safari (v7.0.1) on OSX Mavericks 10.9.1. Only Firefox worked correctly! My OSX localization for the tests was English. I also switched OSX to Chinese and repeated the tests, but the results were the same.

Here is what W3C have to say in CSS Fonts Module Level 3 w3.org/TR/css3-fonts/#font-family-prop: "Some font formats allow fonts to carry multiple localizations of the family name. User agents must recognize and correctly match all of these names independent of the underlying platform localization, system API used or document encoding"

So, in due course, all Browsers should work with all Localized Font Names. As evidenced by my tests, that is not yet the case. So, what do we do in the meantime? I am with Kendra Schaefer (www.kendraschaefer.com/2012/06/chinese-standard-web-fonts-the-ultimate-guide-to-css-font-family-declarations-for-web-design-in-simplified-chinese/), whose recommendation is to include all the Localized Font Names in the font-family declaration eg

font-family: "Yuppy SC", "雅痞-简", sans-serif;

Wednesday, 2 January 2013

Unicode and Hangul

One of the Unicode blocks is Hangul Syllables, codepoints U+AC00➜U+D7AF. Each character in this block has a formal Unicode name written in upper case Latin eg

  • U+AC85 겅 HANGUL SYLLABLE GEONG
  • U+B268 뉨 HANGUL SYLLABLE NWIM
I draw your attention to the last word in the Unicode name. This may appear to be a string of random Latin characters. If, though, you use the Mac OSX GongjinCheong Romaja Input Method, this string represents the sequence of key presses required to produce the Hangul Syllable. So, taking the first example above, typing the key sequence GEONG will produce the Hangul character 겅.
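
The formal names can also be read programmatically. Here is a minimal sketch, assuming Python 3 and its standard unicodedata module; the helper name romanized_part is my own.

```python
import unicodedata

PREFIX = "HANGUL SYLLABLE "


def romanized_part(syllable):
    """Return the last word of the Unicode name of a Hangul Syllables character."""
    name = unicodedata.name(syllable)
    if not name.startswith(PREFIX):
        raise ValueError(f"{syllable!r} is not in the Hangul Syllables block")
    return name[len(PREFIX):]


for ch in "겅뉨":
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
# U+AC85 겅 HANGUL SYLLABLE GEONG
# U+B268 뉨 HANGUL SYLLABLE NWIM

# The lookup also works in the other direction, from name to character.
print(unicodedata.lookup("HANGUL SYLLABLE GEONG"))
```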

There are two cases where one needs to augment the key sequence in order to write the Hangul character.

  1. When the syllable begins with a vowel then one needs to prefix with the silent placeholder ㅇ, which in the GongjinCheong Romaja Input Method is produced by typing X. Thus, U+C54B 앋 HANGUL SYLLABLE AD, is produced by typing the key sequence XAD
  2. When the syllable ends with ㄲ or ㅆ then one needs to type ⇧G or ⇧S, respectively. Thus, U+AC14 갔 HANGUL SYLLABLE GASS, is produced by typing the key sequence GA⇧S
Notes:
  • Unicode characters can be viewed on Mac OSX using Character Viewer
  • The GongjinCheong Romaja Input Method is enabled in System Preferences➞Language & Text➞Input Sources
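
The two augmentation rules above can be sketched in code. This is a minimal Python sketch, assuming Python 3 and its standard unicodedata module. The function key_sequence is hypothetical and encodes only the two rules described in this post; I have not checked it against the input method for every syllable, so other cases may need further handling.

```python
import unicodedata

PREFIX = "HANGUL SYLLABLE "
# Letters that can start the romanized form of a vowel in Unicode names.
VOWEL_LETTERS = set("AEIOUWY")
# Doubled finals that, per the rules above, are typed with the shift key.
SHIFTED_FINALS = {"GG": "⇧G", "SS": "⇧S"}


def key_sequence(syllable):
    """Guess the key presses for a Hangul syllable from its Unicode name."""
    name = unicodedata.name(syllable)
    if not name.startswith(PREFIX):
        raise ValueError(f"{syllable!r} is not in the Hangul Syllables block")
    keys = name[len(PREFIX):]
    # Rule 2: a final ㄲ or ㅆ (romanized GG or SS) is typed as ⇧G or ⇧S.
    for final, shifted in SHIFTED_FINALS.items():
        if keys.endswith(final):
            keys = keys[: -len(final)] + shifted
            break
    # Rule 1: a syllable beginning with a vowel needs the X prefix.
    if keys[0] in VOWEL_LETTERS:
        keys = "X" + keys
    return keys


for ch in "겅앋갔":
    print(ch, key_sequence(ch))
# 겅 GEONG
# 앋 XAD
# 갔 GA⇧S
```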