Thursday, 20 May 2021

Millions of Domain Names

A number of organisations maintain ranked lists of popular Domain Names, where "popular" is determined by each organisation's own criteria.

🌏 Cisco Umbrella 1 Million umbrella.cisco.com/blog/cisco-umbrella-1-million

🌏 DomCop Top 10 million domains domcop.com/top-10-million-domains

🌏 The Majestic Million majestic.com/reports/majestic-million

🌏 DN Pedia: Top Million Websites & TLDs dnpedia.com/tlds/topm.php Provides an interactive interface to the Alexa Top 1 Million list.

Friday, 7 May 2021

IDN (Internationalized Domain Names) Statistics

The following sites give information on the number of registrations of IDN TLDs (Internationalised Domain Name Top Level Domains). These IDN TLDs use scripts other than the Latin script. IDN TLDs include, for example, .भारत (India) and .我爱你 (I Love You). For a full list of TLDs, please see en.wikipedia.org/wiki/List_of_Internet_top-level_domains

🌏 dotTLD.net dottld.net lists the approximate number of domain name registrations for each TLD (Top Level Domain). They have one section dedicated to IDNs dottld.net/tlds-idn.php

🌎 namestat namestat.org. ICANN IDN new gTLDs are listed at namestat.org/s/top-idns

🌎 Domain Name Stats domainnamestats.com. Click the top level light green button to select the category IDNs. It will remember your choice in a cookie.

🌍 Domain Name Stat domainnamestat.com. Can list by TLD type, such as generic or country, but cannot list IDNs only. IDNs are displayed as Punycode and not Unicode.

🌏 nTLDStats ntldstats.com/tld. It has stats on ICANN New gTLDs newgtlds.icann.org/en/program-status/delegated-strings and does not have a separate section explicitly for IDNs only.

🌏 Registries will have information on the IDN TLDs they manage. KRNIC (Korea Network Information Centre) manage .한국 (Korea) and the stats are available on their website at 한국인터넷정보센터.한국/jsp/eng/domain/kr/statistics.jsp

🌏 domain-status daily registrations domain-status.com Provides daily domain name registrations for TLDs .com, .net, .org, .info, .us, .name, .asia and .pro.
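As noted above, some of these sites display IDNs as Punycode rather than Unicode. The following is a minimal sketch of how the ASCII (Punycode) form of a name can be obtained in JavaScript, assuming an environment that provides the standard URL class (modern browsers and Node.js). The function name toAsciiDomain is made up for this illustration.

```javascript
// Convert an internationalised domain name to its ASCII (Punycode) form
// using the standard URL parser. toAsciiDomain is a hypothetical helper name.
function toAsciiDomain(domain) {
  // The scheme is only there so that the parser accepts the string as a URL.
  return new URL("http://" + domain).hostname;
}

// Names taken from this blog. Each non-ASCII label becomes an "xn--" label.
for (const name of ["한국", "한국인터넷정보센터.한국", "example.com"]) {
  console.log(name, "->", toAsciiDomain(name));
}
```

A name that is already ASCII, such as example.com, passes through unchanged.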

Wednesday, 5 May 2021

Jongno Hall Restaurant

Jongno Hall 종로회관 is a restaurant in Jeonju city 전주시, South Korea. Its website has the Korean Domain Name 종로회관.닷컴 where 닷컴 is Verisign's Korean Hangul dot com. Here is Jongno Hall on Kakao Road View 카카오로드뷰 kko.to/ugAQkBLYT.

Majestic have The Majestic Million, which is their list of the domain names of the top 1 million websites according to their link-based ranking system majestic.com/reports/majestic-million. The Majestic Million is, as you were probably expecting, dominated by dot com, e.g. google.com is at position 1, the highest rank, followed closely by facebook.com at position 2. There are, though, a small number of non-ASCII TLDs (Top Level Domains) in the list, such as: 한국 (South Korea), москва (Moscow), 网络 (internet) and 닷컴 (dot com).

Much to my (pleasant) surprise, on April 25th 2021, Jongno Hall's domain 종로회관.닷컴 made a grand entrance into The Majestic Million at position 960,703 and was labelled as a "New Entry". Since then it has been rapidly climbing up the rankings and currently (17th August 2021) has ranking position 43,197. I was tracking progress on a (mostly) daily basis which you can see on my twitter at twitter.com/andreschappo/status/1386295644164399105 or 고.한국/jongno.

Instant Page 인스턴트 페이지 instantpage.kr/home/makepage lists Jongno Hall Restaurant website 종로회관.닷컴 as one of their works.

You are now probably expecting a conclusion with an explanation of why Jongno Hall has entered and is rapidly climbing The Majestic Million rankings. Well... sorry to disappoint but I do not have an explanation 🤔 😳 I guess SEO and/or promotion. If you have any insights please do tweet or message me at https://twitter.com/andreschappo

Update 10th July 2021: The improvement in ranking of 종로회관.닷컴 has significantly slowed down so I am now going to stop regularly monitoring this site. I will check back occasionally. If you would like to see its ranking position for yourself then please visit https://majestic.com/reports/majestic-million?majesticMillionType=2&tld=닷컴&oq=. You will be presented with a list of all domains in "The Majestic Million" with the TLD (Top Level Domain) 닷컴, which currently is just 종로회관.닷컴.

Korean Domain Name 한글 도메인 네임

Tuesday, 16 March 2021

KISA and KRNIC Korean Domain Names

KISA Korea Internet & Security Agency 한국인터넷진흥원 has responsibilities for South Korea's national networking infrastructure, one of these being to function as the KRNIC Korea Network Information Center 한국인터넷정보센터.
  1. 한국인터넷진흥원.한국 KISA
  2. 인터넷진흥원.한국 KISA
  3. 한국인터넷정보센터.한국 KRNIC
  4. 인터넷정보센터.한국 KRNIC
  5. 후이즈검색.한국 whois

Here are Kakao Road Views 카카오로드뷰 of KISA's office building in Naju city 나주시 kko.to/VhocloGDp and kko.to/HINKNBJDp.

Korean Domain Name 한글 도메인 네임

Sunday, 14 March 2021

Sempio Foods Korean Domain Names

Sempio Foods 샘표식품 is a South Korean food manufacturer. Its products include soy sauce 간장, gochujang 고추장, tea 차 and noodles 국수. Detailed information about Sempio and its products is available in English at en.sempio.com

There are three available Korean Hangul TLDs (Top Level Domains): 한국 Korea, 닷컴 dot com and 닷넷 dot net. I am pleased to be able to report that Sempio 샘표 has domain names with all these Korean TLDs and they all resolve to active websites. Additionally, Sempio has an ASCII com TLD domain name.

  1. 샘표.한국
  2. 샘표.닷컴
  3. 샘표.닷넷
  4. 샘표.com

Sempio have transformed the exterior of their Icheon 이천시 factory into a wonderfully creative and colourful work of art. Here is a Naver street view of their Icheon factory naver.me/xw6DgaX6. They also have a Sempio history museum and art exhibition space inside their factory. blog.naver.com/nightsho/70149640713 has an excellent set of photos of both the exterior and interior of the Sempio factory.

Korean Domain Name 한글 도메인 네임

Saturday, 27 February 2021

Jobband Korean Domain Names

Jobband 잡밴드 is a Korean Recruitment Agency. It has an impressively huge set of Korean Domain Names. Each is named for a specific occupation and their website adapts content according to that occupation eg 메디컬잡.닷컴 ➜ 메디컬 medical 잡 job 닷컴 dot com. To put it another way, their site is Domain Name Adaptive.

So far, I have found 60 of their Korean Domain Names, as below. There are, most likely, more of them.

  1. Confectionery & Bakery 제과제빵.닷컴
  2. Private Car Driver 승용기사.닷컴
  3. Sales Position 판매직.닷컴
  4. Interpreter/Translator 통번역.한국
  5. World Trade 트레이드잡.닷컴
  6. Driving 운전직.한국
  7. Heavy Equipment 중장비잡.닷컴
  8. Office 사무직.닷컴
  9. Secretary 경리비서.닷컴
  10. Marketing 마케팅잡.닷컴
  11. IT 아이티잡.닷컴
  12. Design 설계잡.닷컴
  13. Broadcasting Media 미디어잡.닷컴
  14. Law 법률직.닷컴
  15. Tax Accounting 세무직.닷컴
  16. Sales Business 영업직.닷컴
  17. Production 생산직.닷컴
  18. Machine Operator 머신잡.닷컴
  19. Telephone Sales 텔레마케터.닷컴
  20. Estate Agent 부동산잡.닷컴
  21. Hotel Service 호텔잡.닷컴
  22. Motel Service 모텔잡.닷컴
  23. Pension Service 펜션잡.닷컴
  24. Korean Food Cook 한식잡.닷컴
  25. Western Food Cook 양식잡.닷컴
  26. Chinese Food Cook 중식잡.닷컴
  27. Japanese Food Cook 일식잡.닷컴
  28. Medical 메디컬잡.닷컴
  29. Nursing 간호잡.닷컴
  30. Dental Service 덴탈잡.닷컴
  31. Part-Time 알바.닷넷
  32. Overseas 교포잡.닷컴
  33. Sailor Crewman 선원.한국
  34. Nail Art 네일아트.닷컴
  35. Construction 건설.한국
  36. Plumber 배관설비잡.닷컴
  37. Carpenter 목수.한국
  38. Industrial Plant 플랜트.한국
  39. Private Tutor 과외쌤.닷컴
  40. Housekeeper 파출.한국
  41. Security Guard 경비원.닷컴
  42. Cleaning Service 청소잡.닷컴
  43. Wedding Planner 웨딩잡.닷컴
  44. Veterinarian 애니멀잡.닷컴
  45. Golf 골프잡.com
  46. Travel Agency 여행사잡.com
  47. Delivery Service 택배잡.닷컴
  48. Barista 바리스타잡.닷컴
  49. Bartender 바텐더잡.닷컴
  50. Waiter 웨이터.닷컴
  51. Delivery 배달원.닷컴
  52. Chauffeur 대리운전밴드.한국
  53. Courier Service 퀵서비스밴드.한국
  54. Moving Service 이삿짐밴드.한국
  55. Farming 농장.한국
  56. Masseur 마사지밴드.com
  57. Recruiter 헤드헌팅.닷컴
  58. Study Abroad 유학.닷넷
  59. Mobile Telecommunications 이동통신.닷컴
  60. Senior 시니어잡.닷컴
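
How Jobband implements this is not known to me. A minimal sketch of the general idea of a Domain Name Adaptive site, assuming the server can read the host name of each request, might look like the following. The table, the function names and the fallback text are all hypothetical, and the three entries use only names from the list above.

```javascript
// Hypothetical lookup table: domain name -> occupation shown on the page.
const occupationByDomain = {
  "메디컬잡.닷컴": "Medical",
  "간호잡.닷컴": "Nursing",
  "목수.한국": "Carpenter",
};

// Browsers send IDNs to the server in ASCII (Punycode) form,
// so both sides of the comparison are normalised to that form.
function toAsciiHost(host) {
  return new URL("http://" + host).hostname;
}

const occupationByAsciiHost = new Map(
  Object.entries(occupationByDomain).map(([domain, job]) => [toAsciiHost(domain), job])
);

// Return the occupation for the requested host, or a general fallback.
function occupationForHost(hostHeader) {
  return occupationByAsciiHost.get(toAsciiHost(hostHeader)) || "All occupations";
}

console.log(occupationForHost("메디컬잡.닷컴")); // prints "Medical"
```

Keying the table on the ASCII form means the same lookup works whether the host arrives as Unicode or as Punycode.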

Korean Domain Name 한글 도메인 네임

Sunday, 19 July 2020

Computer Science Internationalization - IDNs

I started a new thread on IDNForums entitled "Interesting IDNs" ➜ idnforums.com/forums/35988-interesting-idns.html

The basic idea is to add human interest stories to bring IDNs to life. So, if you know of or find interesting IDNs then please post the IDN along with its associated story/explanation to the aforementioned thread. If you do not want to join IDNforums then tweet me (twitter.com/andreschappo) with your IDN + story and I will post to the thread.

Monday, 14 January 2019

Computer Science Internationalization - Adaptive URL

I recently came across an impressive Serbian website, Serbian National Internet Domain Registry (RNIDS/РНИДС). This site's content is available, by user selection, in English, in Serbian written in Cyrillic script, or in Serbian written in Latin script.

What is happening in the browser address bar is what I find most interesting and impressive. Firstly, RNIDS has a Serbian Cyrillic Domain Name, рнидс.срб. This Domain Name is properly integrated into the site. It does not redirect to an ASCII domain, nor does it use a Frame redirect/forward. It is correctly displayed in the browser address bar.

Secondly, the pathname part of the URL is displayed in the currently selected language/script. If you browse round the site and change language/script you will see the URL pathname instantly adapt to your selected language/script. Here is an example page:

  1. Ћирилица (Serbian/Cyrillic Script): рнидс.срб/национални-домени/регистрација-националних-домена
  2. Latinica (Serbian/Latin Script): рнидс.срб/lat/nacionalni-domeni/registracija-nacionalnih-domena
  3. English: рнидс.срб/en/national-domains/registering-national-domains

I consider RNIDS to be an excellent example of usage of FULLY internationalised URLs. There are, nowadays, many sites with an Internationalised Domain Name in a multitude of languages/scripts. Most though still have an ASCII/English pathname. I consider this to be a missed opportunity. I highly recommend that sites fully internationalise their URLs. One way of achieving this is by the use of aliases schappo.blogspot.com/2017/03/computer-science-internationalization_31.html
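
Behind the scenes, a fully internationalised URL such as the Cyrillic one above is still carried as ASCII: the host as Punycode and the pathname as percent-encoded UTF-8. The following is a minimal sketch showing both forms with the standard URL class, assuming a modern browser or Node.js. The https scheme is my assumption, since the addresses above are written without one.

```javascript
// One of the RNIDS addresses from the list above. The scheme is an assumption.
const address = "https://рнидс.срб/национални-домени/регистрација-националних-домена";
const url = new URL(address);

console.log(url.hostname); // host in Punycode, each label begins with "xn--"
console.log(url.pathname); // pathname as percent-encoded UTF-8
console.log(decodeURIComponent(url.pathname)); // the readable Cyrillic pathname again
```

The browser address bar does the reverse conversion for display, which is why the reader sees Cyrillic throughout.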

Sunday, 10 June 2018

BBC International Websites

If you are in the UK, most of you will be familiar with the BBC website bbc.co.uk. The BBC does have several localised websites for non-English languages and regional news. If you are browsing from the UK it is not at all obvious how to visit their localised websites as there are no links on their UK website.

Here is how I found their localised websites. Firstly I visited bbc.co.uk and, as one would expect, I landed on their UK homepage. But there are no links to their localised sites. If there were I would be happy and would not be writing this article. Next I visited the Wikipedia BBC article en.wikipedia.org/wiki/BBC and saw that there is a different address bbc.com. I used this address but, as I am browsing from the UK, it redirects to bbc.co.uk which is not at all helpful. So, I am back to where I started.

To the rescue comes the Opera browser with its built-in VPN. I set the VPN to a non-UK location and now when I use bbc.com I do not get redirected to bbc.co.uk. Scroll down to the bottom of the page and I now see links to their localised websites. Even if you are browsing from the UK without such a VPN service, these links will not redirect to bbc.co.uk.

For your convenience I list below all the links to the BBC localised websites.

  1. Arabic عربي bbc.com/arabic
  2. Azeri AZƏRBAYCAN bbc.com/azeri
  3. Bangla বাংলা bbc.com/bengali
  4. Burmese မြန်မာစာ bbc.com/burmese
  5. Chinese 中文 bbc.com/zhongwen/simp
  6. French bbc.com/afrique
  7. Hausa bbc.com/hausa
  8. Hindi हिन्दी bbc.com/hindi
  9. Indonesian bbc.com/indonesia
  10. Japanese 日本語 bbc.com/japanese
  11. Kinyarwanda & Kirundi bbc.com/gahuza
  12. Kyrgyz Кыргыз bbc.com/kyrgyz
  13. Marathi मराठी bbc.com/marathi
  14. Nepali नेपाली bbc.com/nepali
  15. Pashto پښتو bbc.com/pashto
  16. Persian فارسی bbc.com/persian
  17. Portuguese bbc.com/portuguese
  18. Russian ру́сский bbc.com/russian
  19. Sinhala සිංහල bbc.com/sinhala
  20. Somali bbc.com/somali
  21. Spanish bbc.com/mundo
  22. Swahili bbc.com/swahili
  23. Tamil தமிழ் bbc.com/tamil
  24. Turkish TÜRKÇE bbc.com/turkce
  25. Ukrainian УКРАЇНСЬКA bbc.com/ukrainian
  26. Urdu اردو bbc.com/urdu
  27. Uzbek O'ZBEK bbc.com/uzbek
  28. Vietnamese TIẾNG VIỆT bbc.com/vietnamese

Sunday, 22 April 2018

Computer Science Internationalization - Ideographic Description Characters

I recently created a new Chinese character. This is the very first time I have done so. I created the character to write on a farewell card.

Firstly some background. About two years ago I gave a printout to a colleague, Katherine Hollingsworth, of the Chinese character 好 which means good. I explained how this character has two components, the left part meaning woman and the right part meaning child. Woman 女 + child 子 is something good, hence the meaning good. Katherine, at this time, had one child.

Fast forward two years and Katherine is leaving us. As is traditional, there was a farewell card for us to write our best wishes. That is when I had the idea of creating a new Chinese character. Katherine now has two children. The character I created was ⿰好子 which is a woman with two children. I handwrote this character onto the card. Chinese characters are written into a square which is what I did when handwriting the combination of 好 and 子.

Now to the Computer Science part. In Unicode there are twelve Ideographic Description Characters: ⿰ ⿱ ⿲ ⿳ ⿴ ⿵ ⿶ ⿷ ⿸ ⿹ ⿺ ⿻, U+2FF0➔2FFB. These can be used to construct new characters from combinations of existing characters and/or components. They represent the topological relationship between the components. ⿰ is used to represent a character with two components, a left part and a right part. The ideographic description sequence for my new character is thus ⿰好子. I have given this character the name 双好 which means double good😀

I told my Chinese project students about my new Chinese character. One of the students, 王国旭 Wang Guoxu, suggested an additional way of constructing the character using a left part, a middle part and a right part. The sequence for his suggestion is ⿲子女子. I really like this suggestion as we now have a woman surrounded by her two children.

Update: I have now devised a second new Chinese character. It is related to the above character and consists of four, side by side components. The ideographic description sequence is: ⿰⿰子男⿰女子. I will leave it to you, the reader, to determine what it represents.

Update 2: Another arrangement of the 双好 components is to have the woman above the children. The ideographic description sequence is: ⿱女⿰子子. See 고.한국/hao2 or twitter.com/andreschappo/status/1046367105141153793 for calligraphic versions of this arrangement.

Update 3: Katherine now has 3 children and so it is time for a triple good 三好 new Chinese character with 3 children components. Possible arrangements of the components include ⿱⿰女子⿰子子 and ⿱女⿲子子子. Please see 고.한국/hao3 or twitter.com/andreschappo/status/1093833016353406976 for calligraphic versions of these arrangements.

There is a useful online utility which can be used to compose and visualise new characters. Here is a new character representing a woman with three children zi.tools/?secondary=ids&seq=⿱女⿲子子子
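
The ideographic description sequences in this article can also be read mechanically. The following is a minimal sketch of a parser, assuming only the twelve Ideographic Description Characters U+2FF0 to U+2FFB described above, of which ⿲ and ⿳ take three components and the rest take two. The function names arity, parseIds and components are made up for this illustration.

```javascript
// Number of components an Ideographic Description Character takes.
// U+2FF2 and U+2FF3 take three, the other ten in U+2FF0..U+2FFB take two.
// Anything else is treated as an ordinary character or component (arity 0).
function arity(ch) {
  const cp = ch.codePointAt(0);
  if (cp < 0x2ff0 || cp > 0x2ffb) return 0;
  return cp === 0x2ff2 || cp === 0x2ff3 ? 3 : 2;
}

// Parse an ideographic description sequence into a nested tree.
function parseIds(sequence) {
  const chars = Array.from(sequence); // split by code point, not UTF-16 unit
  let pos = 0;

  function parseNode() {
    if (pos >= chars.length) throw new Error("sequence ended too early");
    const ch = chars[pos++];
    const n = arity(ch);
    if (n === 0) return ch; // a leaf component
    const parts = [];
    for (let i = 0; i < n; i++) parts.push(parseNode());
    return { operator: ch, parts };
  }

  const tree = parseNode();
  if (pos !== chars.length) throw new Error("unexpected trailing characters");
  return tree;
}

// List the leaf components of a tree, left to right.
function components(tree) {
  return typeof tree === "string" ? [tree] : tree.parts.flatMap(components);
}

console.log(components(parseIds("⿱女⿲子子子")).join(" ")); // prints "女 子 子 子"
```

An incomplete sequence such as ⿰好 is rejected because the operator is left waiting for its second component.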

Saturday, 10 March 2018

Computer Science Internationalization - Validating People Names

When coding, it is essential to consider and account for edge cases. Whilst coding Internationalised Programming Challenge 17 jsfiddle.net/coas/4djhso1y I happened upon an unexpected and fascinating edge case.

Before I reveal the edge case I need to give you some background information, starting with some Chinese characters.

娥,鄂,鹅,仒,厄,戹*,屵*,阨*,阿*,呝,俄,砨*,偔,堨*,圔*,誒*,噁*,儑*,貖*,礘*,櫮,鰪*,岋,阸*,妸,咢,匎*,卾,隘*,廅*,僫,蕚,噩,鍔,額,鰐,讹,吪,妿,咹,胺,啞*,蛯*,搤,磀,遻*,嶭*,騀,顎,鶚 ...and many more at chinese-tools.com/tools/sinograms.html?p=e

All these Chinese characters can be written in pinyin as E or e. Those characters marked with * have multiple meanings, hence multiple pronunciations, hence multiple ways of writing in pinyin. Those characters not marked with * are only written in pinyin as E or e. Some of you may well be thinking, what of tone marks? Well, unless I explicitly request it, I have never seen a Chinese person write pinyin with tone marks.

Some of these Chinese characters are family names and some would be suitable for given names.

So, now to the edge case. A Chinese name when written in pinyin could be E E or Ee E. I asked on Weibo 微博 whether anyone knew of any Chinese name which when written in pinyin is E E or Ee E. One person responded with the Chinese name 鄂娥 which written in pinyin is E E.

I reason that a person whose only language is English would think E E are initials and not the full name. Actually, before I considered name edge cases I would probably also have thought E E are initials. I have been aware for a long time that some Chinese characters can be written in pinyin as single letters such as e or a but I had not made the connection with people names.

This example illustrates that programmers need to thoroughly research naming conventions in different countries/cultures/languages before writing validation code.
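
To make the edge case concrete, here is a minimal sketch of a rule that might look reasonable to an English-only programmer, next to a more cautious one. Both functions are hypothetical illustrations and are not taken from Challenge 17.

```javascript
// A hypothetical rule that treats single letters as initials:
// every part of a name must contain at least two letters.
function naiveIsFullName(name) {
  return name.trim().split(/\s+/).every((part) => /^\p{L}{2,}$/u.test(part));
}

// A more cautious rule: every part must contain at least one letter.
function cautiousIsFullName(name) {
  return name.trim().split(/\s+/).every((part) => /^\p{L}+$/u.test(part));
}

// "E E" is the pinyin for 鄂娥, a complete Chinese name.
for (const name of ["E E", "Ee E", "Wang Guoxu"]) {
  console.log(name, naiveIsFullName(name), cautiousIsFullName(name));
}
```

The naive rule wrongly rejects E E and Ee E, whereas the cautious rule accepts all three names.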

I would like to encompass several international naming conventions in my Challenge 17. So far, I have coded validation rules for Chinese 中文, Khmer ភាសាខ្មែរ, Korean 한국어, Vietnamese Tiếng Việt and a catchall. I welcome contributions of international naming rules which I will code and incorporate into Challenge 17. You can email me or, if you do not know my email address, you can tweet me @andreschappo or contact me on Weibo 微博 @schappo

Techie stuff: The regex I use for Chinese name validation is:

XRegExp("^((?![\\p{InKangxi_Radicals}\\p{InCJK_Radicals_Supplement}\\p{InCJK_Symbols_and_Punctuation}])\\p{Han}){2,4}$","u")

My regex is using negative look-ahead, recognisable by the ?! construct. For each character, I am checking that it is a Han* character and is not a radical or symbol or punctuation character. This can be generalised to:

(?!Character_Set_B)Character_Set_A

which reads as: a character must not be in Character_Set_B and must be in Character_Set_A in order to be valid.

The same can be achieved using negative look-behind:

XRegExp("^(\\p{Han}(?<![\\p{InKangxi_Radicals}\\p{InCJK_Radicals_Supplement}\\p{InCJK_Symbols_and_Punctuation}])){2,4}$","u")

This can be generalised as:

Character_Set_A(?<!Character_Set_B)

which reads as: a character must be in Character_Set_A and not in Character_Set_B in order to be valid.
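
For readers without XRegExp, the same two patterns can be approximated with the native JavaScript RegExp. This is a sketch under stated assumptions and not the code used in Challenge 17: it swaps \p{Han} for the native \p{Script=Han}, and it swaps the three XRegExp block names for their explicit Unicode ranges, namely CJK Radicals Supplement U+2E80 to U+2EFF, Kangxi Radicals U+2F00 to U+2FDF and CJK Symbols and Punctuation U+3000 to U+303F. It needs a JavaScript engine that supports Unicode property escapes and look-behind.

```javascript
// Character_Set_B: radicals, symbols and punctuation, written as explicit ranges.
const EXCLUDED = "[\\u2E80-\\u2EFF\\u2F00-\\u2FDF\\u3000-\\u303F]";

// (?!Character_Set_B)Character_Set_A, repeated 2 to 4 times.
const lookAhead = new RegExp(`^(?:(?!${EXCLUDED})\\p{Script=Han}){2,4}$`, "u");

// Character_Set_A(?<!Character_Set_B), repeated 2 to 4 times.
const lookBehind = new RegExp(`^(?:\\p{Script=Han}(?<!${EXCLUDED})){2,4}$`, "u");

const samples = [
  "鄂娥",         // an ordinary two character name
  "\u2F25\u2F26", // Kangxi radicals that merely look like 女 and 子
  "娥",           // too short for the {2,4} rule
];

for (const s of samples) {
  console.log(s, lookAhead.test(s), lookBehind.test(s));
}
```

Both patterns accept 鄂娥 and reject the other two samples. The Kangxi radicals are rejected even though they belong to the Han script, which is the reason for excluding Character_Set_B.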

The Chinese for Regular Expression (regex) is 正则表达式.

* A Han character, in this context, is actually a CJK (Chinese or Japanese or Korean) character but that is far too long a story for this blog article.

Update 30th March 2018: In my XRegExp, above, I am using the u flag. This enables Unicode but only the BMP (Basic Multilingual Plane). There is an A flag which enables the whole of Unicode, BMP + Astral characters, but, for quite some time, I could not get this to work. I did eventually find the problem and a solution. The problem is that jsdelivr minification breaks XRegExp. My solution was simply not to use the jsdelivr minified version of XRegExp. I am, therefore, using cdn.jsdelivr.net/npm/xregexp@4.1.1/xregexp-all.js and not cdn.jsdelivr.net/npm/xregexp@4.1.1/xregexp-all.min.js.

In addition to using the A flag I have had to make some minor changes to my regex. My updated regex is:

XRegExp("^((?!\\p{InKangxi_Radicals}|\\p{InCJK_Radicals_Supplement}|\\p{InCJK_Symbols_and_Punctuation})\\p{Han}){2,4}$","A")

The positive outcome for Chinese and Korean Hanja names validation is that these names can now contain characters from the Unicode SIP (Supplementary Ideographic Plane) in addition to characters in all the other Unicode planes.