Monday, 13 September 2021

Japanese Dog Food Domain Name

This morning I discovered an excellent Japanese dog food domain name ➜ プレミアムドッグフード.コム. This translates as "premium dog food".com. It resolves to a well-developed and complete website which provides comprehensive information on dog food.

The winning bonus feature of this site is that, as well as the domain name, the pathname part of the URLs is written in Japanese, which is why I am writing about it in my blog. Many sites have an excellent non-Latin-script domain name but, disappointingly, use ASCII for the pathname part of the URL.

Here are some of the fully Japanese URLs for プレミアムドッグフード.コム.

  1. プレミアムドッグフード.コム/モグワンドッグフード
  2. プレミアムドッグフード.コム/カナガンドッグフード
  3. プレミアムドッグフード.コム/ネルソンズドッグフード
  4. プレミアムドッグフード.コム/カテゴリー/ドッグフードの選び方

So, you have clicked one of the links in the above list and in the address bar of your browser you see the totally cool, fully Japanese URL. Suppose you now want to paste that URL into your word processor. You will most likely expect that what you see in the address bar is what you will get after the paste. Well, sometimes you will and sometimes not. It depends on the browser you are using.

As is common practice in computing and for good reasons, what is presented to users can be different to what is actually used under the bonnet. URLs may be ASCII-encoded: the domain name part uses Punycode encoding (en.wikipedia.org/wiki/Punycode) and the pathname part uses percent-encoding (en.wikipedia.org/wiki/Percent-encoding).
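To make the two encodings concrete, here is a minimal Python sketch of how the readable URL could be turned into its ASCII form. It is only an illustration using the standard library: the function name to_ascii_url is my own invention, Python's built-in "idna" codec follows the older IDNA 2003 rules (which I assume is good enough for this katakana name), and the sketch ignores ports and user names.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of a URL (hypothetical helper, for illustration)."""
    parts = urlsplit(url)
    # Domain name part: each label is converted to Punycode with an "xn--" prefix.
    host = parts.hostname.encode("idna").decode("ascii")
    # Pathname part: UTF-8 bytes are percent-encoded; "/" is left untouched.
    path = quote(parts.path)
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(to_ascii_url("http://プレミアムドッグフード.コム/モグワンドッグフード/"))
```

The host comes out as "xn--" labels and the path as a run of %E3... triplets, one triplet group per katakana character.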

I tested copy/pasting of the browser address bar content to TextEdit with my current cocktail mix of browsers and versions. Safari, Opera and Yandex Яндекс gave me the fully Japanese URL text, which is what I want and as it should be.

Firefox gave me:
http://プレミアムドッグフード.コム/%E3%83%A2%E3%82%B0%E3%83%AF%E3%83%B3%E3%83%89%E3%83%83%E3%82%B0%E3%83%95%E3%83%BC%E3%83%89/

Chrome and Naver Whale 네이버 웨일 gave me:
http://xn--cck2a6cxac5ej8dk1j3h.xn--tckwe/%E3%83%A2%E3%82%B0%E3%83%AF%E3%83%B3%E3%83%89%E3%83%83%E3%82%B0%E3%83%95%E3%83%BC%E3%83%89/

These two ASCII-encoded forms should never be presented to the user. This does not happen just with Japanese URLs. It happens with any URL that contains non-ASCII Unicode characters, of which the Japanese characters are a subset.
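If your browser does hand you one of the ASCII-encoded forms, it can be turned back into the readable one. The following is a rough Python sketch, not a description of what any browser does internally: the function name to_display_url is hypothetical, it relies on Python's built-in "idna" codec (IDNA 2003 rules), and it ignores ports and user names.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def to_display_url(url: str) -> str:
    """Return a human-readable form of a URL (hypothetical helper, for illustration)."""
    parts = urlsplit(url)
    # Encoding first leaves "xn--" labels as they are and converts any Unicode
    # labels, so the same line handles both the Chrome and the Firefox form.
    host = parts.hostname.encode("idna").decode("idna")
    # Percent-encoded bytes are decoded as UTF-8 back into characters.
    path = unquote(parts.path)
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


firefox_form = (
    "http://プレミアムドッグフード.コム/"
    "%E3%83%A2%E3%82%B0%E3%83%AF%E3%83%B3%E3%83%89"
    "%E3%83%83%E3%82%B0%E3%83%95%E3%83%BC%E3%83%89/"
)
print(to_display_url(firefox_form))
```

The same call should accept the Chrome form as well, since the host step copes with both Punycode and Unicode labels.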

Japanese Domain Name 日本語ドメイン名