Wednesday 26 October 2016

Family of Korean IDNs

The following is a list of functioning Korean IDNs (Internationalized Domain Names). The TLD (Top Level Domain) used is 닷컴, which is Verisign's Korean-language equivalent of its com TLD.

This family of Korean IDNs is concerned with computer repair, 컴퓨터수리. The first two characters of the first 6 IDNs are names of Korean cities: 김포 Gimpo, 안양 Anyang, 용인 Yongin, 대구 Daegu, 파주 Paju, 성남 Seongnam. I think the first 2 characters of the next 2 IDNs are districts or neighbourhoods of 서울 Seoul: 용산 Yongsan, 종로 Jongno. The last one, 일산 Ilsan, is a 동 neighbourhood of 고양 Goyang.

  1. 김포컴퓨터수리.닷컴
  2. 안양컴퓨터수리.닷컴
  3. 용인컴퓨터수리.닷컴
  4. 대구컴퓨터수리.닷컴
  5. 파주컴퓨터수리.닷컴
  6. 성남컴퓨터수리.닷컴
  7. 용산컴퓨터수리.닷컴
  8. 종로컴퓨터수리.닷컴
  9. 일산컴퓨터수리.닷컴
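A browser shows these names in Hangul, but DNS itself carries an ASCII form in which each label starts with "xn--". Here is a minimal sketch of that conversion, assuming Python's built-in "idna" codec (which implements the older IDNA 2003 rules) is good enough for these names. The helper name to_a_label and the variable names are my own, for illustration only.

```python
# Sketch: build the nine names from the list above and convert each one
# to its ASCII (Punycode) form. Names such as to_a_label are hypothetical.
CITY_PREFIXES = ["김포", "안양", "용인", "대구", "파주", "성남", "용산", "종로", "일산"]
SERVICE = "컴퓨터수리"  # computer repair
TLD = "닷컴"


def to_a_label(domain: str) -> str:
    """Return the ASCII form of a domain using the built-in 'idna' codec."""
    return domain.encode("idna").decode("ascii")


domains = [f"{city}{SERVICE}.{TLD}" for city in CITY_PREFIXES]

for unicode_name in domains:
    # Each dot-separated label is converted separately.
    print(unicode_name, "->", to_a_label(unicode_name))
```

Decoding the ASCII form with the same codec should give back the Hangul name, which is a quick way to check that nothing was lost in the conversion.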

The following family of Korean IDNs all resolve to the Fun Design website. I discovered this family at newly.domains/20171128-229. 홈페이지제작 means home page creation. The first 2 Korean characters of the first 9 IDNs are Korean cities: 과천 Gwacheon, 광주 Gwangju, 대구 Daegu, 대전 Daejeon, 부산 Busan, 서울 Seoul, 수원 Suwon, 울산 Ulsan and 인천 Incheon. The last one, 분당 Bundang, I am less certain about. I think it is a district of 성남 Seongnam.

  1. 과천홈페이지제작.닷컴
  2. 광주홈페이지제작.닷컴
  3. 대구홈페이지제작.닷컴
  4. 대전홈페이지제작.닷컴
  5. 부산홈페이지제작.닷컴
  6. 서울홈페이지제작.닷컴
  7. 수원홈페이지제작.닷컴
  8. 울산홈페이지제작.닷컴
  9. 인천홈페이지제작.닷컴
  10. 분당홈페이지제작.닷컴

Friday 7 October 2016

Computer Science Internationalization — Bidi

Scripts such as Latin are written from Left to Right (L➡︎R). Scripts such as Arabic and Hebrew are written Right to Left (L⬅︎R). What happens when we mix L➡︎R and L⬅︎R scripts within a document? Here is an exercise in mixing scripts.
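Every Unicode character carries a directional property, and that is what software consults when scripts are mixed. As a small illustration, assuming Python's standard unicodedata module, the following prints the bidi class of one Latin, one Hebrew and one Arabic letter. The sample letters are my own choice.

```python
import unicodedata

# One sample letter per script, written as escapes so that the source
# itself is not reordered by a bidi-aware editor.
samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
}

# "L" = left-to-right, "R" = right-to-left, "AL" = right-to-left Arabic letter.
bidi_classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in bidi_classes.items():
    print(f"{label}: {bidi_class}")
```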

Take a mixed bidi (bidirectional) string consisting of Latin and Hebrew characters in a L➡︎R paragraph.

abcאבגdef

...and here is the same string in a L⬅︎R paragraph.

abcאבגdef

Now to the actual exercise. Copy the above strings to your text editor or word processor. You will need to set up the 2nd occurrence of the string as a L⬅︎R paragraph. I am assuming that your directionality is L➡︎R by default. Each string has two boundaries where the text changes direction. At each boundary you are going to insert a character, either a L➡︎R one, such as x, or a L⬅︎R one, such as ד. For each insertion operation, start from the initial mixed bidi string. There are two mixed strings above, each with two boundaries and two characters to try, so there are a total of 8 insertion operations. The challenge is to predict where in the string the inserted character will appear before you actually insert it. Give it a go! Good luck 😀
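For anyone who wants to tick the cases off one by one, here is a rough sketch that enumerates the 8 operations. It assumes the string is stored as a, b, c, alef, bet, gimel, d, e, f, and that a boundary means a position in that stored string (index 3 or index 6). Both are my assumptions, and the variable names are made up for illustration. It only lists the stored strings after insertion; it does not say where the character appears on screen, so it does not spoil the exercise.

```python
from itertools import product

# Assumed stored order: abc + alef, bet, gimel + def (escapes avoid reordering).
MEMORY = "abc\u05d0\u05d1\u05d2def"
BOUNDARIES = [3, 6]            # stored positions where direction changes
INSERTED = ["x", "\u05d3"]     # a L-to-R letter and a R-to-L letter (dalet)
PARAGRAPHS = ["LTR", "RTL"]    # paragraph direction of each copy

operations = []
for paragraph, index, ch in product(PARAGRAPHS, BOUNDARIES, INSERTED):
    stored = MEMORY[:index] + ch + MEMORY[index:]
    operations.append((paragraph, index, ch, stored))

for paragraph, index, ch, stored in operations:
    # unicode_escape keeps the printout in stored order on any terminal.
    print(paragraph, index, stored.encode("unicode_escape").decode("ascii"))
```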

If I had done this exercise before I ever studied bidi, I would probably have scored 4/8. Now I understand how the computer processes this bidi text, and so I usually score full marks on such exercises. It is not, though, an intuitive process for me, as I have spent most of my life reading and writing L➡︎R scripts only. I have to think very carefully about how the computer does it in order to determine the correct answers.

The main purpose of this exercise is to think about the ordering of the characters in the strings. There are two orderings to consider: memory order and display order. Memory order is the logical order in which the characters are stored in memory, which in this case is the order in which I typed them. The memory order of the string I have used above is "abcגבאdef". Display order is how the string is presented to the viewer. You have already seen, above, the two possible display orders for the single string "abcגבאdef".
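The relationship between the two orderings can be sketched in a few lines of Python. This is a heavily simplified take on the Unicode Bidirectional Algorithm, not a full implementation: it only handles strong letters (no digits, spaces or punctuation), and it assumes the stored order is a, b, c, alef, bet, gimel, d, e, f. The function names resolve_level and display_order are hypothetical.

```python
import unicodedata


def resolve_level(ch: str, base_level: int) -> int:
    """Give a strong character an embedding level (even = LTR, odd = RTL)."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return base_level if base_level % 2 == 0 else base_level + 1
    if bidi_class in ("R", "AL"):
        return base_level if base_level % 2 == 1 else base_level + 1
    raise ValueError(f"this sketch only handles strong letters, got {bidi_class!r}")


def display_order(memory: str, base_level: int) -> str:
    """Return the characters in left-to-right visual order.

    base_level is 0 for a left-to-right paragraph, 1 for right-to-left.
    """
    chars = list(memory)
    levels = [resolve_level(ch, base_level) for ch in chars]
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return memory  # nothing runs right-to-left
    # From the highest level down to the lowest odd level, reverse every
    # contiguous run of characters at that level or higher.
    for level in range(max(levels), min(odd_levels) - 1, -1):
        start = 0
        while start < len(chars):
            if levels[start] < level:
                start += 1
                continue
            end = start
            while end < len(chars) and levels[end] >= level:
                end += 1
            chars[start:end] = reversed(chars[start:end])
            levels[start:end] = reversed(levels[start:end])
            start = end
    return "".join(chars)


# Assumed stored order, written with escapes to avoid editor reordering.
memory = "abc\u05d0\u05d1\u05d2def"
for name, base in (("LTR paragraph", 0), ("RTL paragraph", 1)):
    visual = display_order(memory, base)
    print(name, visual.encode("unicode_escape").decode("ascii"))
```

The result is printed as escapes on purpose. A bidi-aware terminal would reorder the Hebrew letters a second time and hide what the function actually returned.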

I have used TextEdit for this exercise. To set the paragraph text direction in TextEdit, follow the path: "TextEdit➜ Format➜ Text➜ Writing Direction". Now set the paragraph text direction to Right to Left. TextEdit handles bidi text correctly, but that is not the case for all word processors or text editors.

There are several permutations of this exercise, including:

  1. What happens at the boundaries with forward delete and back delete?
  2. What happens if the initial memory order character(s) are L⬅︎R instead of L➡︎R?
  3. Use Arabic instead of Hebrew as this introduces the additional challenge of letters changing shape according to preceding and following characters.

This article is aimed at L➡︎R reading/writing people. If you are a L⬅︎R person then you will need to invert some of my instructions. Actually, if you are a L⬅︎R person you will be totally familiar with mixing bidi text and so will fully understand this exercise.

Environment: macOS v10.12 (Sierra), TextEdit v1.12