Saturday, 20 October 2012

Twitter Character Count

【Update: I do not know when it happened, but Twitter no longer differentiates between BMP and non-BMP characters with respect to character count. All characters now have a count of 1. I may, at some stage, delete this article but, for the time being, I will leave it here as an historical record of the evolution of Twitter.】

In a previous article I examined the Sina Wēibó 新浪微博 character count for a user post. Let's now examine Twitter. The stated and generally understood limit is 140 characters for a tweet. This is not strictly true. The number of characters that actually fits in a tweet is variable and ranges from 70 to 140, inclusive, because different characters have different counts, as follows:

  • Characters from Unicode range U+0000➜U+FFFF have a count of 1
  • Characters from Unicode range ≥ U+010000 have a count of 2
Or, to put it another way: characters in the Basic Multilingual Plane (BMP) have a count of 1 and characters in the other planes have a count of 2. The 2 Mahjong Tile characters used in the example below are from the Supplementary Multilingual Plane (SMP).
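
The counting rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this article, not Twitter's own code; the function name tweet_count is made up for the example, and U+1F000 (a Mahjong Tile) is used simply as a convenient non-BMP character.

```python
def tweet_count(text: str) -> int:
    """Count text under the rule described above (hypothetical helper).

    Each code point up to U+FFFF (BMP) counts as 1;
    each code point from U+10000 upwards counts as 2.
    """
    return sum(1 if ord(ch) <= 0xFFFF else 2 for ch in text)


# 140 BMP characters use up the whole limit.
print(tweet_count("a" * 140))          # 140
# 70 non-BMP characters also use up the whole limit.
print(tweet_count("\U0001F000" * 70))  # 140
```

The two calls show where the range of 70 to 140 comes from: a tweet made only of non-BMP characters reaches the limit after 70 characters, while one made only of BMP characters reaches it after 140.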

Let's illustrate with a made-up posting that contains characters from the 2 Unicode ranges above. The following text has a tweet character count of 17.
  • one two 一二三四五
  • 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 2 + 1 + 1 + 1 + 1 + 1 + 2 = 17
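
The sum above can be reproduced with a short Python sketch. It assumes that a Mahjong Tile character sits in the 9th and 15th positions of the text, which is what the two 2s in the sum imply; the particular tile used here (U+1F000) is a stand-in chosen for the example, and the variable names are made up.

```python
# Stand-in Mahjong Tile (assumption: any non-BMP character gives the same count).
TILE = "\U0001F000"

# Example text with a tile before and after the five CJK numerals.
text = "one two " + TILE + "一二三四五" + TILE

# 1 for each BMP code point, 2 for each code point from U+10000 upwards.
counts = [1 if ord(ch) <= 0xFFFF else 2 for ch in text]

print(" + ".join(str(c) for c in counts), "=", sum(counts))
# 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 2 + 1 + 1 + 1 + 1 + 1 + 2 = 17
```

Note that the CJK numerals 一二三四五 are BMP characters, so each counts as 1; only the two tiles count as 2.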