Representations must be agreed if they are to be shared. If different computers used different numbers to encode the same character, people would not be able to read each other's documents. There have to be standards. There are countless computer standards, covering every aspect of information technology, from music and picture encoding to programming language design. And, as you would expect, there are standards that apply to character encoding. You may have wondered why I chose such apparently random numbers to stand for the characters I needed. I didn't. I simply chose numbers that have already been agreed in the Unicode standard for character representation. Unicode is a development of an earlier standard, ASCII (the American Standard Code for Information Interchange), which was approved in 1967.
ASCII set aside 128 numbers, from 0 to 127, for upper- and lower-case alphabetic characters, punctuation marks and some ‘invisible’ characters, such as a carriage return (start a new line) and a tab.
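To make this concrete, here is a quick look at the ASCII numbering using Python (my choice of language here, not anything the standard prescribes). The built-in `ord()` gives the agreed number for a character, and `chr()` reverses the mapping:

```python
# ord() maps a character to its agreed code number; chr() goes the other way.
print(ord('A'))   # 65: upper-case letters start here
print(ord('a'))   # 97: lower-case letters start here
print(ord('\t'))  # 9: the 'invisible' tab character
print(chr(63))    # ?: a punctuation mark

# Every character in a plain English sentence falls in ASCII's range of 0-127.
print(all(ord(c) < 128 for c in "Hello, world!"))  # True
```

The same program run on any standards-conforming machine produces the same numbers, which is the whole point of having an agreed code.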
Unicode, work on which began in 1987, preserves the ASCII numbers, but hugely expands the set of numbers available to 65,536. These are intended to be used roughly as follows:
8,192 numbers for representing characters in the world's main languages, including Hebrew and Sanskrit;
4,096 for punctuation marks, graphics and special symbols;
5,632 for developers to define their own symbols;
27,000 or so for Han Chinese characters;
the remainder for characters yet to be invented.
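The allocation above can be seen in practice. The following Python sketch (again, just an illustration in one language among many) shows that Unicode preserves the ASCII numbers while assigning larger numbers, still within the 16-bit range, to characters from other scripts:

```python
# Unicode preserves ASCII: 'A' is 65 in both encodings.
print(ord('A'))        # 65
# Characters beyond ASCII receive larger numbers in the 16-bit range.
print(ord('א'))        # 1488: the Hebrew letter alef
print(ord('中'))       # 20013: a Han Chinese character
print(hex(ord('中')))  # 0x4e2d, the same number written in hexadecimal
```

Notice that the Han character's number sits well inside the 0 to 65,535 range, in the large region the designers set aside for Chinese characters.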
Why do you think ASCII supported exactly 128 numbers and Unicode exactly 65,536?
As you might have guessed, the answer lies in binary. A group of n bits can hold 2^n distinct values. The binary numbers 0000000 to 1111111 (0 to 127) fit in 7 bits, giving 2^7 = 128 possibilities; the numbers 0 to 65,535 fit in 16 bits (two bytes), giving 2^16 = 65,536 possibilities.
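The arithmetic is easy to verify for yourself. A short Python check (an illustration, not part of either standard):

```python
# n bits can hold 2**n distinct values, numbered 0 to 2**n - 1.
print(2 ** 7)   # 128: the size of the ASCII character set
print(2 ** 16)  # 65536: the size of the original Unicode character set

# The largest values really do need exactly 7 and 16 bits.
print(bin(127))    # 0b1111111: seven 1s
print(bin(65535))  # 0b1111111111111111: sixteen 1s
```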