Crossing the boundary - analogue universe, digital worlds

4.2.1 Reducing and processing text

You are familiar with the idea of a word processor. Although I grew up long before the era of word processing, it's now difficult for me to imagine how I ever lived without one. Word processors enable us to enter text into the computer, edit and fiddle about with it, store it and then print it out when we are satisfied with the result. That's exactly what's happening as I write this course. But, if the text spends time inside the computer before being returned to print, that must mean it exists there in the form of numbers. It's inside the boundary. How can text be made into numbers?

Let's use the following famous line from Shakespeare as an example:

Rough winds do shake the darling buds of May

(Sonnet 18)

This presents no problem to the human eye. We read it straight off. Actually, the process by which we read, recognise, combine and understand textual symbols is complex and not fully understood – but that's another course.

Exercise 8

How do you think this line could be transformed into numbers?

Discussion

You may have been thinking along the following lines. Pick one number to represent each letter – 1 for ‘a’, 2 for ‘b’, and so on – and then simply substitute the number for that letter in the line.

I did say earlier that the computer world is a simple world, and transforming text into numbers is as straightforward as that. First, we assign a unique number to each letter in the alphabet. Each letter in the text now becomes a number inside the computer. I'm going to make the following choices:

letter   a   b   c   d   e   f   g   h   i   j   k   l   m
number  97  98  99 100 101 102 103 104 105 106 107 108 109
letter   n   o   p   q   r   s   t   u   v   w   x   y   z
number 110 111 112 113 114 115 116 117 118 119 120 121 122

These choices probably seem fairly arbitrary, but let's stick with them for the moment. Now if I simply substitute each letter with the number I've chosen for it, our line will look like this inside the computer (the breaks to a new row have no significance):

114 111 117 103 104 119 105 110 100 115 100 111 115
104 97 107 101 116 104 101 100 97 114 108 105 110
103 98 117 100 115 111 102 109 97 121
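
The substitution can be tried out in a few lines of Python. This is only a minimal sketch: the names LETTER_TO_NUMBER and encode_letters are my own inventions, and I have assumed that the program knows nothing beyond the 26 letters in the table, so it treats every letter as its lower-case form and passes over anything it cannot find.

```python
import string

# Build the table from the text: 'a' -> 97, 'b' -> 98, ..., 'z' -> 122.
LETTER_TO_NUMBER = {
    letter: 97 + position
    for position, letter in enumerate(string.ascii_lowercase)
}


def encode_letters(line):
    """Replace each letter with its number (hypothetical helper).

    Assumption: letters are folded to lower case and any character
    that is not in the table is skipped.
    """
    return [
        LETTER_TO_NUMBER[ch]
        for ch in line.lower()
        if ch in LETTER_TO_NUMBER
    ]


numbers = encode_letters("Rough winds do shake the darling buds of May")
print(numbers[:13])  # the first row of numbers shown above
print(len(numbers))  # how many numbers are stored in total
```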

It looks as if the problem of converting text into numbers has been solved.

SAQ 6

Before going on, do you think the above table is a complete representation of the line of poetry?

Answer

Not quite, unfortunately. If I instruct the computer to translate what I've given it back into text, I'll see:

roughwindsdoshakethedarlingbudsofmay

I forgot that there are spaces between the words, probably because I didn't even notice them. Moreover, the first letter of the line should be a capital and so should the first letter of the proper name ‘May’.
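
The translation back into text can be sketched in the same way. Again, this is an illustration rather than a description of what any real word processor does: NUMBER_TO_LETTER and decode_letters are names I have made up, and the list stored simply repeats the 36 numbers from the table above.

```python
# Reverse table: 97 -> 'a', 98 -> 'b', ..., 122 -> 'z'.
NUMBER_TO_LETTER = {
    97 + position: letter
    for position, letter in enumerate("abcdefghijklmnopqrstuvwxyz")
}

# The numbers held "inside the computer" after the first attempt.
stored = [
    114, 111, 117, 103, 104, 119, 105, 110, 100, 115, 100, 111, 115,
    104, 97, 107, 101, 116, 104, 101, 100, 97, 114, 108, 105, 110,
    103, 98, 117, 100, 115, 111, 102, 109, 97, 121,
]


def decode_letters(numbers):
    """Turn a list of numbers back into text (hypothetical helper)."""
    return "".join(NUMBER_TO_LETTER[n] for n in numbers)


print(decode_letters(stored))  # roughwindsdoshakethedarlingbudsofmay
```

Nothing in the stored numbers records a space or a capital, so the program has no way of putting them back.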

But a computer doesn't know anything about words or the spaces between them, still less about the months of the year. We need more numbers to solve this problem. Let's allocate a new number, 32, to represent a space. However, the problem of capital letters is more serious. There is no easy way of instructing the machine that ‘r’ and ‘R’ are different forms of the same letter. Nor could we possibly tell it anything about the first letters of poetic lines. Our only option is to allocate a whole set of new numbers to the upper-case (capital) versions of every letter. Let's set aside 82 to represent a capital ‘R’ and 77 for a capital ‘M’. Now, if I use this enhanced way of representing characters as numbers and peer into the memory of the computer, our line of poetry becomes:

82 111 117 103 104 32 119 105 110 100 115 32 100 111 32
115 104 97 107 101 32 116 104 101 32 100 97 114 108 105
110 103 32 98 117 100 115 32 111 102 32 77 97 121
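
Here is a sketch of the enhanced scheme, with a check that translating the numbers back returns exactly the line we started with. The text fixes only two capitals (82 for ‘R’ and 77 for ‘M’); numbering the whole upper-case alphabet from 65 is my assumption, chosen because it agrees with both of those values. The names CHAR_TO_NUMBER, encode and decode are likewise invented for the illustration.

```python
import string

# Lower-case letters as before: 'a' -> 97, ..., 'z' -> 122.
CHAR_TO_NUMBER = {
    ch: 97 + i for i, ch in enumerate(string.ascii_lowercase)
}
# Assumption: capitals run from 65, which gives 'M' -> 77 and 'R' -> 82.
CHAR_TO_NUMBER.update(
    {ch: 65 + i for i, ch in enumerate(string.ascii_uppercase)}
)
# The space gets its own number, as in the text.
CHAR_TO_NUMBER[" "] = 32

# Reverse table for translating numbers back into characters.
NUMBER_TO_CHAR = {n: ch for ch, n in CHAR_TO_NUMBER.items()}


def encode(line):
    """Replace every character with its number (hypothetical helper)."""
    return [CHAR_TO_NUMBER[ch] for ch in line]


def decode(numbers):
    """Replace every number with its character (hypothetical helper)."""
    return "".join(NUMBER_TO_CHAR[n] for n in numbers)


line = "Rough winds do shake the darling buds of May"
numbers = encode(line)
print(numbers[:6])              # [82, 111, 117, 103, 104, 32]
print(decode(numbers) == line)  # True: nothing is lost this time
```

A character that is missing from the table, such as a comma, would make this sketch stop with an error, which is a reminder that every character we want to store needs a number of its own.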

This is now a better representation of the text. The example illustrates that unique numbers are needed, not simply for all the upper- and lower-case letters and for spaces, but also for characters that we might not think of straight away. These include mathematical symbols (e.g. > (greater than), < (less than) and ≠ (not equal to)) and accented letters found in foreign words (e.g. é, è, ç and ö). This is why computer scientists usually refer to characters, rather than letters, when discussing text. All in all, then, a great many numbers will have to be assigned to represent text.
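
To see that such characters really can be given numbers of their own, we can ask Python which number it uses for each one. Its built-in functions ord and chr convert a character to a number and back. The numbers come from Python's own numbering scheme (Unicode), which this section does not cover, so treat the sketch as a glimpse of one real scheme rather than as part of the table I chose above. The list symbols is just my selection of examples.

```python
# A few characters that are neither plain letters nor spaces.
symbols = [">", "<", "≠", "é", "è", "ö"]

# ord gives the number Python uses for a single character.
codes = {ch: ord(ch) for ch in symbols}

for ch, number in codes.items():
    print(ch, number)

# chr goes the other way, from a number back to its character.
print(all(chr(number) == ch for ch, number in codes.items()))  # True
```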

M150_1
