Crossing the boundary - analogue universe, digital worlds

4.2.1 Reducing and processing text

You are familiar with the idea of a word processor. Although I grew up long before the era of word processing, it's now difficult for me to imagine how I ever lived without one. Word processors enable us to enter text into the computer, edit and fiddle about with it, store it and then print it out when we are satisfied with the result. That's exactly what's happening as I write this course. But, if the text spends time inside the computer before being returned to print, that must mean it exists there in the form of numbers. It's inside the boundary. How can text be made into numbers?

Let's use the following famous line from Shakespeare as an example:

Rough winds do shake the darling buds of May

(Sonnet 18)

This presents no problem to the human eye. We read it straight off. Actually, the process by which we read, recognise, combine and understand textual symbols is complex and not fully understood – but that's another course.

Exercise 8

How do you think this line could be transformed into numbers?

Discussion

You may have been thinking along the following lines. Pick one number to represent each letter – 1 for ‘a’, 2 for ‘b’, …, – and then simply substitute the number for that letter in the line.

I did say earlier that the computer world is a simple world, and transforming text into numbers is as straightforward as that. First, we assign a unique number to each letter in the alphabet. Each letter in the text now becomes a number inside the computer. I'm going to make the following choices:

letter    a    b    c    d    e    f    g    h    i    j    k    l    m
number   97   98   99  100  101  102  103  104  105  106  107  108  109
letter    n    o    p    q    r    s    t    u    v    w    x    y    z
number  110  111  112  113  114  115  116  117  118  119  120  121  122

These choices probably seem fairly arbitrary, but let's stick with them for the moment. Now if I simply substitute each letter with the number I've chosen for it, our line will look like this inside the computer (the breaks to a new row have no significance):

114 111 117 103 104 119 105 110 100 115 100 111 115
104 97 107 101 116 104 101 100 97 114 108 105 110
103 98 117 100 115 111 102 109 97 121
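The substitution can be tried out in a few lines of Python. This is only a sketch: the function name `encode_letters` is made up for the illustration, and it leans on Python's built-in `ord`, which happens to give the same numbers as my table for the lower-case letters.

```python
def encode_letters(text):
    """Replace each letter with its number from the table; skip everything else."""
    # Lower-case the text first, because the table only covers 'a' to 'z'.
    return [ord(ch) for ch in text.lower() if "a" <= ch <= "z"]


line = "Rough winds do shake the darling buds of May"
numbers = encode_letters(line)

print(numbers[:5])   # [114, 111, 117, 103, 104]  (r, o, u, g, h)
print(len(numbers))  # 36, one number per letter
```

Note that the sketch quietly throws away anything that is not a letter, just as my table does.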

It looks as if the problem of converting text into numbers has been solved.

SAQ 6

Before going on, do you think the above table is a complete representation of the line of poetry?

Answer

Not quite, unfortunately. If I instruct the computer to translate what I've given it back into text, I'll see

roughwindsdoshakethedarlingbudsofmay

I forgot that there are spaces between the words, probably because I didn't even notice them. Moreover, the first letter of the line should be a capital and so should the first letter of the proper name ‘May’.
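You can see the loss for yourself with a short Python sketch. It assumes the letters-only scheme from my table, uses the built-in `ord` and `chr` to go from letter to number and back, and the variable names are just for illustration.

```python
line = "Rough winds do shake the darling buds of May"

# Letters-only scheme: lower-case everything and keep only 'a' to 'z'.
numbers = [ord(ch) for ch in line.lower() if "a" <= ch <= "z"]

# Translate the numbers back into text.
decoded = "".join(chr(n) for n in numbers)

print(decoded)          # roughwindsdoshakethedarlingbudsofmay
print(decoded == line)  # False: the spaces and capitals are gone
```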

But a computer doesn't know anything about words or the spaces between them, still less about the months of the year. We need more numbers to solve this problem. Let's allocate a new number, 32, to represent a space. However, the problem of capital letters is more serious. There is no easy way of instructing the machine that ‘r’ and ‘R’ are different forms of the same letter. Nor could we possibly tell it anything about the first letters of poetic lines. Our only option is to allocate a whole set of new numbers to the upper-case (capital) versions of every letter. Let's set aside 82 to represent a capital ‘R’ and 77 for a capital ‘M’. Now, if I use this enhanced way of representing characters as numbers and peer into the memory of the computer, our line of poetry becomes:

82 111 117 103 104 32 119 105 110 100 115 32 100 111 32
115 104 97 107 101 32 116 104 101 32 100 97 114 108 105
110 103 32 98 117 100 115 32 111 102 32 77 97 121
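Here is a minimal Python sketch of the enhanced scheme. It assumes that Python's built-in `ord` agrees with my choices (32 for a space, 82 for ‘R’, 77 for ‘M’, and 97 to 122 for the lower-case letters), which it does for these characters.

```python
line = "Rough winds do shake the darling buds of May"

# Every character, including spaces and capitals, gets its own number.
numbers = [ord(ch) for ch in line]

print(numbers[:6])   # [82, 111, 117, 103, 104, 32]  'R', 'o', 'u', 'g', 'h', space
print(numbers[-3:])  # [77, 97, 121]                 'M', 'a', 'y'

# Translating back now recovers the line exactly.
restored = "".join(chr(n) for n in numbers)
print(restored == line)  # True
```

This time nothing is lost on the way in, so nothing is missing on the way out.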

This is now a better representation of the text. The example illustrates that unique numbers are needed, not simply for all the upper- and lower-case letters and for spaces, but also for characters that we might not think of straight away. These include mathematical symbols (e.g. > (greater than), < (less than) and ≠ (not equal to)) and accented letters found in foreign words (e.g. é, è, ç and ö). This is why computer scientists usually refer to characters, rather than letters, when discussing text. All in all, then, a great many numbers will have to be assigned to representing text.
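To get a feel for how far the numbering has to stretch, here is a small Python sketch. Be aware of the assumption: Python's `ord` reports the numbers used by the Unicode standard, which I have not introduced here, so treat the values as one possible allocation rather than as part of my scheme above.

```python
# A few characters that are neither letters of the English alphabet nor spaces.
characters = [">", "<", "≠", "é", "è", "ç", "ö"]

# Look up the number that Python (following Unicode) assigns to each one.
codes = {ch: ord(ch) for ch in characters}

for ch, number in codes.items():
    print(ch, number)
```

The everyday symbols > and < come out as small numbers (62 and 60), while ≠ is given 8800, which shows how large the set of numbers reserved for text can become.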

M150_1
