4.9 Making it move
To me, there is a wonderful quality of timelessness about Vermeer's picture of the young woman at her harpsichord. It captures a tranquil moment, frozen for eternity. But of course our visual world is not like that at all. It is dynamic, seething with motion. And schoolchildren have known how to create the illusion of movement since time immemorial. Riffling quickly through a little ‘flick book’ under the desk, with each page showing one step in a moving sequence, as in Figure 24, gives the impression of uninterrupted motion and has whiled away many a tedious Latin or maths lesson down the ages.
Many nineteenth-century children's toys were based on the ‘flick book’ principle: in 1825, for instance, the ‘thaumatrope’ tricked the eye into seeing movement by means of a rapidly spinning card; in 1834 the ‘zoetrope’ created a more sophisticated effect with a strip of drawings attached to the inside of a rotating drum.
The invention of the film camera, and of moving pictures themselves, is generally credited to the Lumière brothers in 1895, but in fact several others made similar inventions around the same time. The humble flick book and the Hollywood movie both rely on exactly the same principle. The human eye registers a new item in the visual field almost the instant it appears. However, after the item disappears, its image persists briefly in the retina and brain. So, with the flick book, as each new picture appears in front of our eyes, a visual remnant of the previous picture remains. And since each new picture relates closely to the one before it (in the jargon, we say it is highly correlated with it), the brain integrates them into an apparent moving sequence.
The same trick is worked on the mind at the cinema. In the following discussion, I will refer to each picture as a frame, and to the speed at which the frames pass in front of our eyes as the frame rate. At fewer than 10 frames per second (fps) the viewer generally sees each frame as a separate image. Between 10 and 16 fps (the sort of speed the flick book moves at) there is an impression of jerky movement. Above 16 fps, the movement seems much smoother. Films are usually shot and displayed at 24 fps; TV pictures are presented at 25 or 30 fps, depending on which country you live in. High definition TV, currently only available in Japan, uses 60 fps.
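To make these thresholds concrete, here is a minimal Python sketch that labels a frame rate using the rough boundaries given above and works out how long each frame stays on screen. The function names are hypothetical, and the exact handling of the boundary values (treating 16 fps itself as ‘jerky’) is my own assumption; the perceptual boundaries are only approximate anyway.

```python
def perceived_motion(fps: float) -> str:
    """Label a frame rate using the rough thresholds from the text.

    Boundary handling (10 fps counts as jerky, 16 fps still counts as jerky)
    is an assumption; real perception varies from viewer to viewer.
    """
    if fps < 10:
        return "separate images"
    if fps <= 16:
        return "jerky movement"
    return "smooth movement"


def frame_duration_ms(fps: float) -> float:
    """How long each frame stays on screen, in milliseconds."""
    return 1000.0 / fps


# Flick book, film, TV and high definition TV rates
for rate in (8, 12, 24, 25, 30, 60):
    print(f"{rate:>2} fps: {perceived_motion(rate):<16} "
          f"each frame shown for {frame_duration_ms(rate):.1f} ms")
```

At the cinema's 24 fps, for example, each frame is on screen for only about 41.7 milliseconds, which is why the after-image of one frame is still present when the next arrives.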
Computers use the same principle to display moving visual information. A series of images is taken from the computer's memory, or directly from a storage device such as a CD, and presented on the computer screen in quick succession. Each image is different from, but correlated with, the previous image; the illusion of smooth movement is created by our own eyes and brains. But recall the problem I raised earlier, of how much storage digitally encoded images need, and get your calculator out ….
Consider a two-hour film to be displayed on a computer at 24 fps. Each frame is 640×380 pixels and a 24-bit RGB colour encoding is being used. How many bytes (don't even bother with bits) will be required to represent the whole film?
Each frame contains 243,200 pixels. At 3 bytes per pixel, we will need 729,600 bytes per frame. The video is two hours, or 7,200 seconds, which at 24 fps will be 172,800 frames. So we will need 729,600×172,800=126,074,880,000 bytes. Over 126GB! (Remember a gigabyte is a thousand million bytes.)
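If you would rather let the computer do the sums, the calculation above can be sketched in a few lines of Python. The function name is hypothetical; the figures (640×380 pixels, 3 bytes per pixel, 24 fps, two hours) are the ones from the question, and the sketch assumes every frame is stored as a complete, uncompressed bitmap.

```python
def uncompressed_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Raw size of a video if every frame is stored as a full bitmap."""
    pixels_per_frame = width * height
    bytes_per_frame = pixels_per_frame * bytes_per_pixel
    frames = fps * seconds
    return bytes_per_frame * frames


total = uncompressed_video_bytes(width=640, height=380, bytes_per_pixel=3,
                                 fps=24, seconds=2 * 60 * 60)
print(f"{total:,} bytes")        # 126,074,880,000 bytes
print(f"{total / 1e9:.1f} GB")   # 126.1 GB (1 GB = a thousand million bytes)
```

Changing any one parameter shows how quickly the total grows: doubling the frame size or the frame rate doubles the storage needed.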
Now we really are in trouble. Even for modern computers, this is a colossal memory demand – one and a half times the size of an average hard disk. Transferring such an enormous amount of digital information over a network would be achingly slow. The highest rate at which data can be moved around the Open University local network is 100 million bits per second, although it is usually much less for an individual user, as we all have to share. 126GB is 1008Gb (the lowercase b stands for bits). So even at the maximum rate it would take about 10,000 seconds, or nearly three hours, to send the video to my colleague in the next office. However, most people still use the telephone network to move data, and this is much slower: a maximum of 56Kb per second, at which the same film would take over 200 days to arrive. Be sure to make a cup of tea before you try to download a film.
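The transfer times can be checked the same way. This is a minimal sketch that assumes an ideal link running flat out at its maximum rate, with no sharing and no protocol overheads, so real transfers would be slower still. It also assumes that ‘56Kb’ means 56,000 bits, the usual convention for modem speeds. The function name is hypothetical.

```python
BITS_PER_BYTE = 8


def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time, ignoring overheads and other users on the link."""
    return size_bytes * BITS_PER_BYTE / bits_per_second


film_bytes = 126_074_880_000                        # the two-hour film from above

lan = transfer_seconds(film_bytes, 100_000_000)     # 100 million bits per second
modem = transfer_seconds(film_bytes, 56_000)        # assumed 56,000 bits per second

print(f"Local network: {lan / 3600:.1f} hours")     # 2.8 hours
print(f"Dial-up modem: {modem / 86400:.0f} days")   # 208 days
```

Note that the conversion from bytes to bits (multiplying by 8) is where the 126GB becomes roughly 1008Gb.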
For these and other practical reasons, it is simply not feasible to work with this amount of data. We have to find some way of reducing the storage that moving images, and still images too, require. The vector graphics approach will not work for complex images, so we must look for a way of compressing bitmapped visual information.
Can you think of any strategy for reducing the size of a bitmapped film? It's a difficult question, so don't worry if you can't get too far with it.
One approach relies on a fact I mentioned earlier – that successive frames are correlated.
Consider a fragment from a Hollywood movie. The camera rests on Clint Eastwood's face. He narrows his eyes and growls, ‘Do you feel lucky … punk?’ The fragment takes perhaps two seconds, or 48 frames. But nothing much is actually moving in that time, only his lips and eyes. If we encode every frame separately, we'll find that all of them are very similar to one another: we are capturing a lot of the same information over and over again. So, if we fully encode the first frame and then record only the differences between it and the next frame, then the differences between that frame and the next, and so on, we will save huge amounts of space.
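Here is a minimal Python sketch of this ‘first frame plus differences’ idea. To keep it short, I have assumed that a frame is just a flat list of pixel values (one number per pixel) and that each difference is recorded as a pair: the position of the changed pixel and its new value. These representations, the function names and the tiny ten-pixel ‘close-up’ are all hypothetical illustrations. Real video formats use far more elaborate schemes built on the same observation, so treat this only as a demonstration of the principle.

```python
from typing import List, Tuple

Frame = List[int]                # hypothetical: one number per pixel
Delta = List[Tuple[int, int]]    # (pixel position, new value) for changed pixels


def encode(frames: List[Frame]) -> Tuple[Frame, List[Delta]]:
    """Store the first frame in full, then only what changes between frames."""
    key = list(frames[0])
    deltas: List[Delta] = []
    for prev, curr in zip(frames, frames[1:]):
        deltas.append([(i, v) for i, (p, v) in enumerate(zip(prev, curr)) if p != v])
    return key, deltas


def decode(key: Frame, deltas: List[Delta]) -> List[Frame]:
    """Rebuild every frame by applying each set of changes to the one before."""
    frames = [list(key)]
    for delta in deltas:
        frame = list(frames[-1])
        for i, v in delta:
            frame[i] = v
        frames.append(frame)
    return frames


# A toy close-up: 48 frames of 10 pixels, where only pixels 3 and 4 (the lips) change.
frames = []
for t in range(48):
    frame = [200] * 10
    frame[3] = frame[4] = 50 + (t % 2) * 30   # mouth alternately open and closed
    frames.append(frame)

key, deltas = encode(frames)
assert decode(key, deltas) == frames          # nothing is lost in the process

raw_values = sum(len(f) for f in frames)               # every pixel of every frame
stored = len(key) + sum(2 * len(d) for d in deltas)    # each change costs 2 numbers
print(raw_values, stored)                              # 480 198
```

Even in this tiny example the stored data shrinks to well under half its original size. The saving grows with the proportion of the picture that stays still, and disappears if nearly every pixel changes from frame to frame.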