5.3 Formatting a hard disk
“If there are lots of different operating systems, each with its own kind of file system, do you need a different kind of disk drive for each?” asked Rupert. “Good question,” said Gloria, “but the answer is ‘no’. This is because we can prepare almost any hard disk drive to work with any operating system and its file system by putting it through a process called formatting before we try to save any data to it.
“The most important thing that happens when a disk is formatted is that at least one area of the disk is set up with the operating system’s file system, ready to store data. These areas are called partitions. You can have a single partition that contains all the storage space on the drive or divide the space into several different partitions. If you want to run more than one operating system on your machine, you can even create partitions that have different file systems. Either way, you need at least one partition on the drive and, if you have more than one, each partition will be displayed as a separate drive by your operating system – for example, in Windows Explorer or Finder on a Mac.
“Some formatting procedures may also check the physical structure of the disk for errors, recording their location so that data is not written to these locations.
“Once a disk has been formatted, you can write data to it.”
“I had a … ‘friend’ once who formatted his disk by accident one day and he lost all of his data,” said Rupert, looking sad.
“Well, if you reformat a disk that already has data on it, the data does not disappear, but it can’t be retrieved by the operating system in the usual way. Special tools are usually needed to get it back.”
“Ah, it’s a bit like you have destroyed the indexing system in a large library,” said Rupert. “All the books are still there, but they are really difficult to find.”
“So should you ever reformat a hard drive?” he asked.
“Well, sometimes you might want to go back to scratch to install a new operating system on the disk or on a partition of the disk. In that case, you would want to reformat the disk or partition for the new operating system. As a last resort, reformatting a disk can also be a way of removing viruses from the computer, or fixing other errors.”
“So how is the space organised on the hard drive, once it is formatted?” asked Rupert.
“All hard disks are organised into a series of tracks – sometimes called rings – that can contain data. A whole track is too large to be managed effectively as a single storage unit. (An individual disk track can store more than a megabyte of data, so giving every small file a track of its own would be very wasteful.) So, as part of the formatting process, tracks are divided into several numbered, equal divisions known as sectors.
“The sectors are arc-shaped pieces of a track. Traditionally, each sector holds 512 bytes of data. The sectors are grouped together in clusters, so a cluster is a larger unit of storage whose size depends on the particular file system being used. A cluster always consists of one or more consecutive sectors, and the number is typically a power of two, such as four or eight. When a file is written to the hard disk, it always takes up a whole number of clusters.”
“I might be getting lost now,” said Rupert, plaintively.
“No worries. There is a lot of terminology to take in, I know. Here is a summary.
- Each platter of a hard disk is divided into a number of concentric tracks.
- Each track is divided into a number of sectors, each of which can store the same amount of data. A sector is the smallest physical storage unit on the disk, and it is traditionally fixed at 512 bytes in size.
- A cluster can consist of one or more consecutive sectors – commonly, a cluster will have four or eight sectors. As a file is written to the disk, the file system allocates the appropriate whole number of clusters to store the file’s data.”
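The last point in the summary can be checked with a little arithmetic. The following Python sketch is an illustration only: it assumes the 512-byte sector and four-sector cluster used as examples in this section, and the function names `clusters_needed` and `unused_bytes` are invented for the example rather than taken from any real file system.

```python
SECTOR_BYTES = 512        # sector size used as the example in this section
SECTORS_PER_CLUSTER = 4   # one of the typical values mentioned (assumed here)


def clusters_needed(file_bytes, sectors_per_cluster=SECTORS_PER_CLUSTER):
    """Return the whole number of clusters needed to hold file_bytes."""
    cluster_bytes = sectors_per_cluster * SECTOR_BYTES
    # Round up: a partly filled cluster still has to be allocated in full.
    return -(-file_bytes // cluster_bytes)


def unused_bytes(file_bytes, sectors_per_cluster=SECTORS_PER_CLUSTER):
    """Return the bytes left over in the file's last cluster."""
    cluster_bytes = sectors_per_cluster * SECTOR_BYTES
    allocated = clusters_needed(file_bytes, sectors_per_cluster) * cluster_bytes
    return allocated - file_bytes


if __name__ == "__main__":
    size = 5000  # a hypothetical file size in bytes
    print(clusters_needed(size), "clusters allocated")
    print(unused_bytes(size), "bytes unused in the last cluster")
```

Under these assumptions, a 5000-byte file needs three clusters (3 × 2048 = 6144 bytes), so 1144 bytes of the last cluster go unused. With larger clusters, more space is left over for each small file.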
“Confusingly,” continued Gloria, “on diagrams, the sectors on a platter are usually shown as being arranged in segments, like pizza slices. This was the case on older hard disk drives, but it meant that the sectors near the outside of the disk had a larger area than those closer to the centre. Because every sector holds the same amount of data, the outer sectors held fewer bits per unit area and so used the disk surface less efficiently than the inner ones. On modern disks, each sector has the same area, so each stores the same number of bits per unit area (Figure 17).”
Activity 12 (self-assessment) Storing data in bytes, sectors and clusters
Given that a sector is 512 bytes in size, how many bytes of storage are there in a cluster composed of four sectors?
There are 4 × 512 = 2048 bytes in a cluster.
“So, once a file has been written to one or more clusters, how does the operating system know where to find the file again?” asked Rupert.
“Well, let’s talk fat,” was the surprising answer.
“What do you mean, ‘fat’?” exclaimed Rupert, taken aback by this turn in the conversation.
“No, you idiot,” said Gloria affectionately, “not fat, but FAT, which stands for File Allocation Table. It is an area of the hard disk that is used as an index of every cluster on the disk and records whether each cluster is being used or not. It is at the heart of the file system called FAT32, which was used by Windows operating systems but is now mainly used with solid-state storage, such as flash memory. It cannot store any single file larger than 4 GB (gigabytes).
“Windows computers now mainly use a file system called New Technology File System (NTFS), where a table called the Master File Table (MFT) does a similar job. Apple has a file system that is unimaginatively called the Apple File System (APFS), and Linux systems commonly use one called ext4. Each of these keeps its own records that do a similar job.”
“I suppose that when a new file is to be saved, the operating system can also use this table to determine which clusters are in use and which are free to be allocated?” said Rupert.
“Exactly,” replied Gloria. “The space that is available for files to be written to is referred to as unallocated space on the disk and, of course, this is always a whole number of clusters’ worth of bytes.”
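The idea of a table that records which clusters are in use, and that is consulted when a new file is saved, can be sketched in a few lines of Python. This is a deliberately simplified toy, not how FAT32, NTFS or any other real file system is implemented: the class name `ToyAllocationTable`, its methods and the file names are all invented for illustration. In particular, a real FAT entry holds the number of the next cluster in a file's chain rather than a file name.

```python
FREE = None  # marker for a cluster that is not in use


class ToyAllocationTable:
    """A toy index with one entry per cluster (hypothetical, for illustration)."""

    def __init__(self, total_clusters):
        # Every cluster starts out unallocated.
        self.owner = [FREE] * total_clusters

    def free_clusters(self):
        """List the numbers of all clusters that are currently unallocated."""
        return [i for i, name in enumerate(self.owner) if name is FREE]

    def allocate(self, name, clusters_wanted):
        """Mark the first free clusters as belonging to the named file."""
        free = self.free_clusters()
        if clusters_wanted > len(free):
            raise ValueError("not enough unallocated space")
        chosen = free[:clusters_wanted]
        for i in chosen:
            self.owner[i] = name
        return chosen

    def delete(self, name):
        """Mark the file's clusters as free again; the data itself is untouched."""
        for i, current in enumerate(self.owner):
            if current == name:
                self.owner[i] = FREE

    def unallocated_bytes(self, cluster_bytes=2048):
        """Unallocated space is always a whole number of clusters."""
        return len(self.free_clusters()) * cluster_bytes


if __name__ == "__main__":
    table = ToyAllocationTable(8)
    print(table.allocate("letter.txt", 3))
    print(table.allocate("photo.jpg", 2))
    table.delete("letter.txt")
    print(table.allocate("song.mp3", 4))
    print(table.unallocated_bytes())
```

In this sketch, deleting `letter.txt` frees clusters 0 to 2, so the next file reuses them and then continues in cluster 5. Its clusters are therefore not all consecutive, which is why the file system needs a table to find every piece of a file again.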